How One Enterprise Platform Team Rebuilt Their Multi-Agent Tool Call Deduplication Architecture After Discovering That Foundation Model Retry Storms Were Triggering Duplicate Billing Events Across Per-Tenant Ledgers
When the platform engineering team at a mid-sized B2B SaaS company called Veridian Systems first rolled out their multi-agent AI platform in late 2025, they were proud of how fast they had moved. Within six weeks, they had onboarded 40 enterprise tenants onto a system that used coordinated AI agents to automate procurement workflows: one agent for vendor lookup, one for price negotiation drafts, one for compliance checking, and a billing agent that logged every completed action to per-tenant ledgers. It was elegant, modular, and fast.
By February 2026, they had a financial disaster quietly brewing in the background. Some tenants were being billed two, three, or even four times for the same workflow action. The root cause was not a rogue billing service or a database race condition. It was something far more subtle and, in hindsight, far more instructive: foundation model retry storms were generating cascading duplicate tool calls that the billing agent was faithfully and correctly logging as distinct events.
This is the story of how Veridian's platform team diagnosed the problem, rebuilt their deduplication architecture from the ground up, and emerged with a system that is now considered a reference implementation for agentic billing integrity in 2026.
The Problem: When Retry Logic Becomes a Billing Bomb
To understand what went wrong, you need to understand how modern multi-agent systems handle tool calls. In an agentic framework, a foundation model (in Veridian's case, a mixture of GPT-4o-class and Claude 3.x-class models depending on the task) does not just generate text. It emits structured tool call payloads: JSON-formatted instructions that tell downstream services to execute a specific action, such as querying a vendor database, sending a draft email, or logging a billable event.
The problem is that foundation models are stateless between inference calls. If an agent orchestrator does not receive a response within a timeout window, it retries the inference request. This is standard, sensible behavior. But here is where the architecture fell apart for Veridian:
- The orchestrator would time out waiting for a tool call response and re-invoke the model.
- The model, having no memory of its prior invocation, would emit a new tool call payload with a new tool call ID.
- Both tool call payloads would eventually resolve and be processed by the downstream billing agent.
- The billing agent, seeing two distinct tool call IDs, would log two distinct billable events to the tenant's ledger.
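The failure sequence above can be sketched in a few lines. This is an illustrative toy, not Veridian's code: the function and variable names are invented, and `uuid4` stands in for whatever ID scheme the model provider uses. The point is that a stateless model mints a fresh tool call ID on every invocation, so a naive billing sink keyed on tool call IDs logs one logical action twice.

```python
import uuid

def model_emit_tool_call(step):
    """Stand-in for a stateless model: every invocation mints a fresh tool call ID."""
    return {"tool_call_id": str(uuid.uuid4()), "step": step}

ledger = []

def billing_sink(call):
    """Naive sink: treats every distinct tool_call_id as a new billable event."""
    ledger.append(call["tool_call_id"])

# The orchestrator times out on the first invocation and retries the same logical step.
first = model_emit_tool_call("compliance-check")
retry = model_emit_tool_call("compliance-check")  # the model has no memory of the first call

# Both payloads eventually resolve; the sink logs two events for one action.
billing_sink(first)
billing_sink(retry)
assert first["tool_call_id"] != retry["tool_call_id"]
assert len(ledger) == 2  # one logical step, two billable events
```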
Under normal load, this was rare enough to go unnoticed. But in January 2026, Veridian onboarded three large enterprise tenants simultaneously. The increased load caused upstream API latency to spike, which increased timeout rates, which triggered more retries, which generated more duplicate tool calls. This is the classic retry storm pattern, and in a billing-sensitive context, it was catastrophic.
"We were looking at the ledger data and seeing line items that just did not make sense," said the platform team's lead engineer. "A single procurement workflow that should have generated five billable events was showing fifteen. We thought it was a bug in the ledger service. It took us four days to trace it all the way back to the model retry layer."
Diagnosing the Root Cause: A Three-Layer Failure
When the team did a full post-mortem, they identified three distinct layers where the architecture had failed to account for the stateless, non-deterministic nature of foundation models.
Layer 1: No Idempotency Keys at the Tool Call Level
The most fundamental gap was that tool calls were not assigned idempotency keys at the point of generation. In traditional distributed systems, any operation that might be retried is tagged with a unique, stable key so that the receiving service can detect and discard duplicates. Veridian's tool call payloads had IDs, but those IDs were generated fresh by the model on each invocation. There was no external, orchestrator-assigned idempotency key that persisted across retries of the same logical operation.
Layer 2: The Billing Agent Trusted Tool Call IDs Unconditionally
The billing agent had been designed to treat every inbound tool call ID as a unique, authoritative billing event. This was a reasonable assumption in a world where tool calls are generated exactly once. In a retry-prone environment, it was a fatal assumption. The agent had no deduplication window, no seen-ID cache, and no cross-check against the tenant's ledger for semantic duplicates.
Layer 3: No Retry Correlation at the Orchestrator Layer
The orchestrator managed retries at the HTTP transport level, but it did not propagate any "this is a retry of operation X" context into the model invocation payload. The model received each retry as a fresh, independent request. Even if the model had been capable of detecting that it was being asked to do something it had already done (it was not, because it was stateless), it had no signal to act on.
The Rebuild: A Four-Component Deduplication Architecture
Over six weeks in February and March of 2026, the Veridian platform team rebuilt their tool call pipeline around four interconnected components. The goal was to make the entire system idempotent by design, not by convention.
Component 1: The Logical Operation Token (LOT)
The team introduced a concept they called the Logical Operation Token. Before any model invocation that could result in a billable tool call, the orchestrator generates a deterministic, content-addressable token based on:
- The tenant ID
- The workflow session ID
- The step name within the workflow (for example, "compliance-check" or "vendor-lookup")
- A hash of the input parameters for that step
This token is injected into the model's system prompt context as a non-semantic metadata field. More importantly, it is passed as a header to every downstream tool call handler. The key insight is that this token is stable across retries. If the orchestrator retries a model invocation for the same logical step with the same inputs, the LOT is identical. This gives every downstream service a stable handle to detect duplicate work.
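A minimal sketch of LOT generation might look like the following. The function name and token layout are assumptions for illustration; the essential properties are that the token is derived only from the four inputs the article lists, and that the parameter hash is computed over a canonical serialization so dict ordering cannot change the token between retries.

```python
import hashlib
import json

def logical_operation_token(tenant_id, session_id, step_name, params):
    """Derive a token that is stable across retries of the same logical step.

    json.dumps with sort_keys=True canonicalizes the input parameters so
    the hash does not depend on key ordering in the params dict.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{tenant_id}:{session_id}:{step_name}:{digest}"

# Same tenant, session, step, and inputs -> identical token, even on a retry
# and even if the params dict arrives with a different key order.
a = logical_operation_token("t-42", "sess-9", "vendor-lookup",
                            {"vendor": "X", "product": "Y"})
b = logical_operation_token("t-42", "sess-9", "vendor-lookup",
                            {"product": "Y", "vendor": "X"})
assert a == b
```

Because the token is content-addressable rather than random, no coordination or storage is needed to reproduce it on a retry: any component that knows the step's inputs can recompute the same LOT.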
Component 2: The Tool Call Deduplication Gateway
The team introduced a new service that sits between the orchestrator and all downstream tool handlers: the Tool Call Deduplication Gateway (TCDG). Every tool call, regardless of which agent emits it, is routed through the TCDG before execution.
The TCDG maintains a per-tenant Redis cluster with a sliding deduplication window (configurable per tenant, defaulting to 10 minutes). When a tool call arrives, the TCDG checks whether a tool call with the same LOT has already been processed within the window. If it has, the TCDG returns the cached result of the original tool call without re-executing it. If it has not, it executes the tool call, caches the result against the LOT, and returns the result to the orchestrator.
This approach has a critical advantage: it is transparent to both the model and the downstream tool handlers. Neither needs to be modified. The deduplication logic lives entirely in the gateway layer.
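The gateway's core check-then-execute-then-cache loop can be sketched as below. This is a simplification under stated assumptions: a plain dict stands in for the per-tenant Redis cluster, and the class and method names are invented. A production version would need an atomic check-and-set (for example Redis `SET NX` with a TTL) to close the race between two concurrent calls carrying the same LOT.

```python
import time

class DedupGateway:
    """Sketch of a TCDG-style gateway: execute a tool call at most once per
    LOT within a sliding window, replaying the cached result on duplicates."""

    def __init__(self, window_seconds=600.0):
        self.window = window_seconds
        self._cache = {}  # lot -> (expires_at, cached_result)

    def call(self, lot, execute):
        now = time.monotonic()
        entry = self._cache.get(lot)
        if entry is not None and entry[0] > now:
            return entry[1], "hit"            # duplicate: replay original result
        result = execute()                    # first (or expired) sighting: run it
        self._cache[lot] = (now + self.window, result)
        return result, "miss"

gateway = DedupGateway(window_seconds=600.0)
executions = []

def run_tool_call():
    executions.append(1)                      # side effect we must not repeat
    return {"status": "ok"}

r1, s1 = gateway.call("t-42:sess-9:vendor-lookup:abc", run_tool_call)
r2, s2 = gateway.call("t-42:sess-9:vendor-lookup:abc", run_tool_call)
assert (s1, s2) == ("miss", "hit")
assert len(executions) == 1                   # the side effect ran exactly once
```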
Component 3: The Billing Agent Semantic Deduplication Layer
The team recognized that the TCDG alone was not sufficient. It handles retry storms within a single workflow session, but it does not protect against semantic duplicates: cases where two distinct tool calls, with different LOTs, represent the same real-world billable event. This can happen when, for example, a workflow is restarted after a crash and re-executes steps that had already completed.
To address this, the billing agent was given a semantic deduplication layer. Before writing any event to a tenant's ledger, the billing agent now queries the ledger for events that match on three dimensions:
- Tenant ID
- Workflow type and step name
- A semantic fingerprint of the action's business-level outcome (for example, "vendor X was queried for product Y")
If a matching event exists within a configurable lookback window (default: 24 hours), the billing agent flags the new event as a potential duplicate and routes it to a human-in-the-loop review queue rather than writing it directly to the ledger. This adds a safety net below the TCDG for cases the gateway cannot catch.
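The three-dimension match plus review-queue routing can be sketched as follows. All names here are hypothetical, and a list stands in for the ledger store; the real system would query an indexed ledger table rather than scan in memory. The shape of the check is the point: match on tenant, workflow step, and a fingerprint of the business-level outcome, within a lookback window.

```python
import hashlib

def fingerprint(outcome: str) -> str:
    """Hash of the business-level outcome, e.g. 'vendor X queried for product Y'."""
    return hashlib.sha256(outcome.encode("utf-8")).hexdigest()[:12]

def record_billable_event(ledger, review_queue, event, lookback=24 * 3600):
    """Write to the ledger unless a semantically equivalent event already
    exists within the lookback window; suspected duplicates go to human
    review instead of being written directly."""
    cutoff = event["ts"] - lookback
    for prior in ledger:
        if (prior["tenant"] == event["tenant"]
                and prior["step"] == event["step"]
                and prior["fingerprint"] == event["fingerprint"]
                and prior["ts"] >= cutoff):
            review_queue.append(event)        # potential duplicate: hold for review
            return "review"
    ledger.append(event)
    return "written"

ledger, queue = [], []
evt = {"tenant": "t-42", "step": "vendor-lookup",
       "fingerprint": fingerprint("vendor X queried for product Y"), "ts": 1000.0}
assert record_billable_event(ledger, queue, evt) == "written"

# A workflow restarted after a crash re-emits the same business action
# with a different LOT, so the TCDG does not catch it; this layer does.
dup = dict(evt, ts=1400.0)
assert record_billable_event(ledger, queue, dup) == "review"
```

Routing to a review queue instead of silently dropping is a deliberate choice: a semantic match is probabilistic, so a human confirms before either billing or discarding.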
Component 4: Retry Storm Detection and Circuit Breaking
The fourth component addressed the root cause of the storm itself, not just its downstream effects. The team instrumented their orchestrator with a per-tenant retry rate monitor. If the retry rate for a given tenant's workflows exceeds a configurable threshold within a rolling window, the orchestrator activates a circuit breaker: it pauses new workflow initiations for that tenant, alerts the on-call engineer, and begins shedding load by routing requests to a lower-priority queue.
This does not prevent retries from happening, but it prevents a latency spike from escalating into a full retry storm that overwhelms both the model API and the downstream billing pipeline.
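A per-tenant retry-rate breaker of this kind might be structured as below. This is a sketch under assumptions (the class name, threshold, and deque-based rolling window are invented for illustration); the alerting and load-shedding actions are reduced to a boolean that callers would check before initiating new workflows.

```python
import collections
import time

class RetryStormBreaker:
    """Per-tenant retry-rate monitor: trips when retries within a rolling
    window exceed a threshold. An open breaker signals the orchestrator to
    pause new workflow initiations for that tenant and page the on-call."""

    def __init__(self, threshold=50, window_seconds=60.0):
        self.threshold = threshold
        self.window = window_seconds
        self._retries = collections.defaultdict(collections.deque)

    def _evict(self, q, now):
        while q and q[0] < now - self.window:
            q.popleft()                      # drop retries outside the window

    def record_retry(self, tenant, now=None):
        now = time.monotonic() if now is None else now
        q = self._retries[tenant]
        q.append(now)
        self._evict(q, now)

    def is_open(self, tenant, now=None):
        now = time.monotonic() if now is None else now
        q = self._retries[tenant]
        self._evict(q, now)
        return len(q) > self.threshold

breaker = RetryStormBreaker(threshold=3, window_seconds=60.0)
for t in range(5):
    breaker.record_retry("t-42", now=float(t))   # 5 retries in 5 seconds
assert breaker.is_open("t-42", now=5.0)          # breaker trips for this tenant
assert not breaker.is_open("t-7", now=5.0)       # other tenants are unaffected
```

Keeping the breaker per-tenant matters: one tenant's latency spike should shed that tenant's load, not stall the whole platform.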
The Results: Six Weeks After Deployment
By mid-March 2026, all four components were deployed to production. The results were immediate and measurable.
- Duplicate billing events dropped by 99.7% within the first week of the TCDG going live.
- The semantic deduplication layer caught an additional 23 edge-case duplicates in its first two weeks that the TCDG had not intercepted, all related to workflow restarts after infrastructure failures.
- The circuit breaker triggered four times in its first month, each time preventing a nascent retry storm from reaching the billing pipeline.
- Veridian issued retroactive billing credits to 12 of their 40 tenants for duplicate charges incurred before the fix. The total credit amount was not disclosed, but the team described it as "significant enough to justify the six-week rebuild effort many times over."
Perhaps more importantly, the rebuild gave Veridian's tenants a concrete, auditable paper trail for every billing event. Each ledger entry now references its LOT, the tool call ID, the TCDG cache status (hit or miss), and the semantic deduplication check result. Tenants can now audit their own billing data with confidence.
The Broader Lesson: Foundation Models Are Not Distributed Systems Citizens (Yet)
The Veridian case study exposes a gap that many enterprise teams building agentic platforms in 2026 are running into: foundation models were not designed with distributed systems contracts in mind. Concepts like idempotency, exactly-once delivery, and retry correlation are foundational to reliable distributed software. But when you introduce a stateless, non-deterministic model into your call graph, those contracts can break in ways that are invisible until they cause financial or operational damage.
The emerging best practices from cases like Veridian's point toward a clear architectural principle: treat every model invocation as an unreliable, potentially-duplicated event source, and build your downstream systems accordingly. This means:
- Assigning idempotency keys externally, at the orchestrator layer, never trusting model-generated IDs as unique identifiers for logical operations.
- Placing deduplication logic in a dedicated gateway layer rather than distributing it across individual tool handlers.
- Giving billing and ledger systems a semantic understanding of what constitutes a duplicate, not just a syntactic one.
- Instrumenting retry rates as a first-class operational metric, not an afterthought.
What Other Teams Can Take Away Right Now
If you are building or operating a multi-agent platform with per-tenant billing in 2026, here are the immediate action items from Veridian's experience:
- Audit your tool call ID generation. Are your tool call IDs generated by the model, or by your orchestrator? If it is the former, you are vulnerable to the exact failure mode Veridian experienced.
- Map your retry paths. Draw out every point in your agent pipeline where a retry can occur. For each one, ask: if this retry generates a duplicate tool call, what happens downstream?
- Add a deduplication layer before your billing sink. Even a simple seen-ID cache with a short TTL will catch the majority of retry-driven duplicates.
- Monitor retry rates per tenant. Retry rate spikes are an early warning signal for both infrastructure problems and emerging billing integrity issues.
- Build for auditability. Every billing event should carry enough metadata for a tenant to independently verify that it represents a real, unique, completed action.
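The "simple seen-ID cache with a short TTL" from the list above really can be this small. A sketch, with invented names and an in-memory dict in place of a shared store:

```python
import time

class SeenIDCache:
    """Minimal seen-ID cache with a TTL, placed directly in front of a
    billing sink to drop retry-driven duplicates of the same key."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._seen = {}  # key -> expiry timestamp

    def first_sighting(self, key, now=None):
        """Return True the first time a key is seen within its TTL."""
        now = time.monotonic() if now is None else now
        expiry = self._seen.get(key)
        if expiry is not None and expiry > now:
            return False                     # duplicate within TTL: caller drops it
        self._seen[key] = now + self.ttl
        return True

cache = SeenIDCache(ttl_seconds=300.0)
assert cache.first_sighting("lot-abc", now=0.0)        # bill it
assert not cache.first_sighting("lot-abc", now=10.0)   # duplicate: drop it
assert cache.first_sighting("lot-abc", now=400.0)      # TTL expired: a new event
```

This only works if the key is stable across retries, which is why the orchestrator-assigned idempotency key from the first action item is the prerequisite.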
Conclusion
Veridian's experience is not unique. As multi-agent AI systems move from experimental deployments to production-grade enterprise infrastructure in 2026, the intersection of foundation model behavior and distributed systems reliability is becoming one of the most important and least-discussed engineering challenges in the industry. Retry storms, duplicate tool calls, and billing integrity failures are not exotic edge cases. They are predictable consequences of deploying stateless, non-deterministic models into systems that require exactly-once semantics.
The good news is that the fix is not exotic either. It is disciplined distributed systems engineering applied to a new class of event source. Idempotency keys, deduplication gateways, semantic duplicate detection, and circuit breakers are all well-understood tools. The insight Veridian earned the hard way is that you need to apply them before your tenants start seeing double on their invoices.
The teams that build this infrastructure correctly in 2026 will have a significant trust and reliability advantage as agentic AI platforms become table stakes for enterprise software. The teams that do not will be issuing a lot of billing credits.