The Silent Tax: How Meridian Analytics Rebuilt Its AI Agent Billing Pipeline After Tool-Call Retries Were Double-Charging Tenants
In January 2026, the engineering team at Meridian Analytics, a mid-size B2B SaaS company serving around 340 enterprise tenants, discovered something that kept their VP of Engineering awake for several nights in a row. A routine audit of billing reconciliation logs revealed that a non-trivial subset of tenants had been silently overcharged for AI agent usage, in some cases by as much as 23%, over a rolling 60-day period. The root cause was not a rogue billing script, a misconfigured price tier, or even a bad deployment. It was something far more subtle: tool-call retries during LLM provider outages were creating duplicate charge events that the reconciliation pipeline had no mechanism to detect or suppress.
This is the story of how Meridian found the bug, quantified the damage, and rebuilt their entire multi-tenant AI agent billing reconciliation pipeline from the ground up, without taking the product offline.
Who Is Meridian Analytics and What Were They Building?
Meridian Analytics offers an AI-powered business intelligence platform. Their flagship product, Meridian Copilot, allows enterprise tenants to deploy autonomous AI agents that can query internal data warehouses, generate financial reports, trigger third-party API calls, and synthesize insights across multiple data sources. Think of it as a persistent, multi-step reasoning agent that operates inside a tenant's data environment on a scheduled or on-demand basis.
Meridian's billing model is usage-based. Each tenant is charged per agent run, and each agent run is further broken down into billable units tied to:
- LLM token consumption (input and output tokens, billed per 1,000 tokens)
- Tool calls executed (each external API invocation, database query, or file operation counted as a discrete billable event)
- Agent wall-clock runtime (billed per second for long-running agents)
By late 2025, Meridian had scaled to processing roughly 1.4 million agent runs per month across their tenant base. The billing pipeline had been built incrementally over 18 months, stitching together event streams from their agent orchestration layer, a message queue, and a downstream billing service backed by Stripe.
The Incident: What Happened in Early 2026
On January 9, 2026, one of the major LLM API providers Meridian used (they relied on two: one for reasoning-heavy tasks, one for faster, cheaper summarization) experienced a partial outage lasting approximately 47 minutes. During this window, API calls to the provider returned a mix of HTTP 500 errors, HTTP 503 responses, and in some cases, connections that simply timed out after 30 seconds without returning any response body.
Meridian's agent orchestration layer, built on top of a popular open-source agent framework, had a built-in retry policy: exponential backoff with up to 3 retries for any failed tool call or LLM inference call. This was considered standard practice and had been in place since the platform launched. The intent was sound: transient failures should not abort an entire agent run that might be halfway through a complex, multi-step task.
Here is where the silent double-charge was born.
The Anatomy of the Bug
When a tool call fired and the LLM provider returned a 503 or timed out, the orchestration layer treated the call as failed and queued a retry. However, in a significant portion of cases during the outage window, the provider had actually received and partially processed the request before the connection dropped. The provider's infrastructure, under stress, sometimes completed the inference and attempted to return the result, but the response never reached Meridian's agent layer.
From the orchestration layer's perspective, the call failed. From the provider's perspective (and from their usage metering system), the call succeeded and tokens were consumed. The retry then fired a second, identical request, which the provider processed successfully and returned a response for. Two token consumption events were recorded on the provider side for what the agent treated as a single tool call.
This alone would have been manageable if Meridian's billing pipeline had been designed to reconcile against provider-side usage data. It was not. Their pipeline was event-driven from the agent orchestration layer outward: when the orchestration layer emitted a "tool call completed" event, the billing service consumed it and created a charge record. The provider's actual metered usage was never cross-referenced.
The compounding problem was that Meridian's billing events were also not idempotent. Each event carried a UUID generated at the moment of event emission. A retry produced a brand new event with a brand new UUID, meaning the billing service had no way to know it was processing a logical duplicate of a previous attempt.
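The failure mode is easy to illustrate with a minimal sketch (the event shape and field names here are illustrative, not Meridian's actual schema): because the event ID is minted at emission time, the same logical tool call emitted twice produces two events that look entirely unrelated to any downstream consumer.

```python
import uuid

def emit_billing_event(tenant_id: str, run_id: str, tool: str, payload: dict) -> dict:
    # Pre-fix behavior: the event ID is generated at the moment of emission,
    # so a retry of the same logical call gets a brand-new UUID.
    return {
        "event_id": str(uuid.uuid4()),
        "tenant_id": tenant_id,
        "run_id": run_id,
        "tool": tool,
        "payload": payload,
    }

# The orchestration layer believes the first attempt failed and retries.
first = emit_billing_event("tenant-42", "run-001", "sql_query", {"q": "SELECT 1"})
retry = emit_billing_event("tenant-42", "run-001", "sql_query", {"q": "SELECT 1"})

# Same logical operation, but nothing ties the two events together, so a
# billing consumer keyed on event_id will charge the tenant twice.
assert first["event_id"] != retry["event_id"]
```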
Discovery: The Audit That Changed Everything
The bug was not discovered during the January outage. It was discovered six weeks later, in late February 2026, by a junior data engineer named Priya Nambiar, who was building a cost attribution dashboard for internal use. Priya noticed that the ratio of provider-billed tokens to Meridian-billed tokens had a sharp, unexplained spike in the January 9 to January 12 window. She initially assumed it was a data pipeline lag artifact. When she dug deeper, she found that the spike corresponded almost perfectly with the outage window and its aftermath.
She escalated to the platform engineering team, who spent three days reconstructing the event sequence from logs. Their findings were sobering:
- 14,200 agent runs were active during or immediately after the outage window
- Of those, 3,847 runs contained at least one tool call that was retried under the outage conditions
- Of those retried runs, 2,109 runs had billing events that appeared to correspond to duplicate charges (matching tool type, tenant ID, approximate timestamp, and token range)
- Total estimated overcharge across affected tenants: approximately $47,200 USD over the 60-day lookback period
The $47,200 figure was not catastrophic in isolation. But the implications were: if this had been happening silently for 60 days, it could have been happening for longer. And if LLM provider outages were becoming more frequent (a trend that Meridian's ops team confirmed was real, with three notable partial outages logged in Q4 2025 alone), the exposure would only grow.
The Rebuild: Designing a Resilient Billing Reconciliation Pipeline
Meridian's engineering leadership made the call to treat this as a Severity 1 architectural issue, not a one-time bug fix. They assembled a cross-functional team: two platform engineers, one data engineer, one billing systems specialist, and a product manager. They gave themselves eight weeks to ship a rebuilt pipeline to production.
Here is what they changed, and why.
1. Idempotency Keys at the Tool-Call Level
The most fundamental change was introducing stable, deterministic idempotency keys at the individual tool-call level, not just at the agent run level. Previously, each event emission generated a fresh UUID. The new system derives the idempotency key from a hash of:
- The tenant ID
- The agent run ID
- The tool call's logical position in the execution graph (a combination of step index and tool name)
- The input payload hash (a SHA-256 of the tool call arguments)
This means that if the same logical tool call is retried, it produces the same idempotency key. The billing service now checks for this key before creating a charge record. If a record with that key already exists (even if it was created by a prior attempt that appeared to fail from the orchestration side), the new event is acknowledged but not charged again.
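A minimal sketch of this kind of derivation (function and field names here are illustrative; Meridian's exact scheme is not public): the key is a hash over the logical identity of the call, so a retry reproduces it exactly, while any change to the inputs produces a different key.

```python
import hashlib
import json

def idempotency_key(tenant_id: str, run_id: str, step_index: int,
                    tool_name: str, arguments: dict) -> str:
    # Serialize the tool arguments canonically (sorted keys, fixed separators)
    # so identical payloads always hash identically.
    payload_hash = hashlib.sha256(
        json.dumps(arguments, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()
    material = f"{tenant_id}|{run_id}|{step_index}|{tool_name}|{payload_hash}"
    return hashlib.sha256(material.encode()).hexdigest()

# A retried call with the same logical identity yields the same key...
k1 = idempotency_key("tenant-42", "run-001", 3, "sql_query", {"q": "SELECT 1"})
k2 = idempotency_key("tenant-42", "run-001", 3, "sql_query", {"q": "SELECT 1"})
assert k1 == k2

# ...while any change to the logical identity (here, the step index)
# yields a different key.
k3 = idempotency_key("tenant-42", "run-001", 4, "sql_query", {"q": "SELECT 1"})
assert k1 != k3
```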
Critically, Meridian also passes this idempotency key to the LLM provider's API where the provider supports it (OpenAI's API, for instance, supports idempotency keys on certain endpoints). This prevents duplicate processing on the provider side as well, closing both sides of the gap.
2. Provider-Side Usage Reconciliation as a First-Class Job
Meridian introduced a nightly provider reconciliation job that pulls usage data directly from each LLM provider's usage export API and compares it against Meridian's internal billing records for the same window. The job flags any tenant where the delta between provider-reported usage and Meridian-billed usage exceeds a configurable threshold (initially set at 3%).
Flagged discrepancies are routed to a reconciliation queue for human review before any billing adjustment is made. This is intentional: automated corrections to billing records carry their own risk, so the team wanted a human in the loop for any adjustment above $10 per tenant per day.
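The core comparison in such a job can be sketched in a few lines (the data shapes and the 3% threshold follow the description above; everything else is an assumption, since the real job would page through provider export APIs and write to a review queue):

```python
from dataclasses import dataclass

RELATIVE_THRESHOLD = 0.03   # flag deltas above 3% of provider-reported usage

@dataclass
class Discrepancy:
    tenant_id: str
    provider_tokens: int
    billed_tokens: int
    delta_ratio: float

def reconcile(provider_usage: dict, billed_usage: dict) -> list:
    """Compare provider-reported token usage against internally billed usage
    for one window; return the tenants whose delta exceeds the threshold."""
    flagged = []
    for tenant_id, provider_tokens in provider_usage.items():
        if provider_tokens == 0:
            continue
        billed_tokens = billed_usage.get(tenant_id, 0)
        delta_ratio = abs(billed_tokens - provider_tokens) / provider_tokens
        if delta_ratio > RELATIVE_THRESHOLD:
            flagged.append(Discrepancy(tenant_id, provider_tokens,
                                       billed_tokens, delta_ratio))
    return flagged

# A tenant billed for ~23% more tokens than the provider metered gets flagged;
# a tenant within rounding distance does not.
flags = reconcile({"tenant-42": 100_000, "tenant-7": 50_000},
                  {"tenant-42": 123_000, "tenant-7": 50_400})
assert [d.tenant_id for d in flags] == ["tenant-42"]
```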
3. Outage-Aware Retry Logic
The orchestration layer's retry policy was redesigned to be outage-aware. When the system detects that a provider is in a degraded state (based on a circuit breaker that monitors error rates over a rolling 90-second window), it shifts retry behavior in two ways:
- Retries are deferred, not immediate. Instead of firing exponential backoff retries in-process, the retry is written to a durable retry queue with a minimum delay of 5 minutes. This gives the provider time to stabilize and reduces the chance of hitting a partially-processed-but-unacknowledged state.
- Retry events are explicitly tagged with an `is_retry: true` flag and the original attempt's idempotency key. The billing service uses this tag to enforce deduplication even in edge cases where the key derivation might differ slightly.
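The degraded-state detection described above can be sketched as a small circuit breaker (the 90-second window matches the text; the error-rate threshold, minimum sample size, and class shape are assumptions for illustration):

```python
import time
from collections import deque

WINDOW_SECONDS = 90.0
ERROR_RATE_THRESHOLD = 0.5   # assumed: provider is degraded above 50% errors
MIN_SAMPLE = 10              # assumed: don't trip the breaker on a handful of calls

class ProviderCircuitBreaker:
    """Tracks call outcomes over a rolling window and reports whether the
    provider should be treated as degraded (shifting retries to the
    durable, delayed queue instead of in-process backoff)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._outcomes = deque()   # (timestamp, succeeded) pairs

    def record(self, succeeded: bool) -> None:
        self._outcomes.append((self._clock(), succeeded))
        self._evict()

    def _evict(self) -> None:
        cutoff = self._clock() - WINDOW_SECONDS
        while self._outcomes and self._outcomes[0][0] < cutoff:
            self._outcomes.popleft()

    def degraded(self) -> bool:
        self._evict()
        if len(self._outcomes) < MIN_SAMPLE:
            return False
        errors = sum(1 for _, ok in self._outcomes if not ok)
        return errors / len(self._outcomes) > ERROR_RATE_THRESHOLD

# Demo with a fake clock: a burst of failures inside the window trips the
# breaker; once they age out of the 90-second window, it resets.
t = [0.0]
breaker = ProviderCircuitBreaker(clock=lambda: t[0])
for _ in range(12):
    breaker.record(succeeded=False)
    t[0] += 1.0
assert breaker.degraded()
t[0] += 120.0
assert not breaker.degraded()
```

Injecting the clock keeps the breaker testable without sleeping through real 90-second windows.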
4. Tenant-Facing Billing Transparency Dashboard
One of the less technical but arguably most important changes was the addition of a tenant-facing usage and billing transparency dashboard. Tenants can now see, in near real-time, a breakdown of every agent run, every tool call, and the associated charge events. Each charge event shows its idempotency key (truncated for readability), the provider used, and whether any reconciliation adjustments were applied.
This was partly a trust-rebuilding measure after Meridian proactively disclosed the overcharge issue to affected tenants and issued credits. But it also serves a practical function: tenants themselves can now flag anomalies, creating a distributed early-warning system that Meridian's internal monitoring alone cannot replicate.
5. A Dedicated Billing Event Ledger
Perhaps the deepest architectural change was the introduction of a dedicated, append-only billing event ledger, separate from the operational event stream. Previously, billing events lived in the same Kafka topics as operational telemetry, and the billing service consumed them alongside dozens of other consumers. This created subtle ordering and at-least-once delivery issues.
The new ledger is built on a simple but durable foundation: a Postgres table with strict write-once semantics, a unique constraint on the idempotency key column, and a separate read-optimized replica for reporting. Every billing event, whether it results in a charge, a deduplication skip, or a reconciliation adjustment, is written to this ledger with a full audit trail. The ledger is the source of truth; Stripe and all downstream systems are driven from it, not the reverse.
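The write-once property rests on a single mechanism: a unique constraint on the idempotency key, with inserts that do nothing on conflict. A runnable sketch of that mechanism (SQLite stands in for Postgres here purely so the example is self-contained; the `ON CONFLICT ... DO NOTHING` pattern is the same, and the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE billing_ledger (
        idempotency_key TEXT PRIMARY KEY,
        tenant_id       TEXT NOT NULL,
        event_type      TEXT NOT NULL,
        amount_cents    INTEGER NOT NULL
    )
""")

def record_charge(key: str, tenant_id: str, amount_cents: int) -> bool:
    """Append a charge to the ledger. Returns False if this logical operation
    was already recorded: the retry is acknowledged but not charged again."""
    before = conn.total_changes
    conn.execute(
        "INSERT INTO billing_ledger (idempotency_key, tenant_id, event_type, amount_cents) "
        "VALUES (?, ?, 'charge', ?) ON CONFLICT (idempotency_key) DO NOTHING",
        (key, tenant_id, amount_cents),
    )
    return conn.total_changes > before

assert record_charge("abc123", "tenant-42", 250) is True    # first attempt charges
assert record_charge("abc123", "tenant-42", 250) is False   # retry is deduplicated
total = conn.execute("SELECT SUM(amount_cents) FROM billing_ledger").fetchone()[0]
assert total == 250   # the tenant is charged exactly once
```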
The Results: Eight Weeks Later
By mid-April 2026, the rebuilt pipeline was fully live in production. The results after the first 30 days were measurable and meaningful:
- Zero duplicate charge events detected in post-launch monitoring, across two minor LLM provider degradation events that occurred in April 2026
- Provider reconciliation job flagged 11 discrepancies in the first month, all of which were sub-$5 and traced to legitimate rounding differences in token counting, not duplicates
- Tenant support tickets related to billing disputes dropped by 68% in the first 30 days post-launch
- The billing transparency dashboard was adopted by 61% of enterprise tenants within two weeks of launch, with no marketing push
Meridian also issued credits totaling $51,400 to affected tenants (the original $47,200 plus goodwill rounding), and retained all but two of the affected accounts. The two who churned had already been at risk for unrelated reasons.

The Broader Lesson: Billing Is Infrastructure, Not an Afterthought
What makes Meridian's story instructive is not the specific bug. Retry-induced duplicate events are a well-understood problem in distributed systems. The deeper lesson is about how AI agent platforms introduce a new class of billing complexity that traditional SaaS billing patterns were not designed to handle.
In a conventional SaaS product, a user clicks a button, a feature executes, and you log a usage event. The causal chain is short and visible. In an AI agent platform, a single user-initiated agent run can spawn dozens of sub-calls, fan out across multiple external providers, retry autonomously on failure, and complete asynchronously minutes or hours later. The causal chain is long, partially opaque, and distributed across systems you do not fully control.
This means that billing for AI agents requires the same rigor that financial systems apply to transaction processing: idempotency by default, reconciliation as a continuous process, and audit trails that are immutable and complete. The retry logic that makes your agents resilient is the exact same logic that will silently corrupt your billing data if the two systems are not designed to work together.
Key Takeaways for Engineering Teams Building AI Agent Platforms
- Derive idempotency keys deterministically. Never generate billing event IDs at emission time. Derive them from the logical identity of the operation so that retries produce the same key.
- Reconcile against the provider, not just your own events. Your orchestration layer's view of what succeeded is not authoritative. The provider's usage data is the ground truth for token consumption.
- Make retry behavior billing-aware. Retries should be tagged, deferred during outages, and handled as a special case in your billing pipeline, not treated identically to first attempts.
- Build an append-only billing ledger early. Retrofitting one is painful. Starting with one gives you the audit trail you will need when something inevitably goes wrong.
- Transparency is a competitive advantage. Tenants who can see their usage in detail are tenants who trust you with more of their workload.
Conclusion
Meridian Analytics did not set out to build a case study in billing resilience. They set out to build a great AI agent product, and they made the billing architecture decisions that most teams make: pragmatic ones, built incrementally, optimized for speed. The January 2026 outage exposed the gap between that pragmatic architecture and the demands of a production AI agent platform operating at scale.
The good news is that the gap is closable. The patterns that fix it (idempotency, reconciliation, outage-aware retries, append-only ledgers) are not exotic. They are borrowed directly from the distributed systems and fintech playbooks that engineers have been refining for years. The work is in applying them with deliberate intent to the unique topology of AI agent billing, before the next outage finds the gap for you.
If your team is building a usage-based AI agent platform and you have not yet stress-tested your billing pipeline against a simulated LLM provider outage with retries enabled, that test is worth running this week. The results might surprise you.
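A toy harness for that kind of test (not Meridian's actual suite; the mock mimics the partially-processed-but-unacknowledged failure mode from the incident, where the provider meters usage even though the response is lost):

```python
class FlakyProvider:
    """Mock provider that meters token usage before the response is sent,
    then drops the first response, mimicking the outage failure mode."""

    def __init__(self):
        self.metered_tokens = 0
        self.seen_keys = set()
        self.fail_next = True

    def call(self, idempotency_key: str, tokens: int) -> str:
        if idempotency_key in self.seen_keys:
            return "ok (replayed)"       # idempotent replay: no extra metering
        self.seen_keys.add(idempotency_key)
        self.metered_tokens += tokens    # usage is metered before responding
        if self.fail_next:
            self.fail_next = False
            raise TimeoutError("response lost after processing")
        return "ok"

def call_with_retry(provider, key: str, tokens: int, attempts: int = 3) -> str:
    for _ in range(attempts):
        try:
            return provider.call(key, tokens)
        except TimeoutError:
            continue
    raise RuntimeError("all attempts failed")

# With a stable idempotency key, the retry is replayed rather than
# re-metered: the provider records 1,000 tokens, not 2,000.
provider = FlakyProvider()
result = call_with_retry(provider, key="stable-key", tokens=1000)
assert provider.metered_tokens == 1000
```

The same harness with a fresh key per attempt reproduces the double-metering bug, which is exactly the case a billing pipeline test should fail on.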