5 Dangerous Myths Backend Engineers Still Believe About Async Task Queue Architecture That Are Silently Causing Job Loss and Duplicate Execution in High-Throughput AI Agent Pipelines
There is a quiet crisis happening inside the infrastructure of AI-powered products right now. Teams are shipping agentic pipelines at an unprecedented pace, orchestrating LLM calls, tool invocations, vector retrievals, and multi-step reasoning chains across distributed worker fleets. And underneath almost all of it sits a task queue: Celery with Redis, BullMQ with Redis, Temporal, RQ, Dramatiq, or one of a dozen other flavors.
The problem is not the tools. The problem is the mental models engineers bring to those tools, mental models that were forged during a simpler era of "send an email on signup" background jobs and that simply do not hold up when your worker is invoking a GPT-5 agent that spawns three sub-agents, writes to a vector store, and triggers a webhook to a third-party billing API.
In 2026, as AI agent pipelines have become the backbone of production SaaS, fintech, and healthcare platforms, these outdated beliefs are no longer just technical debt. They are causing real financial damage: duplicate charges, lost customer data, runaway LLM API costs, and silent data corruption that only surfaces weeks later in a post-mortem.
Here are the five most dangerous myths still circulating in engineering teams today, and exactly why each one is wrong.
Myth #1: "ACK on Receipt Means the Job Is Safe"
This is the most pervasive myth in the entire space, and it kills more jobs than any infrastructure failure.
Most engineers understand the basic acknowledgment model: a worker pulls a task from the broker, sends an ACK, and the broker removes it from the queue. The assumption baked into this mental model is that ACKing early is fine as long as the worker doesn't crash. It feels safe because in low-throughput CRUD apps, workers almost never crash mid-task.
In a high-throughput AI agent pipeline, this assumption collapses almost immediately.
Consider what actually happens when a worker ACKs a task on receipt and then begins a chain that involves: a 45-second LLM inference call, a write to a Postgres vector extension, a Redis cache update, and a downstream webhook dispatch. If the worker process is OOM-killed by the Linux kernel at second 38 (extremely common in memory-constrained Kubernetes pods running large context windows), the task is already gone from the broker. It will never be retried. It is silently lost.
The correct pattern is late acknowledgment: ACK only after the task has fully completed and all side effects are committed. In Celery, this means setting acks_late=True and reject_on_worker_lost=True together. In BullMQ, it means understanding that job locking and the lock duration must exceed your worst-case task execution time, or the job will be reclaimed and re-queued mid-execution, which leads directly to Myth #2.
The deeper issue is that engineers conflate "the broker knows I received it" with "the system knows I processed it." These are categorically different states, and conflating them is the root cause of the majority of silent job loss incidents in production AI pipelines today.
The Fix
- Always use late ACK for any task with meaningful side effects.
- Set worker heartbeat timeouts that are longer than your 99th-percentile task duration.
- Instrument dead-letter queues and alert on any message landing there, every single time.
- For Temporal or similar workflow engines: understand that activity heartbeating is not optional for long-running LLM calls. It is mandatory.
Myth #2: "At-Least-Once Delivery Is Fine Because My Tasks Are Idempotent"
This myth is more insidious than Myth #1 because it sounds sophisticated. Engineers who say this have clearly read about distributed systems. They know about at-least-once vs. exactly-once semantics. They have nodded along to talks about idempotency. And then they write task handlers that are not actually idempotent and convince themselves they are.
True idempotency is harder than it looks. A task is idempotent if executing it N times produces the same observable system state as executing it once. Notice the phrase "observable system state." This is not just about your database. It includes:
- Every external API call the task makes (LLM providers, payment processors, email services).
- Every message the task enqueues downstream.
- Every file or object written to blob storage.
- Every metric or event emitted to your analytics pipeline.
In AI agent pipelines, the most common failure pattern looks like this: a task is marked as idempotent because it checks a database flag before doing work. But the task also enqueues three child tasks as part of its execution. When the parent task is re-executed due to a network partition causing a duplicate delivery, those child tasks are enqueued a second time. The parent's database guard fires correctly and the parent does no duplicate work. But the children run twice, and their children run twice, and by the time the cascade resolves, you have executed a billing charge four times and sent a user eight onboarding emails.
This is called fan-out duplication, and it is one of the defining failure modes of agentic architectures in 2026. The agent graph amplifies every duplicate at each branching node.
The Fix
- Use idempotency keys that propagate through the entire task graph, not just the entry point. Every child task spawned by a parent should carry a deterministic key derived from the parent's key and its position in the graph.
- Adopt a transactional outbox pattern: write child task enqueue operations to your database in the same transaction as your business logic, then have a separate relay process publish them. This eliminates the "task enqueued but DB write failed" and "DB write succeeded but task not enqueued" split-brain scenarios.
- For LLM API calls specifically: cache responses by a hash of the prompt and model parameters. A duplicate execution hitting a cached response is not just idempotent, it is also free.
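A minimal sketch of deterministic key propagation. The helper name and key scheme are assumptions for illustration, not any library's API; the point is that re-running a parent produces the same child keys, so duplicate enqueues can be deduplicated downstream:

```python
import hashlib

def child_key(parent_key: str, child_index: int, task_name: str) -> str:
    # Derive a child idempotency key from the parent's key plus the child's
    # position in the fan-out graph. Deterministic: a duplicate parent
    # execution yields identical child keys, not fresh ones.
    raw = f"{parent_key}:{task_name}:{child_index}"
    return hashlib.sha256(raw.encode()).hexdigest()
```

A consumer then records each key it has processed (in Postgres or Redis with a TTL) and drops any task whose key it has already seen.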
Myth #3: "Redis Is a Reliable Broker for Critical AI Agent Jobs"
Redis is fast, developer-friendly, and the default broker recommendation in the documentation of almost every popular task queue library. For this reason, the majority of AI agent pipelines built in the last three years are running on Redis as their primary broker. And for a large class of workloads, this is completely fine.
The myth is not that Redis is bad. The myth is that Redis is a durable message broker. It is not. Redis is a data structure server with optional persistence. The distinction matters enormously when your tasks represent expensive, stateful AI agent executions.
Here is what most engineers do not realize about Redis persistence modes in production:
- RDB snapshots (the default in many managed Redis offerings) are point-in-time. Any tasks enqueued between the last snapshot and a Redis restart are permanently lost. In a high-throughput pipeline processing 10,000 tasks per minute, a 5-minute snapshot interval means up to 50,000 tasks can vanish in a single restart event.
- AOF persistence with fsync=everysec (the common "safe" setting) can still lose up to one second of writes. At high throughput, that is thousands of tasks.
- Redis Cluster failover is not instantaneous. During the election window (typically 10 to 30 seconds), writes to the cluster can be rejected or silently dropped depending on client configuration.
In 2026, with AI agent tasks that might represent 30 seconds of LLM compute, a tool-use cycle that costs $0.40 in API fees, or a customer-facing action that cannot be replayed, losing tasks to a Redis restart is not an acceptable trade-off. Yet the vast majority of teams have never tested what happens to their task queue during a Redis primary failover.
The Fix
- For truly critical AI agent jobs, use a broker with durable, replicated storage by design: RabbitMQ with quorum queues, Apache Kafka, AWS SQS, or Google Cloud Pub/Sub. These systems were built to be message brokers first.
- If you must use Redis (for its speed and simplicity), implement a write-ahead log at the application layer: persist task metadata to Postgres before enqueuing to Redis. A reconciliation worker can detect and re-enqueue tasks that were persisted but never completed.
- Run chaos engineering drills specifically targeting your broker: kill the Redis primary during peak load and measure how many tasks are lost. You will be surprised.
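The application-level write-ahead log can be sketched as follows. SQLite stands in for Postgres here, and every table, column, and function name is hypothetical; the shape that matters is persist-before-enqueue plus a reconciliation query for rows that never completed:

```python
import json
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE task_log (
    task_id      TEXT PRIMARY KEY,
    payload      TEXT NOT NULL,
    enqueued_at  REAL NOT NULL,
    completed_at REAL
)""")

def enqueue_durably(task_id, payload, redis_push):
    # 1. Persist intent BEFORE touching Redis.
    db.execute(
        "INSERT INTO task_log (task_id, payload, enqueued_at) VALUES (?, ?, ?)",
        (task_id, json.dumps(payload), time.time()),
    )
    db.commit()
    # 2. Enqueue. If Redis later loses the message, the log row survives.
    redis_push(task_id, payload)

def mark_completed(task_id):
    # Called by the worker after all side effects have committed.
    db.execute("UPDATE task_log SET completed_at = ? WHERE task_id = ?",
               (time.time(), task_id))
    db.commit()

def find_orphans(older_than_seconds):
    # Reconciliation pass: persisted but never completed -> re-enqueue candidates.
    cutoff = time.time() - older_than_seconds
    rows = db.execute(
        "SELECT task_id FROM task_log "
        "WHERE completed_at IS NULL AND enqueued_at < ?", (cutoff,))
    return [r[0] for r in rows]
```

The reconciliation worker runs `find_orphans` on an interval with a threshold comfortably above your worst-case task duration, so in-flight tasks are not re-enqueued prematurely.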
Myth #4: "Concurrency Settings Are a Performance Knob, Not a Correctness Knob"
Ask most backend engineers what their Celery worker concurrency setting does, and they will tell you it controls throughput. More concurrency means more tasks processed in parallel, which means higher throughput. Tune it up until the CPU is happy, tune it down if you see memory pressure. Simple.
This framing is dangerously incomplete for AI agent workloads, and it is responsible for some of the most confusing and hard-to-reproduce bugs in production pipelines.
Concurrency is also a correctness constraint when your tasks share mutable state, external rate limits, or ordered execution requirements. In AI agent pipelines, all three of these are almost always true simultaneously:
- Shared mutable state: Multiple agent tasks frequently read-modify-write the same conversation context, memory store, or agent state object. Without concurrency limits enforced at the queue level (not just the application level), you get lost updates and torn reads, even with database transactions, if workers are racing on the same logical entity.
- External rate limits: Your LLM provider gives you a rate limit measured in requests per minute and tokens per minute. If your worker concurrency is set to 32 and each worker fires an LLM call immediately on task receipt, you will hit rate limits within seconds during any traffic spike. The naive response is to add retry logic. But retries with exponential backoff under rate limiting at high concurrency produce a thundering herd that makes the rate limiting worse, not better.
- Ordered execution: Many agent pipelines have implicit ordering requirements. Step B must see the committed output of Step A. If Step A and a retry of Step A are running concurrently at concurrency level 16, you have a race condition that no amount of database locking will cleanly resolve without careful queue-level design.
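The standard antidote to the thundering herd described above is adding jitter to the backoff, so synchronized workers spread their retries out instead of stampeding the rate-limited API in lockstep. A minimal "full jitter" helper (illustrative, not any library's API):

```python
import random

def backoff_with_jitter(attempt, base=0.5, cap=60.0):
    # Sleep a random duration in [0, min(cap, base * 2**attempt)] seconds.
    # Randomizing the full window decorrelates retries across workers.
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

Jitter helps a herd of already-failed retries; it does not prevent you from entering the herd in the first place, which is what the queue-level rate limiting below is for.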
The Fix
- Use per-entity concurrency limits via queue routing. Tasks for the same logical entity (same user, same agent session, same document) should be routed to a dedicated queue or processed by a worker with a concurrency of 1 for that entity. This is the "virtual actor" pattern, and it is the architecture that Temporal, Orleans, and Akka were designed around.
- Implement token bucket rate limiting at the queue consumer level, not just at the HTTP client level. Libraries like aiolimiter in Python or bottleneck in Node.js can be integrated directly into your task execution wrapper.
- Treat concurrency as a correctness parameter first. Document the maximum safe concurrency for each task type based on its external dependencies and state access patterns, not just its CPU profile.
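If you would rather not pull in a library, a consumer-side token bucket is small enough to sketch inline. This is a simplified synchronous version for illustration (a real deployment would need an async or thread-safe variant):

```python
import time

class TokenBucket:
    # Minimal token bucket: refills at rate_per_sec up to capacity.
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, cost=1.0):
        # Refill based on elapsed time, then spend if enough tokens remain.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        # Caller should delay or re-queue the task rather than call the API.
        return False
```

Wrapping task execution in `try_acquire` means a traffic spike queues work instead of converting directly into provider-side 429s.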
Myth #5: "A Failed Task That Gets Retried Is the Same as a Task That Never Failed"
This is the most philosophically subtle myth on this list, and it is the one that causes the most damage in mature, well-monitored systems that engineers are proud of.
The belief goes like this: "We have robust retry logic with exponential backoff and a dead-letter queue. When a task fails, it retries cleanly, and eventually it either succeeds or lands in the DLQ where we alert on it. The system is resilient."
This is true for stateless, side-effect-free tasks. For AI agent pipeline tasks, it is almost never fully true, because a partial execution followed by a retry is not a clean retry. It is a retry with unknown pre-existing state.
Consider a task that performs the following sequence:
- Fetch user context from the database.
- Call an LLM to generate a response (succeeds, takes 12 seconds).
- Write the response to the conversation history table (succeeds).
- Update a vector embedding in the memory store (fails due to a transient network error).
- Task is marked as failed and retried from the beginning.
On retry, Step 3 runs again. The conversation history table now has a duplicate entry. If your idempotency check is only on the task ID at the top of the function (a common pattern), it will not catch this because the task ID is new on retry in many queue implementations. The LLM is called again, generating a slightly different response (LLMs are not deterministic). A second, different response is written to the conversation history. The user sees two conflicting AI responses. The vector memory store is now inconsistent with the conversation history.
This failure mode is not exotic. It is the default behavior of naive retry logic in any pipeline where tasks have multiple sequential side effects with no transactional boundary around them.
The Fix
- Design tasks as sagas, not monoliths. Each side effect should be its own compensatable step. If Step 4 fails, the saga either retries only Step 4 or executes a compensating transaction to undo Steps 2 and 3, not re-run the entire task blindly.
- Use workflow orchestration engines like Temporal, Prefect, or Inngest for any task with more than two sequential side effects. These engines persist execution state, so a retry resumes from the last successful step rather than from the beginning.
- Store intermediate results with their task execution ID before each side effect. On retry, check if the result already exists and skip the side effect if so. This is a manual implementation of what workflow engines give you for free.
- Never use LLM responses as ephemeral in-memory state within a task. Write them to durable storage immediately after generation, before any downstream step that could fail.
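The "store intermediate results" advice amounts to a step memoizer. In this sketch an in-memory dict stands in for a durable table keyed by execution ID and step name, and all names are illustrative:

```python
checkpoints = {}  # in production: a durable table keyed by (run_id, step)

def run_step(run_id, step, effect):
    # Execute a side effect at most once per logical run. On retry, a step
    # that already committed is skipped and its stored result returned.
    key = (run_id, step)
    if key in checkpoints:
        return checkpoints[key]   # retry path: side effect is NOT re-run
    result = effect()             # e.g. LLM call, history write, webhook
    checkpoints[key] = result     # persist BEFORE advancing to the next step
    return result
```

Note that `run_id` must be stable across retries (the original execution's ID, not the retry's fresh task ID), or the checkpoint lookup will never hit.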
The Common Thread: These Myths Were Never True, They Were Just Invisible
Reading through these five myths, you might notice something: none of them are new failure modes that appeared with AI agents. ACK semantics, idempotency, broker durability, concurrency correctness, and partial execution have always been the hard problems of distributed task queues. What changed in 2026 is the cost of getting them wrong.
When your background task was "send a welcome email," a lost job meant a user didn't get an email. Annoying, but recoverable. When your background task is "execute a 15-step AI agent that manages a customer's investment portfolio," a lost job or a duplicate execution has legal, financial, and reputational consequences that no retry policy can undo.
The throughput demands of modern AI pipelines also mean that low-probability failure modes that were statistically invisible at 100 tasks per hour become daily incidents at 100,000 tasks per hour. A 0.01% duplicate execution rate sounds acceptable in a design review. At scale, it means 10 duplicate executions for every 100,000 tasks, every hour, every day.
Where to Go From Here
If you are building or operating a high-throughput AI agent pipeline in 2026, the most impactful thing you can do right now is not to adopt a new tool. It is to audit your existing queue architecture against these five failure modes:
- Map every task type to its ACK strategy and verify late-ACK is used where it matters.
- Trace the full fan-out graph of every task and verify idempotency keys propagate to every leaf node.
- Test your broker's behavior under failure conditions, not just its behavior under normal load.
- Document the concurrency constraints for every task type based on its state access and rate limit profile.
- Walk through the partial execution scenario for every task with more than one side effect and verify your retry behavior is safe.
The engineers who build the most resilient AI pipelines in the next two years will not necessarily be the ones using the most sophisticated tools. They will be the ones who have the most accurate mental models of how their existing tools actually behave under failure conditions, at scale, with real money and real users on the line.
The myths listed above are not shameful to have believed. They are the natural result of building on abstractions that hide complexity by design. But in 2026, with the stakes of AI agent infrastructure as high as they are, there is no longer any room for comfortable abstractions that do not match reality.
Know your queue. Know its failure modes. Build accordingly.