How to Build a Zero-Trust Identity Verification Layer for Human-in-the-Loop Approval Gates in Multi-Agent Workflows

Scott Miller

Apr 7, 2026 • 11 min read

In 2026, multi-agent AI systems are no longer a research curiosity. They are the backbone of enterprise automation: orchestrating deployments, approving financial transfers, modifying production databases, and triggering irreversible supply chain actions. Alongside this power comes a threat that most platform security models were never designed to handle. When a high-stakes action arrives at an approval gate, how do you prove that a real, authorized human said "yes" and not a rogue agent, a compromised orchestrator, or a prompt-injected sub-agent mimicking one?

This is the problem of human-in-the-loop (HITL) identity forgery, and it is one of the most underappreciated attack surfaces in modern agentic architecture. Traditional role-based access control (RBAC) and OAuth tokens were designed to authenticate services and users in isolation. They were not designed to distinguish a human approval from an agent-synthesized approval inside an automated pipeline where both actors share the same token infrastructure.

This guide is written for platform engineers who need to build a zero-trust identity verification layer specifically for HITL approval gates. By the end, you will have a concrete, step-by-step architecture that cryptographically binds every high-stakes approval to a verified human identity, creates an auditable chain of custody, and is resistant to agent impersonation, token replay, and prompt-injection bypass attacks.

Why Standard Auth Is Not Enough in Agentic Pipelines

Before diving into the build, it is worth being precise about why your existing identity infrastructure is insufficient for this problem.

In a typical multi-agent workflow, agents carry delegated credentials. An orchestrator agent might hold a scoped JWT or an OAuth 2.0 access token on behalf of the user who originally kicked off the workflow. Sub-agents inherit or exchange tokens through internal service meshes. From the perspective of your approval gate, an inbound approval request carrying a valid, non-expired token signed by your identity provider looks exactly the same whether it originated from a human clicking a button or an agent autonomously deciding to approve its own next step.

The core vulnerabilities this creates include:

Token relay attacks: An agent reuses a human-issued token to self-authorize a downstream action the human never explicitly approved.
Prompt injection bypass: A malicious payload in an upstream data source instructs an agent to fabricate or forward an approval signal.
Orchestrator compromise: A compromised orchestrator agent generates synthetic approval events that pass token validation but carry no human intent.
Ambient authority escalation: An agent operating within a broad permission scope approves an action that technically falls within its token's claims but was never intended to be agent-autonomous.

Zero-trust principles demand that we never assume a valid token implies human presence. We must verify it explicitly, at the moment of the high-stakes action, every single time.

Step 1: Define Your "High-Stakes Action" Taxonomy

Not every action in your workflow needs a HITL gate. Over-gating destroys the efficiency value of agentic automation. Your first job is to define a clear, versioned taxonomy of action risk levels.

A practical three-tier model works well for most enterprise platforms:

Tier 0 (Autonomous): Actions the agent can self-authorize within pre-approved policy bounds. Examples: reading data, generating reports, sending status notifications.
Tier 1 (Soft Gate): Actions that require a human to have been notified and given a time-limited veto window. Examples: scheduled deployments to staging, low-value API calls to external services.
Tier 2 (Hard Gate): Actions that require an active, real-time, cryptographically verified human approval before execution. Examples: production database writes, financial transfers above a threshold, infrastructure destruction, PII exports, and any action flagged as irreversible.

Encode this taxonomy in a machine-readable policy file (YAML or OPA Rego are both excellent choices) and version-control it alongside your workflow definitions. Every agent action in your orchestration layer should be tagged with its tier at definition time, not at runtime. Runtime tier assignment is itself an attack surface.
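
As a minimal sketch of definition-time tagging, the snippet below freezes a tier map once and fails closed for anything it does not recognize. The action IDs and the tier_for helper are hypothetical; in practice the map would be loaded from your version-controlled policy file rather than written inline.

```python
from enum import IntEnum
from types import MappingProxyType


class Tier(IntEnum):
    AUTONOMOUS = 0  # agent may self-authorize within policy bounds
    SOFT_GATE = 1   # human notified, time-limited veto window
    HARD_GATE = 2   # real-time, cryptographically verified human approval


# Hypothetical policy contents. A real deployment would load these from the
# versioned policy file at startup and never mutate them afterwards.
ACTION_TIERS = MappingProxyType({
    "generate-report": Tier.AUTONOMOUS,
    "deploy-staging": Tier.SOFT_GATE,
    "destroy-prod-cluster": Tier.HARD_GATE,
})


def tier_for(action_id: str) -> Tier:
    """Return the tier fixed at definition time; unknown actions fail closed."""
    return ACTION_TIERS.get(action_id, Tier.HARD_GATE)
```

Treating unknown action IDs as Tier 2 is a design choice in this sketch: an untagged action is more likely a policy gap than a safe default.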

Step 2: Establish a Dedicated Approval Identity Provider (AIdP)

The single most important architectural decision in this guide is this: your HITL approval gate must use a dedicated identity provider that is logically and physically separate from the identity provider your agents use.

This is the zero-trust separation of planes. Call it your Approval Identity Provider (AIdP). Its sole purpose is to issue short-lived, single-use approval tokens that are cryptographically bound to a specific action, a specific human identity, and a specific moment in time. It must never issue tokens that agents can request programmatically.

Here is how to configure your AIdP:

2a. Use a Separate OIDC Tenant or Realm

If you are using Keycloak, create a dedicated realm (for example, hitl-approvals) that is firewalled from your agent service accounts. If you are using a cloud identity provider like Microsoft Entra ID (formerly Azure AD) or AWS IAM Identity Center, create a separate application registration or permission set exclusively for approval flows. Critically, service principals and agent identities must be explicitly blocked from authenticating to this tenant.

2b. Enforce MFA at the AIdP Level, Not the Application Level

Do not rely on your application layer to enforce multi-factor authentication. Require it as a non-negotiable policy at the AIdP. Every approval token must be issued only after a successful MFA challenge. In 2026, passkey-based authentication (FIDO2/WebAuthn) is the preferred standard here because it is phishing-resistant and cannot be relayed by an agent operating on a server with no physical authenticator present.

2c. Bind Tokens to Action Context

Extend your OIDC token claims to include a custom action_context claim. This claim must contain a cryptographic hash of the exact action being approved: the action type, the target resource identifier, the workflow run ID, and a server-generated nonce. This prevents a valid approval token from being replayed against a different action.

{
  "sub": "user:jane.doe@corp.com",
  "iss": "https://hitl-aidp.internal/realms/hitl-approvals",
  "aud": "approval-gate-service",
  "exp": 1775000000,
  "iat": 1774999700,
  "jti": "a1b2c3d4-e5f6-...",
  "action_context": {
    "workflow_run_id": "wf-9f3a2b",
    "action_id": "destroy-prod-cluster-us-east-1",
    "action_hash": "sha256:3f4a9b...",
    "nonce": "7c2d1e9a"
  },
  "amr": ["hwk", "pin"]
}

Note the amr (Authentication Methods References) claim. Require hwk (proof of possession of a hardware-secured key, as defined in RFC 8176) to be present. If it is absent, reject the token at the gate regardless of other claims.

Step 3: Build the Approval Request Broker

The Approval Request Broker (ARB) is the service that sits between your agent orchestrator and the human approver. It is responsible for generating the approval challenge, routing it to the correct human, and issuing the action-bound nonce that the AIdP will embed in the token.

3a. The ARB Must Be Agent-Unreachable for Resolution Operations

Agents can submit an approval request to the ARB (a create-only operation that places a request in the queue). They must not be able to resolve, modify, or cancel an approval request. Enforce this with a strict API surface: agents have a write-only endpoint for request submission and a read-only endpoint for polling approval status. The resolution endpoint is accessible only to humans authenticated through the AIdP flow.
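
One way to picture this API surface is a deny-by-default permission table keyed by method and route. The routes and caller kinds below are illustrative assumptions, not part of any real ARB API.

```python
AGENT = "agent"
HUMAN = "human"  # a caller authenticated through the AIdP flow

# Hypothetical routes. Anything not listed here is denied.
ALLOWED_CALLERS = {
    ("POST", "/requests"): {AGENT},               # submit only
    ("GET", "/requests/status"): {AGENT, HUMAN},  # poll only
    ("POST", "/requests/resolve"): {HUMAN},       # approve or reject
}


def is_allowed(caller_kind: str, method: str, route: str) -> bool:
    """Deny by default: unknown routes and unknown callers get False."""
    return caller_kind in ALLOWED_CALLERS.get((method, route), set())
```

Because there is no entry for modifying or cancelling a request, those operations are unreachable for agents without any explicit deny rule.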

3b. Generate a Server-Side Nonce and Action Hash

When the ARB receives an approval request from an agent, it must independently reconstruct and hash the action context from its own authoritative sources. Do not trust the action description provided by the agent verbatim. Pull the action definition from your versioned policy store, look up the target resource from your infrastructure registry, and generate the hash server-side. This prevents an agent from submitting a benign-looking approval request that, once approved, is silently swapped for a malicious action.
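
A sketch of the server-side hashing step, assuming the four inputs named earlier (action type, target resource identifier, workflow run ID, nonce) and a canonical JSON encoding. The field names and the compute_action_hash and new_nonce helpers are assumptions for illustration. What matters is that the ARB and the gate use the same canonical form.

```python
import hashlib
import json
import secrets


def new_nonce() -> str:
    """Server-generated nonce: 16 random bytes, hex encoded."""
    return secrets.token_hex(16)


def compute_action_hash(action_type: str, target_resource: str,
                        workflow_run_id: str, nonce: str) -> str:
    """Hash a canonical JSON encoding of the action context.

    Sorted keys and fixed separators make the encoding deterministic, so
    any service that recomputes the hash from the same inputs gets the
    same digest.
    """
    canonical = json.dumps(
        {
            "action_type": action_type,
            "target_resource": target_resource,
            "workflow_run_id": workflow_run_id,
            "nonce": nonce,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Every input here should come from the ARB's own lookups (policy store, infrastructure registry), with the agent's request used only to select which records to look up.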

3c. Deliver the Approval Challenge Out-of-Band

The approval notification to the human must travel through a channel that is completely separate from the agent communication bus. If your agents communicate over an internal event bus or message queue, your approval notification must not. Recommended out-of-band channels include:

A direct push notification to a registered mobile device via a dedicated approval app (similar in concept to authenticator apps but purpose-built for workflow approvals)
A signed email with a one-time deep link to the approval UI
An enterprise messaging platform webhook (Slack, Teams) delivered to a channel that agents do not have write access to

The key property is that an agent compromising the primary communication channel cannot intercept, forge, or suppress the approval notification.

Step 4: Build the Approval UI with Anti-Spoofing Controls

The approval UI is where the human actually reads the action description and makes a decision. This UI is itself an attack surface. A compromised agent upstream might have manipulated the action description to be misleading, or a prompt injection might have inserted content designed to socially engineer the approver.

4a. Render Action Descriptions from Policy Definitions, Not Agent Output

The human-readable description of the action shown in the approval UI must be generated from your versioned policy store, not from the agent's submitted description. The agent submits an action ID and parameters. Your UI looks up the canonical description template for that action ID and renders it with the parameters substituted in. This eliminates the agent's ability to craft a misleading description.
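
The lookup-and-substitute step might look like the sketch below. The template text, the TEMPLATES store, and the parameter allow-pattern are all assumptions. The pattern check is an extra precaution this sketch adds so that parameter values cannot smuggle prose into the rendered description.

```python
import re
from string import Template

# Hypothetical canonical templates, loaded from the versioned policy store.
TEMPLATES = {
    "destroy-cluster": Template(
        "Permanently delete cluster $cluster in region $region."
    ),
}

# Assumed allow-pattern: identifiers only, no spaces or punctuation.
_SAFE_PARAM = re.compile(r"[A-Za-z0-9._-]+")


def render_description(action_id: str, params: dict) -> str:
    """Render the canonical description for an action ID.

    Raises KeyError for an unknown action ID or a missing parameter, and
    ValueError for a parameter value outside the allow-pattern.
    """
    template = TEMPLATES[action_id]
    for name, value in params.items():
        if not _SAFE_PARAM.fullmatch(str(value)):
            raise ValueError(f"unsafe value for parameter {name!r}")
    return template.substitute(params)
```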

4b. Show a Structured Risk Summary

Display a standardized risk summary card for every Tier 2 approval. This card should include: the action's reversibility classification, the blast radius (what resources are affected), the requesting workflow and its origin, the time the request was generated, and the approver's current session context (device, location, IP). This gives the human the information they need to make an informed decision and creates a visible anomaly if something is wrong.

4c. Require Explicit Confirmation of Key Parameters

For the highest-risk actions (irreversible infrastructure changes, large financial transfers), require the human to manually type a confirmation string derived from the action parameters. For example: "Type the target cluster name to confirm deletion." This is a deliberate friction mechanism borrowed from production deletion protection patterns. It forces the human to actively engage with the specifics of the action, not just click "Approve" on a notification.

Step 5: Implement the Cryptographic Verification Gate

This is the enforcement point. The Cryptographic Verification Gate (CVG) is the final checkpoint that the agent's action execution must pass through before the action is dispatched. It is a standalone microservice that keeps no local state of its own (the replay store and any pending approvals live in shared external storage) and has one job: validate that a presented approval token is genuine, unexpired, single-use, and bound to the exact action being requested.

5a. Validation Checklist (Run in This Order)

Signature verification: Verify the JWT signature against the AIdP's published JWKS endpoint. Reject any token signed with an algorithm other than RS256 or ES256.
Issuer and audience check: The iss claim must match your AIdP's issuer URI exactly. The aud must match your CVG's registered audience identifier.
Expiry check: The exp claim must not be in the past. Approval tokens should have a short TTL (recommended: 5 minutes maximum).
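JTI replay check: Record the jti (JWT ID) in your distributed token denylist with a single atomic set-if-absent operation (for example, Redis SET with the NX option and a TTL matching the token's maximum lifetime). If the JTI was already present, reject and alert. A separate lookup followed by a write leaves a race window in which two concurrent submissions of the same token can both pass.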
AMR check: Confirm the amr claim contains hwk. Reject software-only MFA approvals for Tier 2 actions.
Action hash binding check: Recompute the action hash from the current action request, together with the nonce carried in the token, using the same canonical encoding and algorithm as the ARB. Compare it against the action_context.action_hash claim. Any mismatch means the approved action does not match the requested action. Reject and raise a critical security alert.
Workflow run ID check: Confirm the action_context.workflow_run_id matches the currently executing workflow. This prevents cross-workflow token replay.
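
The claim-level checks in this list (everything after signature verification) can be sketched as a single function that stops at the first failure. This assumes the signature has already been verified by a JWT library and the payload decoded into a dict. The constants, the Rejected exception, and the in-memory used_jtis set are illustrative stand-ins, and the set in particular is not a substitute for a shared, atomic replay store.

```python
import time

EXPECTED_ISS = "https://hitl-aidp.internal/realms/hitl-approvals"
EXPECTED_AUD = "approval-gate-service"
MAX_TTL_SECONDS = 300  # 5-minute ceiling on token lifetime


class Rejected(Exception):
    """Raised with a short reason code when a check fails."""


def check_claims(claims, expected_hash, running_workflow_id, used_jtis, now=None):
    """Run checks 2-7 in order on already signature-verified claims."""
    now = time.time() if now is None else now

    if claims.get("iss") != EXPECTED_ISS or claims.get("aud") != EXPECTED_AUD:
        raise Rejected("issuer_or_audience")

    exp, iat = claims.get("exp", 0), claims.get("iat", 0)
    if exp <= now or exp - iat > MAX_TTL_SECONDS:
        raise Rejected("expired_or_ttl_too_long")

    jti = claims.get("jti")
    if not jti or jti in used_jtis:
        raise Rejected("replay")
    used_jtis.add(jti)  # in-process stand-in for an atomic set-if-absent

    if "hwk" not in claims.get("amr", []):
        raise Rejected("amr")

    ctx = claims.get("action_context", {})
    if ctx.get("action_hash") != expected_hash:
        raise Rejected("action_hash")
    if ctx.get("workflow_run_id") != running_workflow_id:
        raise Rejected("workflow_run_id")
```

The reason codes map one-to-one onto the checklist entries, which makes it straightforward to assert in tests that a given bad token fails at the intended check.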

5b. Emit a Signed Audit Event on Every Decision

Whether the gate passes or rejects a request, it must emit a signed, tamper-evident audit event to your immutable audit log. Use a structured format like CloudEvents with a digital signature. The event must capture: the decision (approved or rejected), the rejection reason if applicable, the human identity from the token, the action hash, the workflow run ID, the CVG instance ID, and a monotonic sequence number. Store these events in an append-only log (Amazon S3 with Object Lock, Azure Blob Storage with immutability policies, or a self-hosted Merkle-tree log like Trillian are all appropriate).
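
To illustrate what tamper-evident means in practice, here is a small sketch that chains each event to the previous one and authenticates it. It uses HMAC-SHA256 from the standard library as a stand-in; a production gate would more likely use an asymmetric signature so that verifiers cannot forge events. The sign_event and verify_chain names and the event fields are assumptions.

```python
import hashlib
import hmac
import json


def _mac(body: dict, key: bytes) -> str:
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def sign_event(event: dict, prev_sig: str, seq: int, key: bytes) -> dict:
    """Attach a sequence number, a link to the previous event, and a MAC."""
    body = dict(event, seq=seq, prev=prev_sig)
    body["sig"] = _mac(body, key)
    return body


def verify_chain(events: list, key: bytes) -> bool:
    """Check every MAC, the sequence numbers, and the prev-links."""
    prev = ""
    for expected_seq, ev in enumerate(events):
        body = {k: v for k, v in ev.items() if k != "sig"}
        if not hmac.compare_digest(_mac(body, key), ev.get("sig", "")):
            return False
        if body.get("seq") != expected_seq or body.get("prev") != prev:
            return False
        prev = ev["sig"]
    return True
```

Because each event embeds the previous event's signature, deleting or reordering entries breaks the chain even if every individual entry is still validly signed.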

Step 6: Harden Against Lateral Escalation and Insider Threats

A zero-trust HITL layer is not complete without controls that account for compromised or coerced human approvers.

6a. Enforce Separation of Duties for Critical Actions

For your most critical Tier 2 actions (for example, production data destruction or transfers above a defined financial threshold), require dual approval: two independent humans from different organizational units must each complete the full AIdP authentication and approval flow. The CVG should hold the action in a pending state until both approvals are received, each with a valid, independently issued token.
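
The dual-approval condition reduces to a small predicate over the approvals collected so far. This sketch assumes each verified approval carries the token's sub plus an org_unit attribute, which is a hypothetical claim you would need your AIdP to supply.

```python
def dual_approval_satisfied(approvals: list) -> bool:
    """True once two distinct humans from different org units have approved.

    Each approval is a dict with 'sub' and 'org_unit' keys, taken from an
    independently issued and fully validated approval token.
    """
    subjects = {a["sub"] for a in approvals}
    units = {a["org_unit"] for a in approvals}
    return len(subjects) >= 2 and len(units) >= 2
```

Deduplicating by sub means the same person approving twice, even from two devices, never satisfies the rule.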

6b. Implement Behavioral Anomaly Detection on Approval Patterns

Feed your approval audit events into a behavioral analytics pipeline. Flag anomalies such as: an approver approving an unusually high volume of actions in a short time window, approvals occurring outside the approver's normal working hours or geographic location, approvals for action types the approver has never previously handled, and approvals immediately following a failed authentication attempt on the agent identity plane (a possible indication of a coordinated attack).
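
The first anomaly in that list, unusual approval volume, can be approximated with a per-approver sliding window. The sketch below is deliberately simple and the thresholds are placeholders; the VolumeMonitor class is hypothetical, and a real pipeline would baseline each approver rather than use a fixed limit.

```python
from collections import deque


class VolumeMonitor:
    """Flags an approver who exceeds max_approvals within window_seconds."""

    def __init__(self, max_approvals: int, window_seconds: float):
        self.max_approvals = max_approvals
        self.window_seconds = window_seconds
        self._events = {}  # sub -> deque of approval timestamps

    def record(self, sub: str, ts: float) -> bool:
        """Record one approval; return True if it should be flagged."""
        q = self._events.setdefault(sub, deque())
        q.append(ts)
        # Drop timestamps that have aged out of the window.
        while q and q[0] <= ts - self.window_seconds:
            q.popleft()
        return len(q) > self.max_approvals
```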

6c. Time-Lock High-Blast-Radius Actions

For actions with an exceptionally large blast radius (deleting an entire production environment, for instance), introduce a mandatory execution delay of 10 to 30 minutes after approval. This window gives your security team time to detect and cancel a fraudulently obtained approval before the action executes. The CVG should store the approval and only release the execution signal after the delay has elapsed, provided no cancellation signal has arrived from a security administrator.
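
The release rule for a time-locked action is small enough to state directly. This sketch assumes the pending record is persisted somewhere durable and that the cancelled flag can only be set by a security administrator; the PendingExecution type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingExecution:
    approved_at: float      # epoch seconds when approval was validated
    delay_seconds: float    # mandatory hold, e.g. 600 to 1800
    cancelled: bool = False

    def may_execute(self, now: float) -> bool:
        """Release only after the delay, and only if nobody cancelled."""
        return (not self.cancelled
                and now >= self.approved_at + self.delay_seconds)
```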

Step 7: Test Your Gate Against the Threat Model

Building the gate is only half the job. You must continuously validate it against the specific threat scenarios it is designed to defeat. Incorporate the following test cases into your CI/CD pipeline and run them against a staging replica of your gate on every deployment:

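Token relay test: Submit a valid agent-issued token (from your standard agent IdP) to the CVG. Because its signing key is not in the AIdP's JWKS, it must be rejected at signature verification. The issuer check is the backstop if the two planes ever end up sharing key material.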
Replay attack test: Submit a valid, single-use approval token twice in rapid succession. The second submission must be rejected at the JTI replay check.
Action hash mismatch test: Submit a valid approval token for action A while requesting execution of action B. The CVG must reject at the action hash binding check.
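Expired token test: Submit a genuinely issued approval token after its exp has passed. It must be rejected at the expiry check. As a separate tamper test, submit a token whose exp claim was edited after signing. It must be rejected at signature verification, because the payload no longer matches the signature.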
Software MFA bypass test: Submit a valid approval token where the amr claim contains only otp (software TOTP) and not hwk. It must be rejected at the AMR check for Tier 2 actions.
Cross-workflow replay test: Submit a valid approval token issued for workflow run wf-001 against a request from workflow run wf-002. It must be rejected at the workflow run ID check.

Putting It All Together: The Architecture at a Glance

Here is a summary of the full data flow for a Tier 2 action approval in this architecture:

An agent in your orchestration layer reaches a Tier 2 action gate. It submits an approval request to the ARB via the write-only endpoint.
The ARB independently computes the action hash, generates a nonce, stores the pending request, and dispatches an out-of-band approval notification to the designated human approver(s).
The human opens the approval UI (delivered out-of-band), authenticates to the AIdP using a hardware-backed passkey (satisfying the hwk AMR requirement), and completes any required confirmation steps.
The AIdP issues a short-lived, action-bound approval JWT containing the action_context claims and returns it to the approval UI.
The approval UI submits the token to the ARB's resolution endpoint, marking the request as approved.
The waiting agent, polling the ARB's read-only status endpoint, learns that approval has been granted and retrieves the approval token.
The agent presents the approval token to the CVG alongside its action execution request.
The CVG runs the full 7-step validation checklist, emits a signed audit event, and either releases the action execution signal or rejects the request with a detailed rejection reason.
Post-execution, the action outcome is appended to the immutable audit log and linked to the approval event by workflow run ID.

Conclusion: Proving Human Intent Is Now a First-Class Security Requirement

As agentic AI systems take on greater autonomy and higher-stakes responsibilities in 2026, the question "did a human actually authorize this?" has moved from a compliance checkbox to a critical security invariant. The architecture described in this guide treats human intent as something that must be cryptographically proven, not merely assumed from the presence of a delegated token.

The key principles to carry forward are these: separate your human approval identity plane from your agent identity plane at the infrastructure level; bind every approval token to the exact action being authorized; validate that binding at the execution gate, not just at the UI layer; and build an immutable, signed audit trail that can prove the chain of human authorization to a regulator, an incident responder, or a court.

The agents in your pipeline will only become more capable and more autonomous. The integrity of your approval gates is what ensures that capability remains under deliberate human control. Build them accordingly.