How One Platform Team Discovered That Automated Dependency Updates Were Silently Corrupting Shared Agent Tool Manifests Across Tenant Boundaries
In early 2026, a mid-sized SaaS platform engineering team at a fictional but representative company we'll call Orbis Labs began noticing something unsettling. Tenant-facing AI agent tools were behaving inconsistently. Two customers running what appeared to be identical workflow configurations were getting different results. Support tickets trickled in slowly at first, then became a flood. The culprit, once discovered, was not a rogue deploy, a misconfigured environment variable, or a developer mistake. It was their own CI/CD pipeline doing exactly what it was told to do.
This is the story of how Orbis Labs uncovered a silent, systemic corruption of shared agent tool manifests triggered by automated dependency updates, and how they engineered a gating strategy robust enough to prevent it from ever happening again.
The Architecture: Why Shared Manifests Made Sense at First
Orbis Labs operated a multi-tenant platform that allowed enterprise customers to compose and deploy AI-powered agent workflows. Each tenant configured their agents using a tool manifest: a versioned JSON/YAML document that declared which tools an agent could invoke, the schemas those tools expected, and the runtime dependencies those tools relied on.
To reduce duplication and maintenance overhead, the platform team made a reasonable architectural decision: a library of shared base tool manifests would be maintained centrally in a monorepo. Tenant-specific manifests would inherit from these base definitions using a composition model, similar to how Docker images layer on top of base images. This meant a single source of truth for tool schemas, dependency declarations, and capability flags.
The shared manifests lived in a dedicated /manifests/base/ directory within the monorepo. Tenant-specific overrides lived in /manifests/tenants/{tenant-id}/. A manifest resolution service merged the two at agent initialization time, applying tenant overrides on top of the base.
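The merge step the resolution service performs can be sketched as a recursive deep merge in which tenant values win on conflict. This is a minimal illustration only; Orbis's actual service, field names, and merge semantics for lists and scalars are assumptions here:

```python
def resolve_manifest(base: dict, overrides: dict) -> dict:
    """Merge tenant overrides onto a base manifest.

    Nested dicts merge recursively; any other override value
    (including lists) replaces the base value outright.
    """
    resolved = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(resolved.get(key), dict):
            resolved[key] = resolve_manifest(resolved[key], value)
        else:
            resolved[key] = value
    return resolved

# Hypothetical base and tenant fragments for illustration.
base = {
    "tools": {"doc_processor": {"outputConstraints": {"maxTokens": 4096}}},
    "runtimeVersion": "3.4.1",
}
tenant = {"tools": {"doc_processor": {"timeoutSeconds": 30}}}

merged = resolve_manifest(base, tenant)
# Tenant additions are applied; untouched base fields flow through unchanged.
```

The important property, which becomes central later in the story, is that any base field a tenant does not override flows through to that tenant verbatim.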
On paper, this was elegant. In practice, it introduced a hidden blast radius that the team did not fully appreciate until it was too late.
The Automation That Caused the Problem: Renovate Running Unchecked
Like most modern platform teams, Orbis Labs used Renovate Bot to automate dependency updates across their monorepo. The configuration was fairly standard: Renovate scanned the repository nightly, identified outdated packages in package.json, pyproject.toml, and other dependency manifests, and automatically opened pull requests with version bumps.
Here is where the first architectural mistake lived. The Renovate configuration used a broad matchPaths glob pattern:
"matchPaths": ["**/*.json", "**/*.yaml", "**/*.toml"]This pattern was intended to cover application dependency files. But the shared tool manifests, which also used JSON and YAML, fell within that glob scope. The tool manifests included fields like runtimeVersion, sdkDependencies, and toolchainRef that used version strings. Renovate interpreted these version strings as package version declarations and began updating them.
The critical detail: these were not package versions in the traditional sense. They were internal schema version identifiers, API contract references, and toolchain pinning values. Bumping them did not install a newer library; it changed the semantic contract the manifest declared to the resolution service. And because the changes looked like clean version bumps in a pull request, they passed code review without a second glance.
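A hypothetical base manifest fragment shows why these fields were so easy to misread: every value below matches a semver pattern, yet none of them is an installable package version. The field names come from the article; the surrounding file layout is illustrative.

```yaml
# /manifests/base/document-processing.yaml (illustrative layout)
runtimeVersion: "3.4.1"      # schema/contract version, not a package pin
toolchainRef: "2.1.0"        # internal toolchain pinning value
sdkDependencies:
  agentCore: "3.4.1"         # API contract reference read by the resolver
```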
Why the Corruption Was Silent
Silent corruption is far more dangerous than loud failure. A broken build stops everything and demands attention. A silently corrupted manifest continues to function, just incorrectly, and the divergence between expected and actual behavior accumulates over time.
In this case, the corruption was silent for three compounding reasons:
- Schema validation was version-aware but not semantics-aware. The manifest validator confirmed that the JSON was well-formed and that all required fields were present. It did not verify that the declared toolchainRef version was compatible with the tenant's pinned agent runtime. A bumped version string passed validation as long as it matched a semver pattern.
- Tests covered happy paths, not cross-tenant divergence. The CI pipeline ran integration tests against a single reference tenant environment. If the shared base manifest changed in a way that broke tenant A but not the reference tenant, that breakage would not surface in CI. Tenant A would discover it in production.
- Renovate PRs were auto-merged for low-risk updates. The team had configured Renovate to auto-merge patch-level version bumps after CI passed. Because the tests did not cover cross-tenant manifest resolution, CI passed, the PR merged, and the corruption propagated to the main branch silently.
The result was a slow drift. Over several weeks, the shared base manifests accumulated a series of Renovate-driven version bumps that collectively pushed the declared toolchain contracts out of alignment with what multiple tenants' agent runtimes actually expected.
The Moment of Discovery: A Tenant Diff That Should Not Have Existed
The breakthrough came not from monitoring or alerting, but from a support escalation. A tenant's engineering team noticed that their document-processing agent was truncating outputs above a certain token threshold. They opened a ticket. A platform engineer, while investigating, pulled the resolved manifest for that tenant and compared it manually to the manifest from three weeks prior.
The diff was startling. The sdkDependencies.agentCore field had been bumped from 3.4.1 to 3.7.0 in the base manifest. That particular version of agentCore introduced a breaking change in how the tool manifest's outputConstraints block was interpreted. The tenant's override did not touch outputConstraints, so the base manifest's value was used directly. The newer agentCore interpreted the existing constraint value differently, applying a stricter token ceiling than the tenant had configured.
Once the team understood the mechanism, they pulled resolved manifests for all active tenants and ran a diff analysis. Eleven of their thirty-four tenants had been affected to varying degrees. Some experienced subtle behavioral changes. Two experienced outright tool invocation failures that had been misattributed to network issues.
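The sweep the team ran can be approximated in a few lines: compare each tenant's currently resolved manifest against a baseline snapshot taken before the suspect window, and flag any drift. This is a sketch; the tenant IDs and data shapes are hypothetical.

```python
import json

def drift_report(baselines: dict[str, dict], current: dict[str, dict]) -> list[str]:
    """Return the tenant IDs whose resolved manifest no longer
    matches the baseline snapshot."""
    affected = []
    for tenant_id, baseline in baselines.items():
        # Canonical JSON makes the comparison insensitive to dict key order.
        if json.dumps(baseline, sort_keys=True) != json.dumps(current[tenant_id], sort_keys=True):
            affected.append(tenant_id)
    return affected

# Hypothetical snapshots: one tenant picked up the base manifest bump.
baselines = {"acme": {"sdkDependencies": {"agentCore": "3.4.1"}},
             "globex": {"sdkDependencies": {"agentCore": "3.4.1"}}}
current = {"acme": {"sdkDependencies": {"agentCore": "3.7.0"}},
           "globex": {"sdkDependencies": {"agentCore": "3.4.1"}}}

print(drift_report(baselines, current))  # ['acme']
```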
The Post-Mortem: Five Root Causes Identified
The Orbis Labs platform team conducted a thorough post-mortem. They identified five distinct root causes, each of which had to be addressed independently:
- Overly broad Renovate glob patterns that included non-package dependency files in the update scope.
- Absence of manifest-specific schema validation that understood the semantic meaning of version fields, not just their syntactic format.
- No cross-tenant regression test suite that exercised manifest resolution across representative tenant configurations.
- Auto-merge policies applied without manifest-type awareness, treating changes to tool manifests the same as changes to package.json.
- No change isolation between shared base manifests and tenant-specific manifests, meaning a single base change could propagate to all tenants simultaneously with no staged rollout.
The Gating Strategy They Built
The team spent four weeks designing and implementing what they called the Manifest Integrity Gate (MIG). It operates as a mandatory CI/CD stage that runs before any merge to the main branch when changes touch the /manifests/ directory. Here is how each layer of the gate works.
Layer 1: Path-Scoped Renovate Configuration
The first fix was the most straightforward. The team updated their Renovate configuration to explicitly exclude the manifests directory from automated updates:
```json
{
  "ignorePaths": [
    "manifests/**"
  ],
  "packageRules": [
    {
      "matchPaths": ["apps/**", "services/**", "infrastructure/**"],
      "automerge": true,
      "matchUpdateTypes": ["patch", "minor"]
    }
  ]
}
```

Tool manifest version fields are now managed exclusively through a dedicated internal CLI called manifest-bump, which requires explicit human intent and runs additional validation before writing any changes. Renovate is no longer permitted to touch the manifests directory under any circumstance.
Layer 2: Semantic Manifest Validator
The team built a custom validation step that goes beyond JSON Schema validation. The Semantic Manifest Validator (SMV) is a Python service that understands the internal semantics of tool manifest fields. When a manifest change is proposed, the SMV:
- Loads the compatibility matrix for all declared sdkDependencies versions.
- Checks each version field against the compatibility matrix to confirm it is compatible with the range of agent runtimes currently deployed across active tenants.
- Flags any version bump that introduces a breaking change according to the SDK's published changelog, which is ingested automatically from an internal artifact registry.
- Generates a human-readable compatibility report attached to the pull request as a comment.
If any incompatibility is detected, the SMV fails the CI stage with a detailed explanation of which tenants would be affected and what the behavioral delta would be. The PR cannot merge until the incompatibility is resolved or explicitly overridden with a documented justification.
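At its core, the SMV's version gate reduces to a lookup against the compatibility matrix. The sketch below assumes a simple in-memory matrix; the real service's matrix format, field names, and report generation are not documented in this account.

```python
# Maps each (SDK, version) pair to the agent-runtime versions it is
# known to work with; in the real system this is ingested from the
# internal artifact registry. Values here are hypothetical.
COMPAT_MATRIX = {
    ("agentCore", "3.4.1"): {"runtime-1.8", "runtime-1.9"},
    ("agentCore", "3.7.0"): {"runtime-2.0"},  # breaking change upstream
}

def check_manifest(manifest: dict, deployed_runtimes: set[str]) -> list[str]:
    """Return one finding for every declared SDK version that is
    incompatible with at least one currently deployed runtime."""
    findings = []
    for sdk, version in manifest.get("sdkDependencies", {}).items():
        supported = COMPAT_MATRIX.get((sdk, version), set())
        broken = deployed_runtimes - supported
        if broken:
            findings.append(f"{sdk}=={version} incompatible with: {sorted(broken)}")
    return findings

findings = check_manifest(
    {"sdkDependencies": {"agentCore": "3.7.0"}},
    deployed_runtimes={"runtime-1.8", "runtime-2.0"},
)
# A non-empty findings list fails the CI stage.
```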
Layer 3: Cross-Tenant Manifest Resolution Tests
The team built a synthetic tenant test suite called TenantMatrix. It maintains a set of representative tenant configurations, covering the full range of override patterns used by real tenants (anonymized and generalized). On every manifest change, TenantMatrix:
- Resolves the merged manifest for each synthetic tenant using the proposed base manifest changes.
- Runs a behavioral contract test suite against each resolved manifest, verifying that declared capabilities, output constraints, and tool invocation schemas behave as expected.
- Compares the resolved manifests against a stored baseline snapshot and flags any unexpected diffs.
TenantMatrix runs in parallel across all synthetic tenant configurations, completing in under four minutes on average. It is the single most valuable layer of the gate because it catches the exact class of failure that caused the original incident: a base manifest change that passes unit tests but breaks specific tenant configurations.
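The snapshot-comparison step above can be sketched as a single pass over synthetic tenant configurations: resolve the proposed base against each tenant's overrides and compare with the stored baseline. Names and data here are hypothetical, and the real suite presumably runs each tenant as a parallel test case rather than in one loop.

```python
def tenant_matrix(base: dict, tenants: dict[str, dict],
                  snapshots: dict[str, dict]) -> dict[str, bool]:
    """For each synthetic tenant, resolve the proposed base manifest
    and report whether it still matches the baseline snapshot."""
    def merge(b: dict, o: dict) -> dict:
        # Tenant overrides win; nested dicts merge recursively.
        out = dict(b)
        for k, v in o.items():
            out[k] = merge(out[k], v) if isinstance(v, dict) and isinstance(out.get(k), dict) else v
        return out

    return {tid: merge(base, overrides) == snapshots[tid]
            for tid, overrides in tenants.items()}

# Proposed base change raises the token ceiling; baselines predate it.
base = {"outputConstraints": {"maxTokens": 8192}}
tenants = {"synthetic-a": {},
           "synthetic-b": {"outputConstraints": {"maxTokens": 2048}}}
snapshots = {"synthetic-a": {"outputConstraints": {"maxTokens": 4096}},
             "synthetic-b": {"outputConstraints": {"maxTokens": 2048}}}

results = tenant_matrix(base, tenants, snapshots)
# synthetic-a drifts (the base change reaches it directly);
# synthetic-b's own override shields it from the change.
```

This is exactly the asymmetry that bit Orbis: a base change is invisible to tenants who override the field and fully exposed to tenants who do not.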
Layer 4: Staged Rollout with Canary Tenant Promotion
Even with the above gates in place, the team recognized that synthetic tenants cannot perfectly represent every real tenant configuration. They implemented a canary promotion model for base manifest changes.
When a base manifest change merges to main, it is not immediately applied to all tenants. Instead:
- Phase 1 (Day 0): The change is applied to a small set of internal and opted-in canary tenants. Automated telemetry monitors agent behavior, error rates, and output quality signals for 24 hours.
- Phase 2 (Day 1-2): If Phase 1 telemetry is clean, the change rolls out to 20% of production tenants, grouped by similarity of configuration profile.
- Phase 3 (Day 3+): Full rollout to all tenants, with automated rollback triggered if error rate thresholds are breached.
This staged model means that even if a corrupting change somehow passes all prior gates, its blast radius is limited and its detection is fast.
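A promotion decision of this shape reduces to a simple state check over canary telemetry. The phase coverage numbers, soak times, and error-rate threshold below are illustrative, not Orbis's actual values:

```python
# Illustrative rollout phases and rollback threshold.
PHASES = [
    {"name": "canary",  "coverage": 0.02, "min_soak_hours": 24},
    {"name": "partial", "coverage": 0.20, "min_soak_hours": 24},
    {"name": "full",    "coverage": 1.00, "min_soak_hours": 0},
]
ERROR_RATE_THRESHOLD = 0.01  # hypothetical rollback trigger

def next_action(phase_index: int, soak_hours: float, error_rate: float) -> str:
    """Decide whether a manifest rollout rolls back, holds, promotes,
    or completes, based on the current phase and its telemetry."""
    if error_rate > ERROR_RATE_THRESHOLD:
        return "rollback"
    if soak_hours < PHASES[phase_index]["min_soak_hours"]:
        return "hold"
    if phase_index + 1 < len(PHASES):
        return f"promote:{PHASES[phase_index + 1]['name']}"
    return "complete"

print(next_action(0, soak_hours=25, error_rate=0.002))  # promote:partial
print(next_action(1, soak_hours=12, error_rate=0.05))   # rollback
```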
Layer 5: Manifest Change Ownership and Audit Trail
The final layer is organizational rather than technical. The team introduced a CODEOWNERS rule requiring that any change to /manifests/base/ must be approved by at least one member of the Platform Reliability team, in addition to standard peer review. Every manifest change is also logged to an immutable audit trail that records who approved the change, what the SMV compatibility report said, and which tenants were in scope during the rollout phases.
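The ownership rule maps directly onto a CODEOWNERS entry; with branch protection set to require code-owner review, GitHub enforces the approval automatically. The team handle below is illustrative:

```
# CODEOWNERS (illustrative team handle)
/manifests/base/  @orbis/platform-reliability
```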
This audit trail has already proven its value: when a tenant raised a support ticket about unexpected behavior, the platform team was able to pull the full manifest change history for that tenant in under two minutes and pinpoint exactly which base manifest update correlated with the reported change in behavior.
Results: Three Months After the Gate Went Live
The Manifest Integrity Gate has now been in production for three months, and the team's internal retrospective makes the results compelling:
- Zero cross-tenant manifest corruption incidents since the gate went live.
- 14 potentially corrupting changes caught by the Semantic Manifest Validator before they reached main, all of which were Renovate-adjacent (engineers manually copying version strings from dependency files into manifest fields).
- TenantMatrix caught 3 regressions that the SMV did not flag, because the incompatibility was behavioral rather than version-matrix-detectable.
- Mean time to detect manifest-related issues dropped from 11 days (the average time for a tenant to notice and report subtle behavioral drift) to under 4 hours (caught by canary telemetry).
- Developer confidence in manifest changes increased significantly, measured through internal team surveys, because engineers now have a clear, automated signal about the downstream impact of their changes before they merge.
The Broader Lesson: Automation Without Scope Awareness Is a Liability
The Orbis Labs incident is not really a story about a bad tool. Renovate is excellent software. It is a story about the assumption that automation is inherently safe when it stays within its defined parameters. The problem is that "defined parameters" in most CI/CD configurations are defined too broadly, and the consequences of that breadth are not always obvious until a specific combination of factors makes them catastrophic.
In multi-tenant platforms especially, the blast radius of a shared artifact change is multiplied by the number of tenants. A change that would be a minor inconvenience in a single-tenant system becomes a systemic incident when it propagates silently across dozens of customer environments simultaneously.
The key insight from this case study is that different classes of files require different classes of automation governance. Application dependency files and infrastructure-as-code and shared semantic manifests all live in the same repository, but they carry fundamentally different risk profiles. Treating them identically in your CI/CD pipeline is a latent bug waiting for the right conditions to activate.
Key Takeaways for Platform Engineering Teams
If you operate a multi-tenant platform with any form of shared configuration or manifest inheritance, the Orbis Labs story should prompt a direct audit of your own systems. Here are the most actionable takeaways:
- Audit your Renovate or Dependabot glob patterns today. Confirm that automated dependency update tools are scoped exclusively to actual package dependency files, not all YAML or JSON files in your repository.
- Distinguish between syntactic and semantic validation. A manifest that passes JSON Schema validation can still be semantically broken. Build validators that understand the meaning of your version fields, not just their format.
- Build a cross-tenant regression suite, even with synthetic tenants. You do not need to test against real tenant data. A representative set of synthetic configurations covering your override pattern space is sufficient to catch the vast majority of cross-tenant regressions.
- Never auto-merge changes to shared semantic artifacts. Auto-merge is appropriate for low-risk, well-scoped dependency bumps. It is not appropriate for anything that defines a behavioral contract shared across tenant boundaries.
- Implement staged rollout for shared artifact changes. Even a two-phase canary rollout dramatically reduces blast radius and accelerates detection of issues that slip through static gates.
Conclusion
The Orbis Labs case study is a reminder that the most dangerous failures in modern platform engineering are not the ones that announce themselves with a red build or a crashed service. They are the ones that look like everything is working fine, right up until the moment a tenant notices that something is subtly, consistently wrong.
Building robust gating strategies around shared manifests is not glamorous work. It does not ship a feature or reduce latency. But it is exactly the kind of foundational reliability investment that separates platforms that scale gracefully from platforms that accumulate invisible technical debt until a single incident exposes years of compounding risk.
If your CI/CD pipeline is running automated dependency updates across a monorepo that contains shared semantic artifacts, the question is not whether you have this problem. The question is whether you have already found it.