Your LLM Cost Problem Is Actually a Strategy Problem (And Most Engineering Leaders Are Missing It)
Something significant shifted in the AI industry as we moved into 2026. The era of "grow at all costs, worry about margins later" is over. OpenAI, Anthropic, Google DeepMind, and virtually every major LLM provider have pivoted hard toward monetization. Pricing tiers are tightening. Enterprise contract structures are growing more complex. Rate limits are being weaponized as upsell levers. And the generous API pricing that made AI experimentation feel nearly free in the early days? That window is closing fast.
Engineering leaders across the industry have responded the way engineers naturally do: by framing it as an infrastructure problem. Optimize token usage. Cache aggressively. Compress prompts. Pick cheaper models for simpler tasks. These are all legitimate tactics, and you should absolutely be doing them. But if that is the entirety of your LLM cost strategy, you are solving the wrong problem at the wrong level of the organization.
The real issue is not your compute budget. It is your strategic exposure. And until model selection, contract architecture, and vendor lock-in risk sit in the same room as your infrastructure team, you are flying blind at exactly the moment the industry is making decisions that will shape your costs for the next three to five years.
Why the Monetization Pivot Changes Everything for Engineering Teams
To understand why this matters so urgently right now, consider what the AI provider landscape actually looked like eighteen months ago versus today. In 2024 and into early 2025, providers were effectively subsidizing enterprise adoption. Inference costs were dropping faster than providers could monetize them, and the competitive pressure between OpenAI, Anthropic, Google, Meta, Mistral, and a dozen others kept pricing aggressive. The incentive was market share, not margin.
That calculus has changed. By early 2026, the major providers have largely established their enterprise footholds. The next phase is extracting value from those relationships. That means enterprise contracts with volume commitments, tiered SLAs, model access gated behind higher pricing tiers, and fine-tuning capabilities bundled into premium packages that are difficult to unbundle. It means the model you built your product around last year may be deprecated, repriced, or quietly throttled in ways that force an upgrade conversation.
This is not cynicism. This is just how platform businesses mature. It happened with cloud infrastructure in the 2010s. It happened with SaaS tooling. It is happening with LLMs now, and the engineering leaders who recognize the pattern early will have significantly more leverage than those who notice it after they have already signed a three-year enterprise agreement.
The Three Conversations That Need to Happen Together
1. Model Selection Is a Business Risk Decision, Not Just a Performance Decision
When your team evaluates which LLM to use for a given workload, the conversation typically centers on benchmark performance, latency, and cost per million tokens. These are necessary inputs. They are not sufficient ones.
Model selection also determines which vendor's roadmap you are betting on, which deprecation schedule you are exposed to, and how much migration work you will absorb when that model is eventually sunset. GPT-4-class models have already gone through multiple deprecation cycles. Claude model generations turn over rapidly. Google's Gemini lineup has been reshuffled more than once. Every time a model you depend on is deprecated, you are paying an engineering tax that never appears in your token cost analysis.
The smarter framing is to treat model selection as a portfolio decision. Which workloads can tolerate open-weight models like Llama or Mistral variants, where you control the inference layer entirely? Which workloads genuinely require frontier closed-source capability, and are you pricing that dependency into your unit economics? Are you architecting your prompting and retrieval layers in ways that make model swaps feasible, or have you baked in assumptions that will make migration painful?
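One way to make the portfolio framing concrete is to write the mapping down as an explicit, reviewable artifact rather than tribal knowledge. Here is a minimal sketch of what that could look like; every workload name, model identifier, and estimate below is a hypothetical placeholder, not a recommendation:

```python
# A minimal sketch of a model portfolio as an explicit, reviewable artifact.
# All workload names, model identifiers, and figures are hypothetical
# placeholders; substitute your own.

from dataclasses import dataclass

@dataclass
class WorkloadPolicy:
    primary_model: str          # model the workload runs on today
    fallback_model: str         # where it would move if the primary repriced
    open_weight_ok: bool        # can this workload tolerate an open-weight model?
    est_migration_weeks: float  # engineering cost of a forced swap

# Each entry here is a business decision, not just a performance one,
# so keep the file somewhere product and finance can see it.
PORTFOLIO: dict[str, WorkloadPolicy] = {
    "customer_support_summaries": WorkloadPolicy(
        primary_model="frontier-model-a",
        fallback_model="open-weight-model-x",
        open_weight_ok=True,
        est_migration_weeks=1.0,
    ),
    "complex_reasoning_agent": WorkloadPolicy(
        primary_model="frontier-model-a",
        fallback_model="frontier-model-b",
        open_weight_ok=False,
        est_migration_weeks=6.0,
    ),
}
```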
These are product and business questions dressed in engineering clothing. They belong in conversations that include your product leadership and your finance team, not just your ML platform team.
2. Contract Architecture Determines Your Leverage Window
Enterprise AI contracts in 2026 are not simple API agreements. They increasingly involve committed spend thresholds, reserved capacity arrangements, and multi-year terms with escalation clauses. Providers have strong incentives to lock you in early, and they are getting better at it.
The dangerous pattern is when engineering teams optimize for the best current price without adequately modeling the exit cost. A committed spend agreement that saves you 20% on token costs today might also mean you are paying for capacity you cannot use if your product direction shifts, or that you face punishing overage fees if adoption scales faster than projected. Worse, it may give your vendor significant leverage in the next pricing negotiation precisely because switching costs have become prohibitive.
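To see why, run the numbers. Here is a rough back-of-the-envelope model; the prices, discount, overage multiplier, and usage curve are all illustrative assumptions, not any provider's actual terms:

```python
# Back-of-the-envelope comparison of a committed-spend agreement versus
# pay-as-you-go. All figures below are illustrative assumptions.

def annual_cost_committed(monthly_usage_usd: list[float],
                          committed_monthly_usd: float,
                          discount: float,
                          overage_multiplier: float) -> float:
    """Committed tier: pay the (discounted) commitment every month,
    plus a penalty rate on any usage above it."""
    total = 0.0
    for usage in monthly_usage_usd:
        base = committed_monthly_usd * (1 - discount)
        overage = max(0.0, usage - committed_monthly_usd) * overage_multiplier
        total += base + overage
    return total

# Hypothetical ramp: projected ~$50k/month, but adoption accelerates.
usage = [30_000, 35_000, 40_000, 45_000, 50_000, 60_000,
         70_000, 80_000, 90_000, 100_000, 110_000, 120_000]

pay_as_you_go = sum(usage)
committed = annual_cost_committed(usage, committed_monthly_usd=50_000,
                                  discount=0.20, overage_multiplier=1.5)

print(f"pay-as-you-go: ${pay_as_you_go:,.0f}")   # $830,000
print(f"committed:     ${committed:,.0f}")       # $900,000
```

In this scenario the committed tier costs more over the year than pay-as-you-go, despite the 20% headline discount, because the ramp months trigger overage while the slow months pay for unused commitment. Your numbers will differ; the point is to model them before signing.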
What does good contract architecture actually look like? It means negotiating model continuity guarantees or migration support provisions before you sign, not after a deprecation notice arrives. It means understanding exactly what "enterprise support" covers when a model behaves unexpectedly in production. It means building in explicit renegotiation windows tied to usage milestones rather than calendar dates. And critically, it means having your legal and finance teams review these agreements with the same rigor they would apply to a major cloud infrastructure commitment, because that is effectively what these contracts are becoming.
3. Vendor Lock-In Risk Belongs on Your Architecture Review Board
The AI industry has a lock-in problem that is structurally different from traditional software vendor lock-in, and most engineering organizations have not updated their risk frameworks to account for it.
Traditional vendor lock-in is primarily about data portability and API compatibility. You can often mitigate it with abstraction layers and open standards. LLM lock-in has an additional dimension: behavioral lock-in. Your prompts, your fine-tuning investments, your retrieval architectures, and your evaluation frameworks are often calibrated to a specific model's quirks and capabilities. Switching models is not just a configuration change. It is frequently a re-evaluation and re-tuning exercise that can take weeks or months of engineering time.
This means the real cost of vendor lock-in in the LLM context is not just the switching fee. It is the engineering capacity consumed by migration, the regression risk during transition, and the opportunity cost of not building new features while your team is re-calibrating your AI layer. None of these costs show up in your token bill, which is exactly why they tend to be systematically underestimated.
The mitigation strategy is not to avoid frontier models. It is to architect deliberately. Teams that are navigating this well in 2026 are doing a few things consistently: they maintain a provider-agnostic abstraction layer (using frameworks like LiteLLM or building their own routing layer) so that model swaps require configuration changes rather than code changes. They run at least one open-weight model in parallel for appropriate workloads, which gives them both a cost lever and a negotiating chip. And they treat their evaluation harness as a first-class asset, because a robust eval suite is what makes confident model migrations possible.
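For teams that prefer to roll their own, the core of such a layer is small. The sketch below uses hypothetical provider clients as stand-ins for real SDK wrappers; LiteLLM packages the same idea with broad provider coverage out of the box:

```python
# A minimal sketch of a provider-agnostic routing layer. The provider
# client classes are hypothetical stand-ins; in practice you would wrap
# real vendor SDKs behind this interface.

from typing import Protocol

class ChatClient(Protocol):
    def complete(self, prompt: str) -> str: ...

class ProviderAClient:
    def complete(self, prompt: str) -> str:
        # call provider A's SDK here
        return f"[provider-a] {prompt[:20]}..."

class ProviderBClient:
    def complete(self, prompt: str) -> str:
        # call provider B's SDK here
        return f"[provider-b] {prompt[:20]}..."

# Routing is configuration, not code: swapping a model for a workload
# means editing this table, and nothing downstream changes.
ROUTES: dict[str, ChatClient] = {
    "support_summaries": ProviderAClient(),
    "complex_reasoning": ProviderBClient(),
}

def complete(workload: str, prompt: str) -> str:
    return ROUTES[workload].complete(prompt)
```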
The Organizational Failure Mode to Avoid
Here is the pattern I see most often in engineering organizations right now: LLM costs are rising, so the problem gets handed to the infrastructure or platform team. That team does excellent work: it implements caching, prompt compression, and model routing to cheaper tiers, then reports back a meaningful cost reduction. Leadership celebrates. The problem is considered solved.
Six months later, a vendor reprices their enterprise tier. Or a model gets deprecated. Or the product team wants to expand into a new use case that the current contract does not cover efficiently. And suddenly the organization is negotiating from a position of weakness, with limited alternatives and limited time, because the strategic decisions were never made explicitly. They were deferred by default.
The fix is not complicated, but it does require intentionality. Engineering leadership needs to own the narrative that AI cost management is a cross-functional discipline. It requires infrastructure expertise, yes. But it also requires product strategy input on which capabilities are truly core versus commodity, finance involvement in contract modeling, and legal review of terms that have material business implications. The engineering team should be driving that conversation, not waiting for it to be called by someone else.
What Good Looks Like in 2026
The engineering organizations navigating this moment well share a few common traits. They have an explicit AI vendor strategy documented somewhere, even if it is a living document that changes quarterly. They know which providers they would migrate to for each workload category if their primary vendor doubled prices tomorrow. They have modeled that migration cost in terms of engineering weeks, not just dollars per token. And they have someone, whether a staff engineer, an AI platform lead, or a VP of Engineering, who owns that strategy and reviews it regularly against market developments.
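That modeling does not need to be elaborate to be useful. A sketch, with purely illustrative figures:

```python
# Rough break-even model for a forced migration: how many months until
# the savings from switching cover the engineering cost of the move?
# All figures are illustrative assumptions.

def breakeven_months(migration_weeks: float,
                     loaded_cost_per_week: float,
                     monthly_savings: float) -> float:
    migration_cost = migration_weeks * loaded_cost_per_week
    return migration_cost / monthly_savings

# Example: the primary vendor doubles prices, adding $40k/month, and the
# eval suite says migrating the affected workloads takes 8 engineering weeks.
print(breakeven_months(migration_weeks=8,
                       loaded_cost_per_week=6_000,
                       monthly_savings=40_000))  # -> 1.2 months
```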
They also tend to be running a deliberate mix of proprietary and open-weight models. Not because open-weight models are always better (they frequently are not for complex reasoning tasks), but because maintaining that capability in-house creates genuine optionality. It is the difference between negotiating with alternatives on the table and negotiating without them.
Finally, the best teams are treating their LLM evaluation infrastructure as a competitive moat. The ability to confidently evaluate a new model against your specific production workloads in days rather than weeks is what gives you the agility to respond to market changes. In a landscape where new capable models are still being released regularly and pricing dynamics can shift with a single blog post from a major provider, that agility is worth more than any single cost optimization tactic.
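A workable harness can start small. The sketch below assumes an eval set stored as one JSON object per line with prompt and reference fields, and uses a deliberately naive exact-match grader as a placeholder for task-specific scoring:

```python
# Skeleton of an evaluation harness for comparing models against frozen
# production cases. The eval-set format and grading function are
# assumptions; swap in your own.

import json
from statistics import mean
from typing import Callable

def exact_match(output: str, reference: str) -> float:
    """Simplest possible grader; replace with rubric or LLM-as-judge scoring."""
    return 1.0 if output.strip() == reference.strip() else 0.0

def evaluate(complete: Callable[[str], str],
             eval_set_path: str,
             grade: Callable[[str, str], float] = exact_match) -> float:
    """Run a model over a frozen eval set (one JSON object per line,
    with "prompt" and "reference" keys) and return its mean score."""
    with open(eval_set_path) as f:
        cases = [json.loads(line) for line in f]
    return mean(grade(complete(c["prompt"]), c["reference"]) for c in cases)

# A migration decision then becomes a diff between two numbers:
# incumbent = evaluate(incumbent_client.complete, "evals/support.jsonl")
# candidate = evaluate(candidate_client.complete, "evals/support.jsonl")
```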
The Bottom Line
LLM cost optimization is a real and necessary discipline. Your infrastructure team should absolutely be doing the work of caching, routing, and prompt efficiency. But if that is where the conversation stops, you are optimizing the tactics while ignoring the strategy, and in a market that is actively repricing and restructuring around enterprise customers right now, that is an expensive mistake to make.
Model selection, contract architecture, and vendor lock-in risk are not infrastructure concerns. They are business continuity concerns. They belong in your engineering leadership conversations, your board-level risk discussions, and your product roadmap planning. The providers are thinking about your relationship with them at that level. You should be too.
The engineering leaders who will look back on 2026 as a year they got ahead of the curve are the ones who recognized this moment for what it is: not a cost problem to be optimized, but a strategic inflection point to be navigated. The tools are available. The frameworks exist. What is required now is the organizational will to treat AI vendor strategy with the same rigor we learned, eventually and often painfully, to apply to cloud infrastructure a decade ago.
Do not wait for the painful lesson. You have seen this movie before.