Uber burned through its entire 2026 artificial intelligence budget in five months. Microsoft has told thousands of engineers to stop using Anthropic’s Claude Code by June 30 and switch to an internal tool. Both stories landed in the same week, and together they mark the moment enterprise AI stopped being a savings story.
The original pitch was simple: deploy a coding assistant, replace some headcount, watch productivity climb. What is showing up on the invoice is the opposite. Token consumption at large engineering organizations is running two to three times higher than budget models assumed, while measurable output per engineer has barely moved. The honeymoon between big tech and generative AI vendors is ending on a spreadsheet.
The Bills Landing at Microsoft and Uber
Microsoft’s pivot is the more striking of the two. The company is one of Anthropic’s largest commercial customers through Azure, and it spent much of 2025 encouraging engineers to fold Claude Code into their daily workflow. That direction reversed quietly this month. Internal communications reviewed by trade press instruct engineering teams to migrate off Claude Code to a Microsoft-built alternative ahead of a June 30 cutover.
The official rationale has not been disclosed. People familiar with the decision point to one number: per-seat token spend on Claude Code has run far above what management modeled when the tool was rolled out, and the gains in shipped code have not closed the gap.
Uber’s numbers are public. The company gave roughly 5,000 engineers access to Claude Code in January. By May, chief technology officer Praveen Naga told staff the annual AI line item was already empty. Chief operating officer Andrew Macdonald confirmed the overrun on the Rapid Response podcast this week.
> We’re going to have to start talking about token consumption and the associated costs versus headcount. If you’re not actually able to draw a direct line to how much useful functionality you’re shipping to your users, that trade becomes harder to justify.
That was Macdonald, Uber’s COO, speaking on the podcast on Monday. The phrasing matters. “Versus headcount” is the line that frames every conversation enterprise CFOs are about to have with their engineering leaders.
How Token Pricing Broke the Math
For most of 2024 and 2025, the leading AI labs sold flat-rate access. A top-tier Claude Max or ChatGPT Pro subscription at roughly $200 per month gave a developer essentially unlimited use. Internal estimates from cloud resellers put the true compute cost of a heavy user somewhere between $1,500 and $5,000 per month. The labs were eating the difference.
That subsidy is unwinding. On May 20, Google notified Gemini customers it was “introducing compute-based usage limits that factor in the complexity of your prompt, the features you use, and the length of your chat.” Anthropic has moved aggressively to metered pricing on Claude Code. OpenAI has tightened limits on its top tiers and is pushing enterprise customers toward API-style billing.
A token is the unit of measurement for both input and output, roughly a fragment of a word. Under flat-rate billing it was invisible. Under metered billing it is the meter ticking on every prompt, every file the model reads, every line of code it generates.
Why Coding Agents Burn the Most
Coding assistants are the worst-case consumers in the new pricing model. A chat session asking a model to summarize a document might use a few thousand tokens. An agentic coding workflow that reads a repository, plans changes across files, runs tests, and iterates can consume orders of magnitude more, from hundreds of thousands to millions of tokens per task.
Rahul Garg, founder and chief executive of e-commerce firm Moglix, put numbers on the dynamic. “The driver is volume, not price,” he said. “Token prices have fallen nearly 80 per cent in the past year and close to 99 per cent over three years, yet enterprise AI spend has roughly tripled in the same window.”
The math is brutal in aggregate. Cheaper tokens encouraged broader rollouts. Broader rollouts encouraged more ambitious agentic workflows. More ambitious workflows consume vastly more tokens per task. The unit economics improved while the bill exploded.
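Garg’s figures rest on a simple identity: spend equals price times volume, so growth in token volume is the spend multiplier divided by the price multiplier. The sketch below works that through; the function name is hypothetical and the inputs are Garg’s quoted figures. His quote does not say whether the tripling of spend is measured over one year or three, so both readings are computed. Treat the results as an implied order of magnitude, not a measured figure.

```python
def implied_volume_multiplier(spend_multiplier: float, price_drop: float) -> float:
    """Return how much token volume must have grown to produce the observed spend.

    spend = price * volume, so
    volume_new / volume_old = (spend_new / spend_old) / (price_new / price_old).
    """
    if not 0.0 <= price_drop < 1.0:
        raise ValueError("price_drop must be a fraction in [0, 1)")
    price_multiplier = 1.0 - price_drop  # an 80% drop leaves prices at 0.2x
    return spend_multiplier / price_multiplier


# Garg's figures: spend roughly tripled while prices fell ~80% (one year)
# or ~99% (three years). The window for the tripling is not stated,
# so both readings are shown.
one_year = implied_volume_multiplier(3.0, 0.80)
three_year = implied_volume_multiplier(3.0, 0.99)
print(f"one-year reading:   token volume up ~{one_year:.0f}x")
print(f"three-year reading: token volume up ~{three_year:.0f}x")
```

On the one-year reading, token volume has to grow roughly fifteenfold; on the three-year reading, roughly three hundredfold.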
The Microsoft Volume Problem
Trade press reports indicate the Microsoft engineering organization was hitting two to three times the token consumption that internal forecasts had set for routine coding tasks. Some of that is engineers learning the tool and over-prompting. Some is Claude Code’s design, which favors reading wide context before acting. Either way, the line item refused to flatten.
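The Uber and Microsoft numbers can be set side by side with one line of arithmetic. If spending runs at a constant multiple of plan, a 12-month budget exhausted in five months implies a burn rate of about 2.4 times plan, which falls inside the two-to-three-times consumption gap reported at Microsoft. This is a minimal sketch assuming even month-to-month spending, which a January rollout ramping up through the spring would not strictly follow; the function names are hypothetical.

```python
def overrun_factor(budget_months: float, months_to_exhaust: float) -> float:
    """How many times faster than planned a budget is spent, assuming a steady rate."""
    return budget_months / months_to_exhaust


def months_until_exhausted(budget_months: float, overrun: float) -> float:
    """How long a budget sized for `budget_months` lasts when spending runs `overrun` times plan."""
    return budget_months / overrun


# Uber: a 12-month allocation gone in roughly five months.
print(f"Uber-style burn: {overrun_factor(12, 5):.1f}x plan")

# The two-to-three-times consumption gap reported at Microsoft.
for factor in (2.0, 3.0):
    months = months_until_exhausted(12, factor)
    print(f"{factor:.0f}x over forecast: a 12-month budget lasts {months:.0f} months")
```

Read the other way, a team running at two times forecast empties an annual budget in six months, and at three times in four.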
What Companies Are Actually Spending On
The cost picture differs sharply by AI workload type. The table below shows rough per-task token use and cost drivers across the three main buckets, drawn from vendor pricing pages and analyst commentary.
| Workload Type | Typical Token Use Per Task | Primary Cost Driver | 2026 Trajectory |
|---|---|---|---|
| Chat assistant (Q&A, drafting) | 2,000 to 10,000 | User seats | Manageable, predictable |
| Document analysis and RAG | 20,000 to 100,000 | Context window size | Rising with longer documents |
| Agentic coding (Claude Code, Cursor) | 200,000 to 2,000,000+ | Repository scope and iteration | Exploding; primary budget pressure |
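To turn the table’s token counts into dollars, multiply by a per-token rate. The sketch below uses a single blended rate of $5 per million tokens; that figure is a hypothetical placeholder, not any vendor’s price, since real rates differ by model and by the split between input and output tokens. The token ranges come from the table, with the open-ended 2,000,000+ treated as 2,000,000.

```python
# HYPOTHETICAL blended price in US dollars per million tokens. Real rates vary
# by vendor, model, and input/output mix; substitute an actual contract rate.
PRICE_PER_MILLION_TOKENS = 5.00

# Per-task token ranges from the workload table (upper agentic bound is open-ended).
WORKLOADS = {
    "Chat assistant": (2_000, 10_000),
    "Document analysis and RAG": (20_000, 100_000),
    "Agentic coding": (200_000, 2_000_000),
}


def cost_per_task(tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Metered cost of one task: tokens consumed times the per-token rate."""
    return tokens / 1_000_000 * price_per_million


for name, (low, high) in WORKLOADS.items():
    print(f"{name:<28} ${cost_per_task(low):>6.2f} to ${cost_per_task(high):>6.2f} per task")
```

Whatever rate is plugged in, the ratio between rows is set by token counts alone: the top of the agentic coding range costs 200 times the top of the chat range. Multiplied across thousands of engineers running many tasks a day, the third row dominates the bill.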
The third row is where the budget overruns live. It is also the workload most directly pitched as a headcount substitute, which makes the productivity question harder to dodge.
The Productivity Number That Did Not Show Up
The case for spending heavily on coding agents rests on one assumption: engineers ship meaningfully more functionality per week with the tools than without them. The evidence collected so far is mixed at best.
A widely circulated study from METR last year tested experienced open-source developers using AI coding assistants on real issues in their own repositories. The developers expected a 24 percent speedup. Afterward, they estimated they had been 20 percent faster. Measured against tasks completed without AI, they were actually 19 percent slower. The gap between perceived and actual productivity is the gap that enterprise finance teams are now staring at.
Bryan Catanzaro, vice president of applied deep learning research at Nvidia, put the comparison in the bluntest possible terms. For his team, he said recently, “the cost of compute is far beyond the costs of the employees.” When an executive at the company selling the picks and shovels says compute outruns payroll, the framing has shifted.
The Recalibration Already Underway
Industry observers are careful to draw a line between budget shock and bust. Demand for AI tooling is still climbing. What is changing is how companies decide where to point it.
- Use-case triage. Vijay Gopalakrishnan, partner at Deloitte India, argues firms should pick specific business problems instead of distributing access broadly. “AI doesn’t have to be expensive for enterprises,” he said. “Costs can be managed when an AI solution is designed and operated effectively.”
- Outcome metrics over adoption metrics. Srividya Kannan, founder and chief executive of Avaali Solutions, said too many firms measured pilots, seats, and agents deployed rather than dollars saved or revenue unlocked. The next phase, she said, has to measure cost against business outcome.
- Treating AI as capex, not opex savings. Sunil Golani, head of cloud solutions at Ingram Micro, frames the shift bluntly: “The conversation needs to shift from cost to value. The real question is not how much AI will cost, but what enterprise value it can unlock.”
- Volume governance. Dhaval Radia, chief financial officer at Zeiss India, makes the distinction that may matter most. “AI is not becoming too expensive. Instead, unmanaged AI is becoming expensive.”
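Radia’s “unmanaged AI” line points at a concrete control: a per-team token budget with an early warning and a hard cap. The sketch below is a hypothetical illustration of that pattern, not a feature of Claude Code or any other product; the class name, the 80 percent alert threshold, and the choice to block rather than throttle are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TeamTokenBudget:
    """Hypothetical per-team monthly token budget with a soft alert and a hard cap."""

    team: str
    monthly_limit: int           # tokens allowed per billing month (policy choice)
    alert_fraction: float = 0.8  # assumed threshold for an early warning
    used: int = 0
    alerts: list[str] = field(default_factory=list)

    def request(self, tokens: int) -> bool:
        """Record a request if it fits under the cap; refuse it otherwise."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.monthly_limit:
            self.alerts.append(f"{self.team}: blocked request for {tokens:,} tokens")
            return False
        threshold = self.alert_fraction * self.monthly_limit
        crossed = self.used < threshold <= self.used + tokens
        self.used += tokens
        if crossed:
            self.alerts.append(
                f"{self.team}: passed {self.alert_fraction:.0%} of the monthly budget"
            )
        return True


budget = TeamTokenBudget(team="payments", monthly_limit=10_000_000)
for _ in range(4):
    allowed = budget.request(3_000_000)
    print(f"request allowed: {allowed}, used so far: {budget.used:,}")
print("\n".join(budget.alerts))
```

In the example, the third request pushes usage past 80 percent and triggers a warning, and the fourth is refused because it would breach the cap. A real deployment would reset `used` each billing period and route alerts to whoever owns the budget.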
Former Infosys chief executive Vishal Sikka, who now runs his own AI startup, posted on X this week that “token costs are becoming a real issue” and that the industry should expect “more scrutiny of AI usage” through the rest of the year. Coming from someone with no incentive to talk down AI, the comment registered.
What the Vendors Do Next Matters Most
The question facing Anthropic, Google, and OpenAI is whether their largest customers will absorb metered pricing or push back hard enough to force tiered enterprise contracts that look more like cloud commitments than per-token billing. Microsoft’s move off Claude Code is the loudest signal so far that the answer might be “push back.”
There are three plausible vendor responses. The first is volume discounting steep enough to keep agentic workflows economic at enterprise scale, which compresses margins the labs are under pressure to defend. The second is technical work to reduce token consumption per task through better caching, smaller context windows, and cheaper model routing. The third is a slower, more painful one: letting customers right-size their deployments and absorbing the revenue impact.
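Caching is the most mechanical of those levers. An agent that re-reads the same repository context on every turn pays for that context every turn; with prompt caching, the context is written once at a premium and later reads are billed at a discount. This sketch assumes a cache-write multiplier of 1.25 and a cache-read multiplier of 0.10 relative to the base input rate, plus a hypothetical $3 per million tokens; check a vendor’s current price sheet before relying on any of those numbers. It also assumes the cache never expires between turns.

```python
def context_cost(
    context_tokens: int,
    turns: int,
    price_per_million: float,
    cache_write_multiplier: float = 1.25,  # assumed premium for writing the cache
    cache_read_multiplier: float = 0.10,   # assumed discount for cached reads
    cached: bool = True,
) -> float:
    """Cost of sending the same context on every turn of an agent loop."""
    base = context_tokens / 1_000_000 * price_per_million  # full price for one turn
    if not cached or turns == 0:
        return base * turns
    # First turn writes the cache at a premium; every later turn reads it at a discount.
    return base * cache_write_multiplier + base * cache_read_multiplier * (turns - 1)


# Hypothetical agent run: 100,000 tokens of repository context reused over 20 turns.
uncached = context_cost(100_000, 20, 3.00, cached=False)
cached = context_cost(100_000, 20, 3.00)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f} ({1 - cached / uncached:.0%} saved)")
```

Under these assumptions the cached run comes to under a fifth of the uncached cost. The saving shrinks if turns are spaced far enough apart for the cache to expire.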
Kaustubh Kashyap, founder and chief executive of AI startup Fexo GenAI, said the inflection is already visible. “Budget blowouts at Microsoft and Uber signal a shift. AI experimentation ends, reality sets in. The honeymoon ends and now ROI has to be part of the evaluation.”
Frequently Asked Questions
Why is Microsoft moving its engineers off Claude Code?
Microsoft has not publicly stated a reason, but reports indicate token consumption on Claude Code is running well above internal forecasts without a matching gain in shipped code. Engineers have been directed to migrate to an internal Microsoft tool by June 30.
How did Uber exhaust its 2026 AI budget in five months?
Uber rolled out Claude Code to about 5,000 engineers in January. CTO Praveen Naga said the annual AI allocation was spent by May. COO Andrew Macdonald confirmed the overrun this week, citing token consumption rather than per-seat licensing as the driver.
What is a token and why does it matter for AI costs?
A token is the unit AI providers use to measure input and output, roughly a fragment of a word. Under flat-rate pricing it was invisible to customers. Under metered pricing, every prompt and every model response is billed by tokens consumed, which makes high-volume coding workflows dramatically more expensive than chat use.
Are AI tools actually making engineers more productive?
Evidence is mixed. A METR study found experienced developers were 19 percent slower with AI coding tools on real tasks, even though they felt 20 percent faster. Productivity gains exist for some workflows but are smaller and less universal than vendor marketing suggested.
Is it actually cheaper to hire a human than to deploy Claude?
For some workloads, increasingly yes. Nvidia’s Bryan Catanzaro said the cost of compute now exceeds employee cost for his team. Whether the comparison holds depends entirely on the specific task, the volume of token use, and whether the AI work would otherwise have required a hire at all.
What should enterprises do differently in the rest of 2026?
Industry advisers consistently recommend three moves: pick specific high-value use cases rather than broad rollouts, measure business outcomes rather than adoption metrics, and govern token volume actively through caching, routing, and model selection.
Will AI vendors lower their prices in response?
Probably through enterprise discounting and technical efficiency rather than headline price cuts. Anthropic, Google, and OpenAI are all under investor pressure to grow revenue, which makes a flat reversal to subsidized pricing unlikely. Negotiated commitments for large customers are the more probable path.
The honeymoon was real. The bill is also real. By the time the next round of enterprise contracts gets signed in the second half of 2026, the companies still spending heavily on AI will be the ones that learned to measure tokens against shipped functionality, not seats against headlines.