Industry · 2026-06-26 · 7 min read

Tokenmaxxing Is Over: How OpenAI and Anthropic Face the Efficiency Era

Enterprise AI buyers are done burning tokens for sport. Lindy ripped out Claude for DeepSeek and watched its cost curve crash. Uber blew an annual AI budget in four months and slapped on $1,500 monthly caps. Microsoft, Amazon and Google are weaponizing cheap models. Here's what the spend crunch means for OpenAI, Anthropic, and your token bill.

TL;DR

The tokenmaxxing era is ending. Enterprises that spent two years rewarding "use more AI" are now installing budgets, caps and ROI gates.
Lindy moved 100% of traffic off Claude to DeepSeek and saw costs collapse. Uber burned an annual AI budget in four months and rolled out $1,500/month spend tiers per employee.
OpenAI ($25B run rate) and Anthropic ($47B run rate) both filed confidentially for IPO in early June — analysts say "now" is strategic because growth has nowhere to go but down.
Microsoft, Amazon and Google — also OpenAI/Anthropic's biggest backers — are shipping cheaper in-house models (Gemini 3.5 Flash at ~⅓ the price of frontier; Microsoft's low-cost suite; Amazon's Trainium-built models).
The new winning move isn't "buy more tokens." It's model routing, caching, batching, and smaller models for 80% of work. The Tokenscost dataset already tracks 300+ live price points across these alternatives.

The Vibe Shift That Killed "Spend at All Costs"

For two years, the deal was simple. Engineering leaders told their teams: *use as much AI as possible, don't worry about the bill, we'll figure it out at the next funding round.* That mindset built a $25B run-rate at OpenAI and a $47B run-rate at Anthropic in under three years — numbers that, as D.A. Davidson's Gil Luria told CNBC, are "the fastest they will ever be, which is mostly a matter of basic math."

Then the bills started landing on CFO desks.

Lindy — a San Francisco agent startup of ~25 people — was paying more for AI than for payroll. Founder Flo Crivello flipped 100% of traffic from Anthropic's Claude to DeepSeek's open-weight models earlier this month. His description of the result: *"you could see that cost curve go down, like, crash to the ground."* The expected savings: millions, within months.

Uber is the canonical enterprise case. CTO Praveen Neppalli Naga revealed in April that the company had blown its entire annual AI budget in four months — almost entirely on Claude Code. This month, Uber installed $1,500/month base spending tiers on AI tools, with employees required to request access to higher levels.

This is not isolated. Highspring's Jeff Henry told CNBC his clients are pulling back until they can prove ROI — some delaying decisions 12–18 months. Ramp CEO Eric Glyman: *"Most CFOs not only didn't plan for this in their annual plans — the steep growth — but don't have great tools to manage this."*

What the Numbers Actually Look Like

The growth-rate math is brutal. Doubling from $1B to $2B is easy. Doubling from $47B to $94B is a different planet.

Both companies filed confidentially for IPO in early June. The New York Times reported June 25 that OpenAI may push its listing to next year. Luria's read: *"There has to be some period of time in the future where there's some rationalizing of spend by companies, and that may be a blip ahead for Anthropic and OpenAI. That creates some sense of urgency to go public before we see that."*

Translation: list while the run-rate chart still goes vertical, because the deceleration is mathematical, not strategic.

The Three Forces Squeezing the Frontier Labs

1. Open-weight models hit "good enough"

DeepSeek, GLM-5.2, Llama, Qwen and the long tail of open models now do 80% of enterprise tasks for a tenth of the price. The Lindy switch isn't a one-off — it's a template. When the cheap model is *good enough*, frontier pricing has to justify itself per query, not per relationship.
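A back-of-the-envelope sketch makes the per-query point concrete. It uses only the two figures above (80% of tasks, a tenth of the price). The function name and the choice to normalize the all-frontier bill to 1.0 are illustrative assumptions, and the sketch ignores quality differences, migration cost, and the likelihood that easy and hard tasks consume different token volumes.

```python
def blended_cost(routed_share: float, cheap_price_ratio: float) -> float:
    """Relative bill after routing, with the all-frontier bill normalized to 1.0.

    routed_share: fraction of work moved to the cheaper model (0 to 1).
    cheap_price_ratio: cheaper model's price as a fraction of frontier price.
    """
    if not 0.0 <= routed_share <= 1.0:
        raise ValueError("routed_share must be between 0 and 1")
    # Routed work is billed at the cheap rate; the rest stays at frontier price.
    return routed_share * cheap_price_ratio + (1.0 - routed_share)


relative_bill = blended_cost(routed_share=0.80, cheap_price_ratio=0.10)
print(f"{relative_bill:.2f}")  # 0.28
```

Under those assumptions the bill falls to roughly 28% of its starting level, and nearly three quarters of what remains is the frontier work that was not routed.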

2. Frontier-model usage on cheap tasks is unsustainable

Glean CEO Arvind Jain estimates ~95% of enterprise AI usage is still running on frontier models, even for tasks a smaller model could handle. AISquared's Darren Kimura calls this "absolutely" untenable. The fix is model routing — match the task to the cheapest model that can do it — which OpenAI explicitly identified as a structural threat earlier this month.
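A minimal sketch of what "cheapest model that can do it" might look like in code. Everything here is hypothetical: the model names, the prices, and the three-level capability tier are invented for illustration and do not describe any vendor's router or real price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_mtok: float  # hypothetical USD per million tokens
    capability: int        # hypothetical tier: 1 = simple, 3 = hardest reasoning


# Placeholder catalogue: names, prices, and tiers are made up for this sketch.
CATALOGUE = [
    Model("small-open-weight", 0.30, 1),
    Model("mid-tier", 1.50, 2),
    Model("frontier", 15.00, 3),
]


def route(required_capability: int, catalogue=CATALOGUE) -> Model:
    """Return the cheapest model whose capability tier meets the requirement."""
    eligible = [m for m in catalogue if m.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no model can handle tier {required_capability}")
    return min(eligible, key=lambda m: m.price_per_mtok)


print(route(1).name)  # small-open-weight
print(route(3).name)  # frontier
```

The selection step is the easy part. Deciding which tier a task actually needs, whether by heuristics, a classifier, or offline evals, is where a real router earns its keep, and that is outside this sketch.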

3. The backers are now the competitors

This is the most dangerous of the three.

Microsoft — $13B+ into OpenAI — shipped a low-cost in-house model suite earlier this month and made GitHub Copilot auto-route to the cheapest viable model. Satya Nadella publicly: *"The last thing any of us want is a world where every company across every sector is ceding value to a few models that eat everything they see."*
Amazon — $5B+ into Anthropic — is building frontier-class models on its own Trainium chips at lower cost. Peter DeSantis: *"AI has a cost problem. If we ultimately want AI to transform everything, the costs have to be different."*
Google launched Gemini 3.5 Flash at roughly ⅓ the price of comparable frontier models and put it front and center at its developer conference.

PitchBook's Harrison Rolfes summed it up: *"Microsoft and Google have the infrastructure and capability — the entire stack — where they can come in and stiff-arm both OpenAI and Anthropic."*

What Smart Teams Are Actually Doing

The "spend crunch on AI" doesn't mean using less AI. It means using AI like a budget item instead of a vibe. The playbook that's working in mid-2026 comes down to model routing, caching, batching, and smaller models for the easy 80% of work.

None of these are exotic. They are now table stakes. See the live cross-provider rates on the Pricing Table, the open-weight side of the market in the Providers Directory, and the savings stack in our batch API guide and caching guide.
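Of the savings levers this piece mentions, caching is the simplest to sketch. The version below is an application-level exact-match cache, which is an assumption about what "caching" means here. It is not any provider's built-in prompt-caching feature. `ResponseCache`, `call_model`, and `fake_model` are hypothetical names, and the fake model stands in for a real API client.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache: an identical (model, prompt) pair is paid for once."""

    def __init__(self, call_model: Callable[[str, str], str]):
        self._call_model = call_model  # hypothetical client function
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def complete(self, model: str, prompt: str) -> str:
        # Hash model and prompt together so different models never share entries.
        key = hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._store[key] = response
        return response


def fake_model(model: str, prompt: str) -> str:
    # Stand-in for a billable API call.
    return f"[{model}] {prompt.upper()}"


cache = ResponseCache(fake_model)
cache.complete("small-open-weight", "summarize ticket 1")
cache.complete("small-open-weight", "summarize ticket 1")
print(cache.hits, cache.misses)  # 1 1
```

An exact-match cache only pays off when the same request recurs verbatim, and it has no expiry, so a production version would need at least a TTL and a size bound.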

What This Means for OpenAI and Anthropic

Both companies see it. OpenAI launched enterprise spend analytics and admin controls earlier this month — credit breakdowns by team, usage limits, budget visibility. Anthropic rolled out organization and per-user spending controls in August. These are the right products, shipped late.

The structural problem is harder. If routing succeeds, frontier models lose ~80% of their query volume even when they keep the customer relationship. The remaining 20% is high-margin reasoning work — exactly the work Microsoft, Google and Amazon are now targeting directly with their own frontier-tier models, subsidized by hyperscale margins.

The IPO clock is, as Battery Ventures' Dharmesh Thakker put it, partly a capital story: *"A lot of the traditional pockets of capital are drying up. All the institutional investors who can invest in these companies have already taken their pound of flesh."* But it's also a narrative clock. Once growth visibly decelerates, the trillion-dollar story becomes a hundred-billion-dollar story.

What This Means for Your Bill

Three concrete things changed this quarter:

1. The "just use Claude/GPT for everything" architecture is dead. It's a default that now has to be defended at every quarterly review.

2. The cheap end of the market got actually good. DeepSeek, GLM-5.2, Gemini Flash, Llama 4 — any of them can be a routed default for the easy 80% with no quality cliff.

3. Spend governance is now a product surface, not a finance afterthought. Per-user caps, per-team budgets, per-task model policies: these will be table stakes on every enterprise AI plan within six months.
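A per-user cap, the simplest of those governance controls, might be sketched as follows. The default of $1,500 echoes the Uber base tier cited earlier. `SpendPolicy` and `authorize` are hypothetical names, and a real system would also need persistence, monthly resets, and a path for requesting a higher tier.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SpendPolicy:
    monthly_cap_usd: float = 1500.0  # illustrative default, taken from the Uber example
    spent_usd: Dict[str, float] = field(default_factory=dict)

    def authorize(self, user: str, estimated_cost_usd: float) -> bool:
        """Record the charge and return True only if it fits under the user's cap."""
        current = self.spent_usd.get(user, 0.0)
        if current + estimated_cost_usd > self.monthly_cap_usd:
            return False  # over the cap: caller should block or escalate
        self.spent_usd[user] = current + estimated_cost_usd
        return True


policy = SpendPolicy()
print(policy.authorize("alice", 1400.0))  # True
print(policy.authorize("alice", 200.0))   # False
print(policy.authorize("alice", 100.0))   # True
```

The rejected request is not recorded, so a smaller follow-up that still fits under the cap goes through.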

If your AI bill is growing faster than your AI-driven revenue, you're now the story this article is about. The fix isn't a better prompt. It's a routing policy, a caching layer, a batch queue, and an open-weight fallback — and you can ship all four in a sprint.
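The batch queue is the least familiar of those four pieces, so here is a minimal sketch of one. It assumes non-urgent prompts can wait to be grouped, and that some bulk submission function exists. `BatchQueue`, `submit_batch`, and `fake_submit` are hypothetical names, not a real provider SDK.

```python
from typing import Callable, List


class BatchQueue:
    """Collects non-urgent prompts and submits them in groups."""

    def __init__(self, submit_batch: Callable[[List[str]], List[str]], max_size: int = 3):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._submit_batch = submit_batch  # hypothetical bulk endpoint wrapper
        self._max_size = max_size
        self._pending: List[str] = []
        self.results: List[str] = []
        self.batches_sent = 0

    def add(self, prompt: str) -> None:
        self._pending.append(prompt)
        if len(self._pending) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is waiting, even if the batch is not full.
        if not self._pending:
            return
        self.results.extend(self._submit_batch(self._pending))
        self.batches_sent += 1
        self._pending = []


def fake_submit(prompts: List[str]) -> List[str]:
    # Stand-in for a discounted, higher-latency bulk call.
    return [f"done: {p}" for p in prompts]


queue = BatchQueue(fake_submit, max_size=3)
for i in range(7):
    queue.add(f"job {i}")
queue.flush()
print(queue.batches_sent, len(queue.results))  # 3 7
```

The trade is latency for price, so this only suits work nobody is waiting on, such as nightly summaries or backfills.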

Bottom Line

Tokenmaxxing was the growth-at-all-costs era of enterprise AI: spend now, justify later. It funded OpenAI and Anthropic to near-trillion-dollar valuations. It also planted the seeds of its own end by teaching a generation of CFOs that AI bills can quietly grow into payroll-sized line items.

The efficiency era doesn't shrink AI. It professionalizes it. And the labs that thrive in it will be the ones that price like utilities, route like CDNs, and ship governance like SaaS — not the ones still selling frontier-tier tokens for every "hello world."

For the live cost data across OpenAI, Anthropic, Google, DeepSeek, Z.ai, Meta, Mistral, xAI and 60+ other providers, see the Pricing Table, the Providers Directory, and the GPU Pricing Comparison for the self-host side of the same shift.