On 26 June, CNBC put a name to the thing every AI budget owner already felt: the end of tokenmaxxing. For two years the incentive inside companies was to push as many tokens through the models as possible and sort out the results later. That era closed the moment the bills arrived.
Uber said it burned its entire annual AI budget in four months, then capped some tools at tiered spend starting around $1,500 a month per user. Sam Altman told enterprise buyers on 3 June that cost had become "a huge issue," and mentioned that OpenAI's largest internal user consumes roughly 100 billion tokens a month. Meta capped internal token spend as its bill approached the billions. Both OpenAI and Anthropic filed confidentially for an IPO the same month. The story reads like a spending problem. It is a pricing problem, and the pricing problem is a strategy problem.
Here is the thesis. The price of a token stopped being a line in your infrastructure budget and became the thing that decides your business model. If you sell software by the seat while your product runs agents on the customer's behalf, every heavy user is now a loss. You are covering their token bill out of your own margin. The company that reprices around the work keeps the market. The company that keeps charging by the seat funds its own decline.
The bill came due
The cost structure changed shape, and most people priced against the old shape.
A demo is a chatbot: one prompt, one answer, a few thousand tokens. Production is an agent: it plans, calls tools, reads back its own output, retries, and checks its work. Gartner put the gap at 5 to 30 times the tokens per task. Nobody sees that in the pilot, because the pilot never runs the loop at volume. It shows up in month four, when the loop has run a million times and the annual budget is gone.
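The arithmetic behind that month-four surprise can be sketched in a few lines. Only the 5x to 30x multiplier comes from the Gartner figure above; the tokens per chatbot task, the price per million tokens, and the run count are illustrative assumptions, not measured values.

```python
# Back-of-envelope: chatbot demo vs. agent loop at volume.
# Every constant except the 5x-30x multiplier is a hypothetical placeholder.

CHAT_TOKENS_PER_TASK = 3_000       # assumed: "a few thousand tokens"
PRICE_PER_MILLION_TOKENS = 5.00    # assumed blended USD price
RUNS = 1_000_000                   # assumed: the loop has run a million times


def task_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """USD cost of one task at a flat per-token price."""
    return tokens / 1_000_000 * price_per_million


def spend(multiplier: int, runs: int = RUNS) -> float:
    """Total USD spend when each task uses `multiplier` times the chatbot tokens."""
    return task_cost(CHAT_TOKENS_PER_TASK * multiplier) * runs


if __name__ == "__main__":
    for m in (1, 5, 30):
        print(f"{m:>2}x tokens per task -> ${spend(m):,.0f}")
```

Under these assumed numbers, a million chatbot-sized tasks cost $15,000, while the same million tasks run as an agent land between $75,000 and $450,000. The pilot budget was sized for the first number.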
The reactions in June were all cost defense. Lindy moved its whole product off Claude to the cheaper open-weight DeepSeek, expecting millions in savings. On 4 July, Palantir and Nvidia pitched an air-gapped stack explicitly on the premise that token billing is cracking enterprise budgets. These are rational moves. They are also floor moves: they make the same work cost less. None of them changes what you can charge for the work. That is the part that decides who survives the next year, and almost nobody was talking about it.
You can't price the work by the seat
A seat priced access. Access had a marginal cost of roughly zero, so a flat fee per login worked for twenty years. An agent has a marginal cost that scales with how hard it works. The moment your product does work instead of granting access, the seat stops describing what you sell.
The market is repricing in real time. In Metronome's catalog of pricing models, seat-based pricing fell from 21% to 15% of companies in a year, while usage and hybrid models climbed from 27% to 41%. Bessemer calls it "the AI pricing pivot"; a16z frames the endpoint as outcome-based pricing. The cleanest example ships today: Intercom's Fin agent charges $0.99 per resolved conversation. No resolution, no charge. The customer pays for the work, not for the ability to attempt it.
Outcome pricing is hard. You have to define the outcome, measure it, and eat the cost of the attempts that miss. But it is the only model where your revenue and your token cost move in the same direction. Every other model asks you to bet that your customers won't use the thing you built.
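A minimal sketch of what "eat the cost of the attempts that miss" means in unit terms. The $0.99 price is the Fin figure quoted above; the token cost per attempt and the resolution rates are hypothetical, chosen only to show the shape of the calculation.

```python
# Unit economics of per-resolution pricing.
# PRICE_PER_RESOLUTION is the published price cited in the text;
# cost_per_attempt and resolution_rate values are hypothetical inputs.

PRICE_PER_RESOLUTION = 0.99


def margin_per_resolution(cost_per_attempt: float, resolution_rate: float,
                          price: float = PRICE_PER_RESOLUTION) -> float:
    """Gross margin on one paid resolution, with failed attempts absorbed.

    On average it takes 1 / resolution_rate attempts to produce one
    resolution, and every attempt spends tokens whether or not it resolves.
    """
    if not 0 < resolution_rate <= 1:
        raise ValueError("resolution_rate must be in (0, 1]")
    return price - cost_per_attempt / resolution_rate


if __name__ == "__main__":
    for rate in (0.5, 0.2):
        print(rate, round(margin_per_resolution(0.20, rate), 2))
```

With an assumed $0.20 of tokens per attempt, a 50% resolution rate leaves about $0.59 of margin per paid resolution. At a 20% rate the same agent loses money on every one. The misses are a cost the seller has to measure and price in.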
The seat is a subsidy
This is the part that inverts the last decade of SaaS instinct.
Under per-seat pricing, your best customer was your heaviest user. High usage meant stickiness, a case study, and the obvious upsell. The marginal cost of that engagement was near zero, so more usage was pure signal.
Flip on an agent behind the same flat seat and the sign reverses. The heavy user now drives a token cost that climbs with every workflow they automate. Somewhere along that curve it crosses your fixed price, and past the crossing your most engaged customers are the ones losing you money. You cannot upsell your way out, because the thing you would upsell costs you more to deliver. The seat has quietly become a subsidy, and the customers you were proudest of are the ones drawing it down fastest.
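The crossing point is simple to compute once you know what a workflow run costs. This is a rough sketch; the seat price and the token cost per workflow are hypothetical figures, not numbers from any company named here.

```python
# Where a flat seat stops covering an agent's token cost.
# Both inputs are hypothetical; plug in your own seat price and measured cost.


def breakeven_workflows(seat_price: float, cost_per_workflow: float) -> float:
    """Monthly workflow runs at which token cost equals the flat seat price."""
    if cost_per_workflow <= 0:
        raise ValueError("cost_per_workflow must be positive")
    return seat_price / cost_per_workflow


def seat_margin(seat_price: float, cost_per_workflow: float, workflows: int) -> float:
    """Monthly margin on one seat: positive below break-even, negative above it."""
    return seat_price - cost_per_workflow * workflows


if __name__ == "__main__":
    seat, cost = 50.0, 0.25  # assumed: $50 seat, $0.25 of tokens per run
    print("break-even runs:", breakeven_workflows(seat, cost))
    for runs in (100, 200, 1_000):
        print(runs, "runs ->", seat_margin(seat, cost, runs))
```

With an assumed $50 seat and $0.25 per run, break-even sits at 200 runs a month. The user at 100 runs leaves $25 of margin; the power user at 1,000 runs costs $200 more than they pay.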
I wrote a few weeks ago that the company is the product: the unit of sale is moving from software to the work itself. The token bill is that thesis arriving as an invoice. Once the work is what you deliver, the seat was always a proxy for it, and the proxy just broke in public.
Routing is the floor, repricing is the ceiling
There are two responses to the bill, and they are not the same move.
The first is to cut the cost of the tokens: route the easy 80% of steps to cheap models and reserve the frontier model for the step that actually needs it, cache aggressively, run open weights where quality allows, buy the air-gapped stack. This is necessary. It is also the same argument I made about efficiency being the floor. It makes today's product cheaper to run. It does not change what you can charge, and within a quarter your competitors route too, so the relative advantage erases itself.
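A minimal sketch of that routing move and what it buys. The model names and per-token prices are invented for illustration, and the blended price assumes every step uses the same number of tokens, which real agent traces will not.

```python
# Routing sketch: cheap model by default, frontier model only where needed.
# Model names and prices are hypothetical, not quotes from any provider.
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float


CHEAP = Model("cheap-open-weight", 0.50)   # hypothetical price
FRONTIER = Model("frontier", 10.00)        # hypothetical price


def route(needs_frontier: bool) -> Model:
    """Pick the model for one step; the flag would come from your own eval."""
    return FRONTIER if needs_frontier else CHEAP


def blended_price(frontier_share: float) -> float:
    """Average USD per million tokens, assuming equal tokens per step."""
    if not 0 <= frontier_share <= 1:
        raise ValueError("frontier_share must be in [0, 1]")
    return (frontier_share * FRONTIER.usd_per_million_tokens
            + (1 - frontier_share) * CHEAP.usd_per_million_tokens)


if __name__ == "__main__":
    print("all frontier:", blended_price(1.0))
    print("80% routed cheap:", round(blended_price(0.2), 2))
```

At these assumed prices, sending the easy 80% of steps to the cheap model takes the blended price from $10.00 to about $2.40 per million tokens. It is a real saving, and it is one any competitor can copy with the same few lines.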
The second response is to reprice around the outcome, so the token becomes a cost of goods sold inside a margin you control. That is a different kind of work. It forces you to know what a unit of value is worth to the customer, to measure whether you delivered it, and to carry the variance when you don't. It is harder, slower, and it is the only one of the two that builds a business instead of defending an old one. Routing buys you time. Repricing is what you are supposed to do with the time.
What this looks like at Shakers
A marketplace was outcome-priced before agents existed, which turns out to be an accident of good timing.
Shakers gets paid when a match becomes a project — when a client and a freelancer actually start work — not per login and not per token. So when Matchmaking or ShakersAI spends tokens to produce a match, that spend already sits inside the margin of a placed project by construction. There is no seat to hide the cost behind. The token is a cost of goods on a unit we already sell by outcome, which means the discipline is forced rather than chosen.
That reframes the daily engineering decisions. Tokens are a line in the cost of a match, not a line in an innovation budget, so the question on every agent step is whether it changes the match enough to earn the model it runs on. The frontier model runs on the step that moves the decision; the cheap model runs the rest; the eval decides which is which. When a step can't show it improves the match, it doesn't get the expensive model, and often it doesn't get to run at all.
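As an illustration of that rule, and not a description of the actual Shakers pipeline, here is one way an eval result could gate which model a step gets. The lift inputs, the `min_lift` threshold, and all names are hypothetical.

```python
# Eval-gated model assignment: a step earns a model only if it moves the match.
# Hypothetical sketch; "lift" means measured improvement in match quality
# compared with not running the step at all.
from enum import Enum


class Decision(Enum):
    SKIP = "skip"
    CHEAP = "cheap"
    FRONTIER = "frontier"


def assign_model(lift_cheap: float, lift_frontier: float,
                 min_lift: float = 0.01) -> Decision:
    """Choose the cheapest option whose eval lift justifies running it."""
    frontier_earns_it = (lift_frontier >= min_lift
                         and lift_frontier - lift_cheap >= min_lift)
    if frontier_earns_it:
        return Decision.FRONTIER
    if lift_cheap >= min_lift:
        return Decision.CHEAP
    return Decision.SKIP  # the step cannot show it improves the match


if __name__ == "__main__":
    print(assign_model(0.02, 0.10))   # frontier clearly beats cheap
    print(assign_model(0.05, 0.055))  # frontier adds too little over cheap
    print(assign_model(0.0, 0.001))   # neither model moves the match
```

The threshold is the design decision. Set it in the units your margin is measured in, and the choice of model becomes a cost-of-goods question instead of a preference.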
The general version, for anyone running AI in production: put the token cost inside the unit you sell, priced by its outcome, owned by the team whose margin it moves. Do that and the bill stops being a surprise, because it was never a separate budget — it was always the cost of the thing you were already charging for.
Closing
The token bill was read in June as a spending scare, a sign the AI trade had gotten ahead of itself. That reading is too small. What changed is that the price of a token became legible all the way up to the business model, and it exposed which companies were selling access and which were selling work.
If you sell access, you can route and cache and postpone the reckoning, but the seat is leaking margin to your heaviest users every day you wait. If you sell work, you price the work, you carry the cost of the misses, and the token bill is just the cost of goods on something you already knew how to charge for.
You can price a login by the seat. You can't price the work by the seat. The companies that internalize that keep their margins. The rest will spend the next year discovering that their best customers were the most expensive ones all along.