
ZDNET’s key takeaways
- AI usage is moving to token-based pricing.
- Token pricing is far more expensive than the previous flat-fee model.
- Measuring the value derived from AI remains an unsolved problem.
SAN DIEGO — A few months ago, most people paid a flat fee for their AI access. That was then. This is now. The days of AI pricing as a loss-leader are over. As everyone has discussed here at FinOps X 2026, AI’s token-based pricing model is becoming the foundation of the entire generative AI economy, and it’s far more expensive than older models. Just ask Copilot users who are having fits over the new token-based pricing.
For many enterprise customers, this reminds them of the early days of cloud pricing when they had to deal with volatile invoices and business models shifting under their feet. Underneath the confusion, tokens are quietly standardizing how labs translate scarce GPU capacity into billable units, how enterprises measure AI usage, and how software vendors reprice their products.
Tokens: The atomic units of AI
In this new world, the token is the basic unit of AI work. J.R. Storment, executive director of the FinOps Foundation, calls it “the atomic unit of AI.” In his FinOps keynote, Storment said that “tokens serve more roles in the modern economy than almost any other commodity has in modern history, maybe, maybe oil in the 20th century.” Tokens, he told the FinOps X audience, are simultaneously “the unit of output from all of the hardware and compute and data centers,” “how the labs price their outputs and inputs,” and “the value unit that enterprises are looking to monetize.”
That abstraction is precisely why labs and hyperscalers like it. Instead of charging for GPU types, memory, and power directly, they can expose a single unit, the token, over a bewildering mix of architectures and deployment topologies. OpenAI, Anthropic, Google, and others now publish per‑model rate cards with separate prices for input tokens (everything you send the model) and output tokens (everything it generates back), usually quoted in dollars per million tokens.
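To make the rate-card arithmetic concrete, here is a minimal Python sketch of how a per-request cost falls out of separate input and output prices. The `RateCard` class, the `request_cost` function, and the dollar figures are illustrative assumptions, not any vendor's published prices or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """Hypothetical per-model prices in US dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(card: RateCard, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request under a per-million-token rate card."""
    return (input_tokens * card.input_per_million
            + output_tokens * card.output_per_million) / 1_000_000


# Placeholder rates chosen for illustration only.
example_card = RateCard(input_per_million=3.0, output_per_million=15.0)

# A request that sends 20,000 tokens and gets 2,000 tokens back.
cost = request_cost(example_card, input_tokens=20_000, output_tokens=2_000)
print(f"${cost:.4f}")
```

Because output tokens are typically priced higher than input tokens, a short answer to a long prompt and a long answer to a short prompt can cost very different amounts even when the total token count is the same.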
So what are tokens anyway? An AI token, said Storment, “is the smallest unit a word or phrase can be broken down into when being processed by a large language model (LLM).” Before a model can work with text, it breaks it into fragments, a process called tokenization. For English, a common rule of thumb is that “one token is roughly four characters, or about three-quarters of a word,” so “100 tokens ≈ 75 words.”
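The rule of thumb can be turned into a rough estimator. The Python sketch below applies only the two ratios quoted above (about four characters per token and about three-quarters of a word per token). The function names are hypothetical, and a real tokenizer will give different counts, especially for code or non-English text.

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough estimate: one token per four characters of English text."""
    return round(len(text) / 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough estimate: one token per three-quarters of a word."""
    return round(word_count * 4 / 3)


# 75 words maps back to roughly 100 tokens under the rule of thumb.
print(estimate_tokens_from_words(75))

# A 400-character prompt is roughly 100 tokens.
print(estimate_tokens_from_chars("a" * 400))
```

Estimates like these are only good enough for budgeting. Invoices are based on the provider's own tokenizer counts.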
The token hides enormous complexity. As SAP’s FinOps team put it in their session, “You pay per token, and this little token hides an enormous complexity underneath predictability,” from model choice and quantization to how aggressively you use caching or agents. That complexity is exactly what FinOps teams are now being asked to decode.
The all‑you‑can‑eat token era is over
If 2023 through early 2025 was the era of cheap experiments, the last 18 months have been a rude awakening. Storment describes three distinct phases: The “old days of AI” before ChatGPT, the “good old days of AI” when chatbots “could write some decent code,” and then the post‑November‑2025 world when major model releases “took AI from pretty good to really good.”
In the good old days, the era of all-you-could-eat tokens and subscriptions, we went through a brief period of token maxing. Then everybody was excited about their token leaderboard, which showed who had the most token usage. Today, token leaderboards are painfully obsolete because no one can afford to waste tokens. As Amazon senior vice president Dave Treadwell begged, “Please don’t use AI just for the sake of using AI.”
Objectively, between June and November last year, Storment said global token usage grew in a “nice linear path.” Then those new models and agentic patterns landed. Context windows “went from a few thousand or tens of thousands or hundreds of thousands up to millions of tokens in a single conversation,” and “agentic hit the scene and exploded,” adding “loops and retries and corrections and all this insanity.”
Companies had happily subsidized that behavior… until they saw the bills. Storment recounted how some “$200-a-month” power users actually cost “upwards of tens of thousands of dollars a month when you were running everything on the latest model.” For example, SemiAnalysis, an AI analytics company, recently estimated that a $200 Anthropic plan used to give $8,000 worth of Claude tokens, while a similar OpenAI offering gave $14,000 worth of Codex tokens.
Those days and prices are done. Moving forward, companies will have to pay the real cost of AI tokens.
“So now what matters more than anything is AI value,” Storment told the room. “We’ve got to bring value back to what we’re doing… We’re in an era where tokens are the main measurement. We’re in an era where tokens are in everything in software, and they’re driving a lot of the global token economy.”
Scarcity keeps token prices from collapsing
If Moore’s law and hyperscale competition were the only forces at work, you’d expect token prices to keep falling. To some extent, they have. “Since 2023, token prices have fallen dramatically,” Storment acknowledged. SAP’s internal telemetry tells a similar story. “This is our cost per token over the same time period,” said SAP data scientist Maida Nazifi, showing their internal chart. “It’s clearly trending down, even with a bit of flattening at the end. And honestly, it matches the narrative that everyone wants to believe, right? Token prices keep on falling.”
But both stress the caveat: The floor may be in sight. Storment notes that if “you look at the top labs and their pricing, you go back to the Wayback Machine. Token prices have been pretty flat since November 2025,” which he links directly to hardware and power constraints: “We can’t get enough hardware, we can’t get enough power… we’re seeing backlogs, we’re seeing long commitment periods, and we’re seeing shortages.”
He cited Intel’s CEO saying he doesn’t expect real relief in GPU and related component supply “until 2028.” Nazifi and SAP VP Frederik Pohl are seeing the same patterns at their company: Pohl warned, “We have supply chain constraints, we have hardware prices that are rising, and the prices of new frontier models are growing ever more expensive.”
The net result is a classic Jevons paradox: Falling unit cost, exploding total spend. “Even with falling token prices, we see that our spend is still rising, and that’s the famous paradox,” Pohl said. “At our scale, we had unit costs falling, but we saw in some months that spend was doubling.”
Storment thinks the paradox is just beginning. Goldman Sachs, he said, estimates global usage rising from “6 quadrillion tokens” today to “120 quadrillion forecasted tokens” within about 3.5 years. Even if token prices drop further once supply loosens, they are unlikely to fall as fast as volume grows, which by those figures is a 20x increase.
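The arithmetic behind that paradox is simple enough to sketch. The Python snippet below uses only the two Goldman Sachs figures quoted above. The price-change scenarios and the `spend_multiplier` helper are assumptions for illustration, not forecasts.

```python
TOKENS_TODAY = 6 * 10**15       # "6 quadrillion tokens"
TOKENS_FORECAST = 120 * 10**15  # "120 quadrillion forecasted tokens"

# How many times larger the forecast volume is than today's.
volume_growth = TOKENS_FORECAST / TOKENS_TODAY


def spend_multiplier(volume_multiplier: float, price_multiplier: float) -> float:
    """Total spend scales with volume times unit price."""
    return volume_multiplier * price_multiplier


# Unit price would have to fall to this fraction of today's price
# for total spend to stay flat.
break_even_price = 1 / volume_growth

print(volume_growth)                         # 20.0
print(break_even_price)                      # 0.05
# Hypothetical scenario: prices halve while volume grows as forecast.
print(spend_multiplier(volume_growth, 0.5))  # 10.0
```

In other words, under these figures unit prices would need to drop by 95% just to hold total spend constant. A more modest halving of prices would still leave spend ten times higher.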
FinOps discovers token economics
For the FinOps community, which cut its teeth on cloud right‑sizing and reserved instances, token pricing is both familiar and completely alien. The familiar part is that it’s usage‑based, the invoices are big, and forecasting is hard. The alien part? The unit is tied to language, not infrastructure, and it changes as fast as model releases, not as slowly as server depreciation schedules.
Pohl asserted that “AI does not just stretch the cloud playbook, it breaks it; AI is more different from the cloud than cloud was to the data center.” Unlike CPUs, “AI models are nothing like that… they have their unique strengths and weaknesses… They have different cost profiles, and swapping out an LLM is not just a pricing decision. It’s also a quality-of-output decision.”
SAP’s experience is a case study in how enterprises are retooling. Its Business AI platform, Pohl explained, runs across “multiple different LLMs,” including “ChatGPT, Anthropic, Gemini… other open source models,” layered on “different hyperscalers.”
When SAP first went looking for AI cost data, “we immediately hit a wall,” Nazifi recalled. “The existing [cloud] tools were very blind to the nuance of LLMs, so they could tell us we spent this amount on [a provider], but not really which model, or how much the model. It really was like trying to optimize your gold mining operation by looking at the total weight of ore.”
So they did it the hard way: “We pulled data manually, we merged data across tables, and then we had this first picture by hand.” That picture, once it reached their global infrastructure lead and then the CTO, transformed the conversation. “Within days, it went from like, ‘OK, this is interesting, keep me posted,’ to… ‘I need this regularly, I need more,’” Nazifi said. Pohl added the FinOps lesson: “If you have a CTO asking for a number, that’s not a question, it’s a mandate.”
That demand forced SAP to formalize an internal AI FinOps framework built around three pillars:
- Spend visibility: “What we consume, how we consume it, and where we consume it,” across models, platforms, business units, and regions.
- Economics: “How efficiently are you leveraging AI,” measured with token‑level metrics like input/output ratios, cached token ratios, and “token to spend drift” to see whether costs are rising because of volume or mix shifts to pricier models.
- Value: Connecting AI spend to business outcomes with “cost per use case” and “inference cost by revenue,” so they can tell “which AI features are economically viable” and whether “your AI product margins actually work.”
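SAP did not share formulas for these metrics, so the Python sketch below is one plausible reading of the economics pillar rather than SAP's actual method. The `PeriodUsage` fields are hypothetical, and “token to spend drift” is assumed here to mean spend growth minus token growth between two periods, so a positive value suggests a shift toward pricier models rather than pure volume growth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodUsage:
    """Hypothetical usage summary for one billing period."""
    input_tokens: int
    cached_input_tokens: int  # subset of input_tokens served from cache
    output_tokens: int
    spend_usd: float

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def input_output_ratio(usage: PeriodUsage) -> float:
    """Input tokens sent per output token generated."""
    return usage.input_tokens / usage.output_tokens


def cached_token_ratio(usage: PeriodUsage) -> float:
    """Share of input tokens that were served from cache."""
    return usage.cached_input_tokens / usage.input_tokens


def token_to_spend_drift(prev: PeriodUsage, curr: PeriodUsage) -> float:
    """Assumed definition: spend growth minus token growth.

    Near zero means spend is tracking volume; positive means spend is
    growing faster than volume, for example from a pricier model mix.
    """
    token_growth = curr.total_tokens / prev.total_tokens - 1
    spend_growth = curr.spend_usd / prev.spend_usd - 1
    return spend_growth - token_growth


# Made-up numbers: tokens grow 20% while spend grows 50%.
last_month = PeriodUsage(800_000, 200_000, 200_000, spend_usd=10.0)
this_month = PeriodUsage(960_000, 480_000, 240_000, spend_usd=15.0)

print(input_output_ratio(this_month))
print(cached_token_ratio(this_month))
print(round(token_to_spend_drift(last_month, this_month), 2))
```

With these made-up numbers the drift is about 0.3, meaning spend grew roughly 30 percentage points faster than token volume, which is the kind of signal that would prompt a look at which models the traffic moved to.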
“Every token needs to earn its cost,” Pohl said, echoing Nvidia CEO Jensen Huang’s phrase “token factory effectiveness.” That factory spans everything from silicon and data center leases to model routing and prompt design.
Tokenomics: beyond just counting tokens
If FinOps is about cost control and accountability, tokenomics, at least as the Linux Foundation is positioning it, is about the full lifecycle of tokens as an economic good. Storment defines it as “the emerging discipline of converting energy and capital into AI tokens and resources, consuming those tokens and all the related technology to drive efficient intelligence, and then ultimately drive value on the backend.”
In his view, that breaks into three buckets:
- Production: “Take energy and capital and create tokens,” whether in cloud data centers, colos, edge devices, or, as Elon Musk likes to imagine, “data centers in space.”
- Consumption: “All the allocation, forecasting, and optimization, which kind of sounds a lot like FinOps for AI,” spanning model routing, quantization choices, agent limits, and cache strategies.
- Value: “How do we monetize those tokens? How do we adjust our pricing based on the cost of those tokens? What are the labor implications in our entire company based on the cost of that AI?”
That last piece is where token pricing directly collides with software-as-a-service (SaaS) business models. As Storment told me in an interview, “Tokenomics is getting over to the price of the tokens and how effectively we manage this production and consumption of them is changing pricing models for Fortune 100 companies.”
He points to Microsoft’s GitHub moves, shifting Copilot toward more explicit usage‑based charging, as an early example. Developers “who love the unlimited tokens” are now “really just angry at Microsoft,” because their implicit subsidy vanished.
The labs themselves are also tightening the screws in ways that are invisible at the token level. He raised as a fresh example Anthropic’s Fable model card: “If you’re going to use Claude at Fable to try to build an LLM, they will silently drop you to a different model, and you aren’t going to know.” Since then, Anthropic has walked back this policy, but other companies may not. Such silent policies make a mockery of any naive “cost per token” metric, because “not all tokens are created equal by any stretch of the imagination.”
Storment agrees. “A token can cost two cents per million, or it can cost 35 per million, just from a cost perspective,” he said, and even at the same rate, “one might drive a lot of value, and one doesn’t, based on how you’re using it.” For him, the point of embracing “tokenomics” as a term is to harness the fact that the C‑suite has already latched onto tokens as a mental model.
It also doesn’t help that today’s advanced LLMs, such as Anthropic Fable 5, can chase after an answer and burn tokens without users having a clue what’s actually happening. For instance, Simon Willison, co-creator of the Django web framework, reported that, “Based on a screenshot and a one-line prompt, Claude Fable 5 + Claude Code” used numerous different web browsers, built and launched its own web server, and performed many other tricks, all to track down a simple CSS display bug. Had he been paying token prices, it would have cost him only $12. It’s easy to envision a frontier model taking on a more complex problem and burning hundreds or thousands of dollars.
Business models: from credits and seats to blended token bundles
These pricing experiments show a pricey future. Most customers will never see a line item labeled “120 quadrillion tokens.” Instead, vendors are building layers of abstraction on top:
- Credits and opaque consumption: Storment described signing up for an unnamed service where “every time I ran a video, it was like, ‘Put more quarters in the machine, put your credit card down. These credits go fast.’” Under the hood, those quarters are tokens.
- Hybrid subscription + usage: Others use “a basic monthly, and then some level of consumption,” giving customers a predictable base and then exposing them to token‑denominated overages at the margin.
- Direct pass‑through models: A smaller set, especially in infrastructure‑adjacent products, are “starting to direct allocation, direct pass through,” essentially showing customers the token meter more honestly but wrapped in their own dashboards and guardrails.
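The hybrid model in particular reduces to a small formula: a fixed base fee plus a per-token charge on anything above an included allowance. The Python sketch below shows that shape under assumed numbers. The `hybrid_invoice` function, its parameters, and the prices are hypothetical and do not describe any specific vendor's plan.

```python
def hybrid_invoice(
    base_fee_usd: float,
    included_tokens: int,
    used_tokens: int,
    overage_per_million_usd: float,
) -> float:
    """Monthly bill: flat base fee plus token-denominated overage."""
    # Only tokens beyond the included allowance are billed at the overage rate.
    overage_tokens = max(0, used_tokens - included_tokens)
    return base_fee_usd + overage_tokens * overage_per_million_usd / 1_000_000


# Hypothetical plan: $20 base, 5M tokens included, $10 per million after that.
light_user = hybrid_invoice(20.0, 5_000_000, 1_000_000, 10.0)
heavy_user = hybrid_invoice(20.0, 5_000_000, 8_000_000, 10.0)

print(light_user)  # 20.0
print(heavy_user)  # 50.0
```

The customer sees a predictable bill until usage crosses the allowance. After that, every additional million tokens shows up directly on the invoice.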
These are all vulnerable to upstream shocks. Storment warned, “Anything changes in this, your token factory changes, you route to the wrong model and blow your cache up, you inefficiently forecast or estimate. Anything changes, this affects consumer pricing at the end, and you may have to change your prior pricing model for how you go to market, and this isn’t just software companies, it’s cascading into banks and everyone else today.”
That cascading effect is why the Linux Foundation is spinning up a Tokenomics Foundation alongside the FinOps Foundation: to give big consumers and suppliers a vendor‑neutral place to hash out specifications and best practices for measuring and allocating token‑based costs. The FinOps Focus specification, originally designed to normalize cloud billing data, is already being extended for token‑level telemetry. A new “FinOps certified Focus generator” program aims to validate that providers’ billing pipelines conform.
The human side: AI haves versus have‑nots
Beyond the spreadsheets, token pricing is already shaping who gets to use powerful AI — and who doesn’t. Storment sees a “societal divide between those who can afford the AI and those who can’t” if high token costs persist. At the enterprise level, you can already see the outlines: “Certain teams are being deemed worthy of getting the latest model, and others are not,” with some users routed automatically “to cheaper model[s]” and others granted exceptions.
Yet there is also a strong argument against crude caps. One Fortune 100 executive told Storment to “look across your usage… and you’re going to find some outliers of people… Don’t cap them, don’t shut them down. Go talk to them, find out what they’re doing, because they might actually be doing something really interesting.” In a world where YC‑backed startups receive “millions of dollars of tokens” from frontier labs to disrupt incumbents, shutting down internal experimentation could be an existential threat.
For individuals trying to use AI, and especially new workers, token pricing feeds into broader anxieties about AI and jobs. In our interview, I raised the backlash to AI‑heavy commencement speeches and the sense among graduates that AI is “coming directly for their jobs in an already tough job market.”
Storment’s view is more nuanced but still stark: “I don’t think AI is immediately coming for everybody’s job, but I think the person who’s better at AI is coming for the job of the person who’s not using AI.” If token prices and quotas restrict who can learn and experiment, that divide will only deepen.
For both companies and individuals, we’re moving quickly into an AI-token-based economy. This, in turn, will lead to a far more expensive AI world. What all that will mean is a question we don’t yet have an answer to. The one thing we know for certain is that it will be orders of magnitude more expensive than it has been.















