Why AI tokens will send your enterprise cloud bill sky-high again

ZDNET’s key takeaways

AI usage is moving to token-based pricing.
Token pricing is far more expensive than the previous flat-fee model.
Measuring the value derived from AI remains an unsolved problem.

SAN DIEGO - A few months ago, most people paid a flat fee for their AI access. That was then. This is now. The days of AI pricing as a loss leader are over. As everyone has discussed here at FinOps X 2026, AI’s token-based pricing model is becoming the foundation of the entire generative AI economy, and it’s far more expensive than the flat-fee plans it replaces. Just ask Copilot users who are having fits over the new token-based pricing.

For many enterprise customers, this reminds them of the early days of cloud pricing when they had to deal with volatile invoices and business models shifting under their feet. Underneath the confusion, tokens are quietly standardizing how labs translate scarce GPU capacity into billable units, how enterprises measure AI usage, and how software vendors reprice their products.

Tokens: The atomic units of AI

In this new world, the token is the basic unit of AI work. J.R. Storment, executive director of the FinOps Foundation, calls it “the atomic unit of AI.” In his FinOps keynote, Storment said that “tokens serve more roles in the modern economy than almost any other commodity has in modern history, maybe, maybe oil in the 20th century.” Tokens, he told the FinOps X audience, are simultaneously “the unit of output from all of the hardware and compute and data centers,” “how the labs price their outputs and inputs,” and “the value unit that enterprises are looking to monetize.”

That abstraction is precisely why labs and hyperscalers like it. Instead of charging for GPU types, memory, and power directly, they can expose a single unit, the token, priced per million, over a bewildering mix of architectures and deployment topologies. OpenAI, Anthropic, Google, and others now publish per‑model rate cards with separate prices for input tokens (everything you send the model) and output tokens (everything it generates back), usually quoted in dollars per million tokens.
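
To make the rate-card arithmetic concrete, here is a minimal Python sketch of how the bill for a single request falls out of separate input and output prices. The `RateCard` name and the prices are placeholders invented for illustration, not any vendor’s actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """Hypothetical per-model prices, in dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(card: RateCard, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: each side is billed at its own rate."""
    return (
        input_tokens * card.input_per_million
        + output_tokens * card.output_per_million
    ) / 1_000_000


# Placeholder prices, chosen only to make the arithmetic easy to follow.
example_card = RateCard(input_per_million=3.00, output_per_million=15.00)

# A 20,000-token prompt that produces a 2,000-token answer.
cost = request_cost(example_card, input_tokens=20_000, output_tokens=2_000)
print(f"${cost:.4f}")
```

On this placeholder card, the prompt accounts for two-thirds of the cost even though input tokens are the cheaper side, simply because there are ten times as many of them.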

So what are tokens anyway? An AI token, said Storment, “is the smallest unit a word or phrase can be broken down into when being processed by a large language model (LLM).” Before a model can work with text, it breaks it into fragments, a process called tokenization. For English, a common rule of thumb is that “one token is roughly four characters, or about three-quarters of a word,” so “100 tokens ≈ 75 words.”
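
That rule of thumb is easy to turn into a back-of-the-envelope estimator. The Python sketch below only applies the four-characters-per-token and three-quarters-of-a-word ratios quoted above; real tokenizers vary by model and language, so treat the result as a rough guess rather than a billing-grade count. The function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough English-only estimate: about four characters per token."""
    if not text:
        return 0
    return max(1, round(len(text) / 4))


def tokens_to_words(tokens: int) -> int:
    """About three-quarters of a word per token."""
    return round(tokens * 0.75)


sample = "Tokens are the atomic unit of AI."
print(estimate_tokens(sample), "tokens (estimated)")
print(tokens_to_words(100), "words per 100 tokens")
```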

The token hides enormous complexity. As SAP’s FinOps team put it in their session, “You pay per token, and this little token hides an enormous complexity underneath predictability,” from model choice and quantization to how aggressively you use caching or agents. That complexity is exactly what FinOps teams are now being asked to decode.

The all‑you‑can‑eat token era is over

If 2023 through early 2025 was the era of cheap experiments, the last 18 months have been a rude awakening. Storment describes three distinct phases: The “old days of AI” before ChatGPT, the “good old days of AI” when chatbots “could write some decent code,” and then the post‑November‑2025 world when major model releases “took AI from pretty good to really good.”

In the good old days, the era of all-you-could-eat tokens and subscriptions, we went through a brief period of token maxing. Back then, everybody was excited about their token leaderboard, which showed who had the most token usage. Today, token leaderboards are painfully obsolete because no one can afford to waste tokens. As Amazon senior vice president Dave Treadwell begged, “Please don’t use AI just for the sake of using AI.”

Objectively, between June and November last year, Storment said global token usage grew in a “nice linear path.” Then those new models and agentic patterns landed. Context windows “went from a few thousand or tens of thousands or hundreds of thousands up to millions of tokens in a single conversation,” and “agentic hit the scene and exploded,” adding “loops and retries and corrections and all this insanity.”

Companies had happily subsidized that behavior… until they saw the bills. Storment recounted how some “$200-a-month” power users actually cost “upwards of tens of thousands of dollars a month when you were running everything on the latest model.” For example, SemiAnalysis, an AI analytics company, recently estimated that a $200 Anthropic plan used to give $8,000 worth of Claude tokens, while a similar OpenAI offering gave $14,000 worth of Codex tokens.
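
The size of that subsidy is simple division. The short Python sketch below uses only the plan price and the SemiAnalysis estimates cited above; the function name is hypothetical.

```python
def subsidy_multiple(token_value_usd: float, plan_price_usd: float) -> float:
    """How many dollars of tokens each subscription dollar bought."""
    return token_value_usd / plan_price_usd


plan_price = 200  # monthly plan price cited above
estimates = {
    "Claude tokens": 8_000,   # SemiAnalysis estimate cited above
    "Codex tokens": 14_000,   # SemiAnalysis estimate cited above
}

for label, value in estimates.items():
    print(f"{label}: {subsidy_multiple(value, plan_price):.0f}x the plan price")
```

By those estimates, subscribers were getting 40 to 70 times what they paid for.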

Those days and prices are done. Moving forward, companies will have to pay the real cost of AI tokens.

“So now what matters more than anything is AI value,” Storment told the room. “We’ve got to bring value back to what we’re doing… We’re in an era where tokens are the main measurement. We’re in an era where tokens are in everything in software, and they’re driving a lot of the global token economy.”

Scarcity keeps token prices from collapsing

If Moore’s law and hyperscale competition were the only forces at work, you’d expect token prices to keep falling. To some extent, they have. “Since 2023, token prices have fallen dramatically,” Storment acknowledged. SAP’s internal telemetry tells a similar story. “This is our cost per token over the same time period,” said SAP data scientist Maida Nazifi, showing their internal chart. “It’s clearly trending down, even with a bit of flattening at the end. And honestly, it matches the narrative that everyone wants to believe, right? Token prices keep on falling.”

But both stress the caveat: The floor may be in sight. Storment notes that if “you look at the top labs and their pricing, you go back to the Wayback Machine. Token prices have been pretty flat since November 2025,” which he links directly to hardware and power constraints: “We can’t get enough hardware, we can’t get enough power… we’re seeing backlogs, we’re seeing long commitment periods, and we’re seeing shortages.”

He cited Intel’s CEO saying he doesn’t expect real relief in GPU and related component supply “until 2028.” Nazifi and SAP VP Frederik Pohl are seeing the same patterns at their company: Pohl warned, “We have supply chain constraints, we have hardware prices that are rising, and the prices of new frontier models are growing ever more expensive.”

The net result is a classic Jevons paradox: Falling unit cost, exploding total spend. “Even with falling token prices, we see that our spend is still rising, and that’s the famous paradox,” Pohl said. “At our scale, we had unit costs falling, but we saw in some months that spend was doubling.”

Storment thinks the paradox is just beginning. Goldman Sachs, he said, estimates global usage rising from “6 quadrillion tokens” today to “120 quadrillion forecasted tokens” within about 3.5 years. That is a 20x increase in volume, so even if token prices drop further once supply loosens, they are unlikely to fall fast enough to offset it.
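
The arithmetic behind the paradox is worth spelling out: total spend is volume times unit price, so spend only stays flat if prices fall by the same multiple that volume grows. The Python sketch below uses the 6 and 120 quadrillion figures quoted above; the price scenarios are hypothetical and are not forecasts.

```python
def spend_multiple(volume_growth: float, price_ratio: float) -> float:
    """Change in total spend = change in volume x change in unit price."""
    return volume_growth * price_ratio


def annual_growth_rate(total_growth: float, years: float) -> float:
    """Compound annual growth rate implied by a total multiple over `years`."""
    return total_growth ** (1 / years) - 1


volume_growth = 120 / 6  # 6 -> 120 quadrillion tokens, figures quoted above

# Hypothetical price scenarios (new price / old price), not forecasts.
for price_ratio in (1.0, 0.5, 0.25, 1 / volume_growth):
    multiple = spend_multiple(volume_growth, price_ratio)
    print(f"price x{price_ratio:.2f} -> spend x{multiple:.1f}")

rate = annual_growth_rate(volume_growth, 3.5)
print(f"implied volume growth: {rate:.0%} per year")
```

Even if prices were to halve in these scenarios, total spend would still rise tenfold.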

FinOps discovers token economics

For the FinOps community, which cut its teeth on cloud right‑sizing and reserved instances, token pricing is both familiar and completely alien. The familiar part is that it’s usage‑based, the invoices are big, and forecasting is hard. The alien part? The unit is tied to language, not infrastructure, and it changes as fast as model releases, not as slowly as server depreciation schedules.

Pohl asserted that “AI does not just stretch the cloud playbook, it breaks it; AI is more different from the cloud than cloud was to the data center.” Unlike CPUs, “AI models are nothing like that… they have their unique strengths and weaknesses… They have different cost profiles, and swapping out an LLM is not just a pricing decision. It’s also a quality-of-output decision.”

SAP’s experience is a case study in how enterprises are retooling. Its Business AI platform, Pohl explained, runs across “multiple different LLMs,” including “ChatGPT, Anthropic, Gemini… other open source models,” layered on “different hyperscalers.”

When SAP first went looking for AI cost data, “we immediately hit a wall,” Nazifi recalled. “The existing [cloud] tools were very blind to the nuance of LLMs, so they could tell us we spent this amount on [a provider], but not really which model, or how much the model. It really was like trying to optimize your gold mining operation by looking at the total weight of ore.”

So they did it the hard way: “We pulled data manually, we merged data across tables, and then we had this first picture by hand.” That picture, once it reached their global infrastructure lead and then the CTO, transformed the conversation. “Within days, it went from like, ‘OK, this is interesting, keep me posted,’ to… ‘I need this regularly, I need more,’” Nazifi said. Pohl added the FinOps lesson: “If you have a CTO asking for a number, that’s not a question, it’s a mandate.”

That demand forced SAP to formalize an internal AI FinOps framework built around three pillars:

Spend visibility: “What we consume, how we consume it, and where we consume it,” across models, platforms, business units, and regions.
Economics: “How efficiently are you leveraging AI,” measured with token‑level metrics like input/output ratios, cached token ratios, and “token to spend drift” to see whether costs are rising because of volume or mix shifts to pricier models.
Value: Connecting AI spend to business outcomes with “cost per use case” and “inference cost by revenue,” so they can tell “which AI features are economically viable” and whether “your AI product margins actually work.”
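
The token-level metrics in the Economics pillar can be sketched in a few lines of Python. This is one possible reading of those metrics, not SAP’s actual formulas: the `PeriodUsage` fields, the drift decomposition, and every number are assumptions made for illustration. The drift calculation splits a change in spend into a volume effect (more tokens at the old blended price) and a price/mix effect (the same tokens at a higher blended price).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodUsage:
    """Hypothetical monthly roll-up; field names are illustrative only."""
    input_tokens: int
    cached_input_tokens: int  # subset of input_tokens served from cache
    output_tokens: int
    spend_usd: float

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def blended_price(self) -> float:
        """Dollars per million tokens across all models used."""
        return self.spend_usd / self.total_tokens * 1_000_000


def input_output_ratio(u: PeriodUsage) -> float:
    return u.input_tokens / u.output_tokens


def cached_token_ratio(u: PeriodUsage) -> float:
    return u.cached_input_tokens / u.input_tokens


def spend_drift(prev: PeriodUsage, curr: PeriodUsage) -> tuple[float, float]:
    """Split the change in spend into (volume effect, price/mix effect)."""
    volume = (curr.total_tokens - prev.total_tokens) * prev.blended_price / 1_000_000
    mix = curr.total_tokens * (curr.blended_price - prev.blended_price) / 1_000_000
    return volume, mix


# Made-up numbers: token volume rises 50% while the blended price also rises.
may = PeriodUsage(800_000_000, 200_000_000, 200_000_000, 4_000.0)
june = PeriodUsage(1_200_000_000, 480_000_000, 300_000_000, 9_000.0)

volume_effect, mix_effect = spend_drift(may, june)
print(f"input/output ratio: {input_output_ratio(june):.1f}")
print(f"cached token ratio: {cached_token_ratio(june):.0%}")
print(f"spend change from volume: ${volume_effect:,.0f}")
print(f"spend change from price/mix: ${mix_effect:,.0f}")
```

The two effects add up to the total change in spend, which is what makes the split useful: it shows how much of a rising bill comes from heavier usage and how much from a shift to pricier models.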

“Every token needs to earn its cost,” Pohl said, echoing Nvidia CEO Jensen Huang’s phrase “token factory effectiveness.” That factory spans everything from silicon and data center leases to model routing and prompt design.

Tokenomics: beyond just counting tokens

If FinOps is about cost control and accountability, tokenomics, at least as the Linux Foundation is positioning it, is about the full lifecycle of tokens as an economic good. Storment defines it as “the emerging discipline of converting energy and capital into AI tokens and resources, consuming those tokens and all the related technology to drive efficient intelligence, and then ultimately drive value on the backend.”

In his view, that breaks into three buckets:

Production: “Take energy and capital and create tokens,” whether in cloud data centers, colos, edge devices, or, as Elon Musk likes to imagine, “data centers in space.”
Consumption: “All the allocation, forecasting, and optimization, which kind of sounds a lot like FinOps for AI,” spanning model routing, quantization choices, agent limits, and cache strategies.
Value: “How do we monetize those tokens? How do we adjust our pricing based on the cost of those tokens? What are the labor implications in our entire company based on the cost of that AI?”

That last piece is where token pricing directly collides with software-as-a-service (SaaS) business models. As Storment told me in an interview, “Tokenomics is getting over to the price of the tokens and how effectively we manage this production and consumption of them is changing pricing models for Fortune 100 companies.”

He points to Microsoft’s GitHub moves, shifting Copilot toward more explicit usage‑based charging, as an early example. Developers “who love the unlimited tokens” are now “really just angry at Microsoft,” because their implicit subsidy vanished.

The labs themselves are also tightening the screws in ways that are invisible at the token level. He raised as a fresh example Anthropic’s Fable model card: “If you’re going to use Claude at Fable to try to build an LLM, they will silently drop you to a different model, and you aren’t going to know.” Since then, Anthropic has walked back this policy, but other companies may not. Such silent policies make a mockery of any naive “cost per token” metric, because “not all tokens are created equal by any stretch of the imagination.”

Storment agrees. “A token can cost two cents per million, or it can cost 35 per million, just from a cost perspective,” he said, and even at the same rate, “one might drive a lot of value, and one doesn’t, based on how you’re using it.” For him, the point of embracing “tokenomics” as a term is to harness the fact that the C‑suite has already latched onto tokens as a mental model.

It also doesn’t help that today’s advanced LLMs, such as Anthropic’s Fable 5, can chase after an answer and burn tokens without users having a clue what’s actually happening. For instance, Simon Willison, co-creator of the Django web framework, reported that “Based on a screenshot and a one-line prompt, Claude Fable 5 + Claude Code” built and launched its own web server, drove several different web browsers, and performed many other tricks, all to track down a simple CSS display bug. Under token pricing, that session would have cost him only $12. It’s easy to envision a frontier model taking on a more complex problem and burning hundreds or thousands of dollars.
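
Why do agent sessions get expensive so quickly? A simplified Python model helps: if an agent re-sends its whole, growing context on every turn, input tokens grow roughly with the square of the number of turns. The sketch below assumes no prompt caching and a fixed number of tokens added per turn; the function, the sizes, and the prices are all placeholders, not a description of how any particular product bills.

```python
def agent_run_cost(
    turns: int,
    base_context_tokens: int,
    tokens_added_per_turn: int,
    output_tokens_per_turn: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of an agent loop that re-sends its whole context every turn.

    Simplifying assumptions: no prompt caching, and each turn appends a
    fixed number of tokens (tool results plus the model's own output).
    """
    total_input = 0
    context = base_context_tokens
    for _ in range(turns):
        total_input += context            # the full context is billed again
        context += tokens_added_per_turn  # and it keeps growing
    total_output = turns * output_tokens_per_turn
    return (
        total_input * input_price_per_million
        + total_output * output_price_per_million
    ) / 1_000_000


# Placeholder prices and sizes, for illustration only.
for turns in (10, 50, 100):
    cost = agent_run_cost(turns, 10_000, 2_000, 500, 3.00, 15.00)
    print(f"{turns:>3} turns -> ${cost:,.2f}")
```

On these placeholder numbers, ten times as many turns costs roughly 50 times as much, which is why loops and retries matter far more to the bill than the length of the original prompt.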

Business models: from credits and seats to blended token bundles

These pricing experiments show a pricey future. Most customers will never see a line item labeled “120 quadrillion tokens.” Instead, vendors are building layers of abstraction on top:

Credits and opaque consumption: Storment described signing up for an unnamed service where “every time I ran a video, it was like, ‘Put more quarters in the machine, put your credit card down. These credits go fast.'” Under the hood, those quarters are tokens.
Hybrid subscription + usage: Others use “a basic monthly, and then some level of consumption,” giving customers a predictable base and then exposing them to token‑denominated overages at the margin.
Direct pass‑through models: A smaller set, especially in infrastructure‑adjacent products, are “starting to direct allocation, direct pass through,” essentially showing customers the token meter more honestly but wrapped in their own dashboards and guardrails.
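
The hybrid model is the easiest of the three to sketch. The Python example below assumes a flat base fee that includes a token allowance, with overage billed per million tokens; the plan parameters are hypothetical and not drawn from any vendor’s price list.

```python
def hybrid_invoice(
    base_fee_usd: float,
    included_tokens: int,
    used_tokens: int,
    overage_price_per_million: float,
) -> float:
    """Monthly bill: flat base fee plus per-token overage beyond the allowance."""
    overage_tokens = max(0, used_tokens - included_tokens)
    return base_fee_usd + overage_tokens * overage_price_per_million / 1_000_000


# Hypothetical plan: $50 base, 5M tokens included, $10 per million after that.
for used in (2_000_000, 5_000_000, 20_000_000):
    bill = hybrid_invoice(50.0, 5_000_000, used, 10.0)
    print(f"{used:>12,} tokens -> ${bill:,.2f}")
```

The bill is predictable up to the allowance and then tracks usage linearly, which is exactly the exposure at the margin described above.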

These are all vulnerable to upstream shocks. Storment warned, “Anything changes in this, your token factory changes, you route to the wrong model and blow your cache up, you inefficiently forecast or estimate. Anything changes, this affects consumer pricing at the end, and you may have to change your prior pricing model for how you go to market, and this isn’t just software companies, it’s cascading into banks and everyone else today.”

That cascading effect is why the Linux Foundation is spinning up a Tokenomics Foundation alongside the FinOps Foundation: to give big consumers and suppliers a vendor‑neutral place to hash out specifications and best practices for measuring and allocating token‑based costs. The FinOps Focus specification, originally designed to normalize cloud billing data, is already being extended for token‑level telemetry. A new “FinOps certified Focus generator” program aims to validate that providers’ billing pipelines conform.

The human side: AI haves versus have‑nots

Beyond the spreadsheets, token pricing is already shaping who gets to use powerful AI — and who doesn’t. Storment sees a “societal divide between those who can afford the AI and those who can’t” if high token costs persist. At the enterprise level, you can already see the outlines: “Certain teams are being deemed worthy of getting the latest model, and others are not,” with some users routed automatically “to cheaper model[s]” and others granted exceptions.

Yet there is also a strong argument against crude caps. One Fortune 100 executive told Storment to “look across your usage… and you’re going to find some outliers of people… Don’t cap them, don’t shut them down. Go talk to them, find out what they’re doing, because they might actually be doing something really interesting.” In a world where YC‑backed startups receive “millions of dollars of tokens” from frontier labs to disrupt incumbents, shutting down internal experimentation could be an existential threat.

For individuals trying to use AI, and especially for new workers, token pricing feeds into broader anxieties about AI and jobs. I raised with Storment the backlash to AI‑heavy commencement speeches and the sense among graduates that AI is “coming directly for their jobs in an already tough job market.”

Storment’s view is more nuanced but still stark: “I don’t think AI is immediately coming for everybody’s job, but I think the person who’s better at AI is coming for the job of the person who’s not using AI.” If token prices and quotas restrict who can learn and experiment, that divide will only deepen.

For both companies and individuals, we’re moving quickly into an AI-token-based economy. This, in turn, will lead to a far more expensive AI world. What all that will mean is a question we don’t yet have an answer to. The one thing we know for certain is that it will be orders of magnitude more expensive than it has been.

Bree Lambert
