Are We in a Token Bubble?
The wrong question — and the Ingenuity Matrix, a better way to read the AI boom.
Preview
It would be useful to know the shape of future AI demand, and many are attempting to predict that. Since this is a long piece I’ll give you my predictions up front. My overall prediction is that localized corrections, from the imposition of usage controls and consistent pricing, will ultimately be less important than the big trends. Value, so far hard to measure, will become more clear, first through incremental gains at the core of software development, and next, from the innovation that takes longer to accumulate and organize.
Read on to learn how I add my experience in cloud computing and software engineering to my deep interest in economics to extend responses from two of my favorite writers. Along the way, I’ll recast the bubble analogy, explain recent trends that have hit the news, explain trends hidden deep in the development lifecycle, and provide a model, “Ingenuity Matrix”, for mapping usage intent to expected outcomes.
We love good stories, especially those with a villain. But we should be careful about our stories, knowing how powerful they can be.
Three stories have hit a crescendo at about the same time. Tokenmaxxing — companies turning token usage into a goal, metering it, and the waste that incentivizes. Subsidized tokens — questions on the relationship today between AI costs and pricing. And under both, the doubt about whether spending is producing value for AI customers.
Stack them together and a tidy narrative falls out. If these are what’s driving token usage, and they all adjust at once, the readjustment will ripple through AI industry economics — including Anthropic’s recently skyrocketing revenues. That narrative extends across all model providers, culminating as a cascading failure of the whole AI industry. Call it the token bubble, brought on by a revaluation of tokens and their utility.
It’s a neat story, but the framing is off, even before we get to evidence. “Bubble” as metaphor smuggles in two assumptions: that we’re looking at one structure, full of only hot air, and that it ends by popping. In reality, industrial bubbles deflate, running out of air. There is an inflated shell, inside of which a structure is being built, and its collapse while deflating halts construction within, and damages unfinished construction. But something remains.
Inflating the shell is not folly, but the simplest path to enable construction. It’s still calamitous when it deflates, but the goal is the structure, not the air. So the question I’m interested in isn’t “are we in a bubble?” It’s: which of these dynamics is air, which is structure, and how would you tell them apart?
The story has been covered by two of my favorite writers, Derek Thompson, in The AI Boom Has Entered Its ‘Wait, Is This Worth It?’ Era and Noah Smith, in How much more software do we really need?. Both play speculatively with the idea that spending and rationality may have split from each other, but retain optimism that something worthwhile is being built.
Thompson concludes his summary of an interview with SemiAnalysis’s Doug O’Laughlin:
Every new technology requires an extended period of trial and error, as organizations toggle between (a) not enough experimentation or spending, followed by (b) too much experimentation and spending, followed by (c) too dramatic a pullback, followed by (d) the repetition of steps (a) through (c), until firms figure out a long-term balance between labor spending and tech spending. Whether AI skeptics like Marcus are right that the bubble is about to pop depends entirely on a question that, as of today, nobody can definitively answer: Is the bill worth it?
Smith considers the period before a smarter than human in all ways artificial general intelligence:
But until we reach that point, it’s a nontrivial task to think of business models that could be fully automated even with an AI that can’t yet do everything. That’s going to be hard! If I had any good ideas for how to do that, I’d go become a billionaire myself.
At some point, though — maybe in the very near future — people (assisted by AI) will come up with those revolutionary new business models. At that point, tokenmaxxing will suddenly become a lot more economical, and Anthropic — or whoever has good coding agents by that time — will stand to make untold amounts of money.
These are good perspectives, but I can improve upon them to help understand the dynamics of AI usage. First, I’m from the software industry, which is at the center of the maelstrom — coding is now the single largest category of token usage. I can describe in more detail what developers are actually doing with these tokens, and their motivations. These details are important. Without them, a lot of valuable work remains mysterious, which invites doubts, such as “is this worth it”, or “do we need more software”?
Second, I’ve spent a while thinking about the adversarial dynamics of some AI usage, since first writing about it last year. Those dynamics are key to the questions both writers leave us with. Adversarial usage doesn’t produce the social value we all seek. It is not the only driver of AI usage, but when it is a driver, we should be asking, “is this worth it”?
Both writers are aware of an important detail, timing, which explains many misleading observations. With the addition of a deeper understanding of software development, and that model for separating zero-sum jockeying from the creation of social value, we can recognize events along the timeline with more accuracy.
Token usage, like human labor, can’t tell you progress. Its best analogy is effort. If you want to understand the effectiveness of effort, you want to know how it’s being applied. Different applications correlate with different outcomes. Since you can’t fast-forward to the results, this is the best immediate categorization you can add. I call this categorization, the Ingenuity Matrix, describing the scope and social alignment of token usage.
Some token usage goes nowhere by design, some burns down a backlog of long-deferred work, some is zero-sum jockeying. A slower, quieter share is the significant work that actually changes lives. Sort the usage that way and the “is it a bubble” question dissolves into a more useful one — what’s being built, what events can we expect along the path, and what risks and opportunities come with each set of events?
Background
Before we start into the model, understanding the two terms behind the narratives is useful. This will also be useful when reading general news on the topics. The narratives on these conflate multiple meanings, and smuggle assumptions. That ambiguity can support misleading narratives1.
What is tokenmaxxing?
Tokenmaxxing refers to two things. First, it refers to companies’ creation of “leaderboards” tracking employee AI usage by metering tokens. These leaderboards might be informal, but there’s often an implied assumption that high usage is rewarded, and low usage risks consequences. Sometimes that’s explicit. Ostensibly the justification is to incentivize experimentation and overcome inertia. In addition to simple inertia, many companies started with restrictive policies discouraging AI usage that they needed to counteract.
The second meaning focuses on what happens when leaderboards encourage AI usage, but do so in unproductive ways. Some employees respond by trying AI more and doubling down on things that work. But they may also create or continue unconstructive habits, for no reason other than they generate tokens. Individuals have described such practices anecdotally.
In this dual definition, when companies tokenmax, they encourage both the good and the bad. When individuals tokenmax, we talk only about the bad. The most extreme tokenmaxxing isn’t ingenuity that misfires — it’s intentional waste. The intent isn’t to do work; it’s to appear to have done work.
It’s not hard to see how that type of usage leads to a narrative that it’s all a sham. But we should remember, what we have is anecdotes. While it’s certain that some waste is occurring, it’s hard to gauge. Anecdotes are sparse, and for good reason. Admitting to it, would be admitting to willfully ignoring the employer’s best interest in productivity. That would carry consequences if a manager discovered it and wasn’t interested in joining the deception.
But separating waste from sincere-but-unsuccessful experimentation requires details that simply aren’t available at scale. What we can say is that the organizations running leaderboards are making a deliberate bet: they’re buying a pile of unaimed experimentation and some willful waste, in exchange for a fraction that matures into something real — durable skills, a useful tool, an opportunity nobody had time to chase before. Whether the bet pays off, only time will tell. But the structure of the bet — accepting near-term waste to fish for longer-term capability — is something we should predict and model as a mix.
What are subsidized tokens?
Subsidized tokens can refer to three things.
The most common usage focuses on two billing models. One is metered, usage is measured and billed per token, at prices like $5/million tokens. The other is by subscription, for example $20/month. Subscriptions typically have usage limits, but in most cases, fully utilizing a subscription’s limits yields a per-token cost below the metered rate. In addition, loopholes existed, allowing usage far below the metered rate. Users who used their subscriptions heavily enough to get that benefit were labelled as subsidized. That’s a simplification though, as it could be a lower profit margin, not subsidization.
The second usage focuses on free tiers. Free tiers have restrictive usage limits, but with no revenue, they are clearly subsidized. Free users heavily outnumber paid subscribers. Across providers there are at least a billion free tier users, while paid subscribers would be below a hundred million.
The third and final usage translates the unit economics of metered usage into the underlying costs that model providers pay to compute providers, which pay for chips, power, and other infrastructure. The question the subsidy narrative is really asking is, are the unit economics of AI usage sustainable? Or are they a short-term attempt to grow usage, the end of which results in higher prices, and pulling back from usage that’s no longer economic at the higher price point?
It’s an interesting story, but it’s almost worth ignoring. The efficiency of AI is increasing quickly, driving unit costs down. If prices rebound, unless the rebound is something like 10x, they’d soon fall again. The reason they can’t be ignored has little to do with a long-term trend, but everything to do with the short-term viability of the financing of AI investments and presumed valuations. A company trapped in subsidizing while a competitor is not, is going out of business quickly. This pattern repeats at each level of the AI value chain.
Why Bubble as an analogy is over extended
I said in the opening that using “bubble” as a metaphor for the AI industry smuggles in two assumptions. A bubble is so commonly used to analogize industrial revolutions, that we fail to reflect on the limits it has as an analogy. One mistake it leads us to, is the belief that there’s a soap bubble floating in air, and when we prick it with a pin it will pop, and evaporate. This does a poor job of explaining reality though.
We might limit our imagination more effectively by replacing the soap bubble with an inflatable dome. Whether this stays inflated depends upon the balance of air entering and exiting. Inside this dome, we’re constructing something durable, but it would be a challenge to do so with the dome weighing on top of us. We need the air to keep the dome’s ceiling from impeding our construction, and if it deflates it will probably ruin any half constructed structures. The stronger completed structures can sustain the weight of a deflated dome, but will struggle to conduct any additional construction.
If you want to think of the social support for a system, which supplies the air to keep the shell inflated, as a bubble, that’d be fair. This can evaporate with a bad news story, or some other form of social contagion. That social support is what replaces the air that leaks out. We’ll discuss the leaks later. Some are necessary, some are not. But replenishing the loss is unavoidable.
It’s useful to remember that in this analogy, deflation isn’t free. Something will remain, but the damage to unfinished construction is real. Careers are an obvious example of the consequences. When companies downsize the skills, connections and tacit knowledge built to support growth get stranded. If people move on, they may never come back. And besides, they are people and the disruption to their lives matters too. Projects also take a hit. Some projects may be zombies, shambling along with an unsound structure that will never be completed. But the forces of deflation aren’t so selective, and promising work is wiped away as well. Many projects that stop work during periods of tightening never start again.
A second flaw in the analogy is as a singular structure. Not only are there independent structures being built within, there’s not a single dome. There is a primary dome, where the model providers, GPU manufacturers and designers, and much else reside. But AI is also working to serve many different industries, and we shouldn’t assume a shared fate between all those efforts. We do want to pay attention to software development, because it represents such a large fraction of current usage. But software development itself isn’t an end of its own, it serves other industries. If AI is effective at helping some of those, and less-effective in others, this doesn’t establish a shared fate. It is only those cross-cutting effects that affect all software development that would carry that risk.
For the most part, those outside of software development aren’t going to understand those cross-cutting effects. I’ll highlight some of those details here, as they should be relevant to anyone interested in the immediate future implications of AI.
Why the public has a poor understanding of software development
The wider world has never shown broad interest in learning what software developers do. Compared to other professions like police, soldiers, doctors, lawyers, musicians, writers, journalists or even criminals. Without that interest it’s unlikely to learn the inner workings of the profession.
Media portrayals of software developers are rare and rarely accurate. The most common portrayal is the “hacker” who mysteriously takes control of computer systems in a few minutes with no preparation. Not only is that a poor representation of a real hacker, it tells you nothing about software development overall.


When software is “done”
If you come from outside the software world, you’d be excused from thinking of software development as building new software. In reality, this is a modest part of software development. Maintaining software, deploying software, and operating deployed software all represent larger segments than new software. All said, new software could be as small as 20%.
Noah makes a tentative argument that “The world may already have most of the traditional software that it needs.”. Noah’s aware he might be getting this wrong, and indeed he does. It does take an immense amount of work to keep sites running. AI is being used here, but it started later than its use to create new software. It’s not too hard to guess why. Creating new software is low risk comparatively. Like everyone, trust of AI has been a process. Software maintenance and operations themselves rely on significant “tech-stacks”, which have to be modified before you can even attempt to use AI to make a site more reliable in a meaningful way.
The number of software releases for security, operational, monitoring and development oriented features has been significant over the past year. Many use AI. Probably many others were built using AI.
Should you expect faster load times and higher reliability? First, would you really know? These have been improving for years, yet the general public rarely comments upon it. Mostly the only comments are those times when something does fail.
I’m not holding this data above as proof that AI has improved reliability. These improvements are more likely the result of conventional engineering, some started years before the results. The results that are “AI” based, are the result of the “machine learning” form that predated the architectures for Claude, Gemini and ChatGPT.
The point is, Noah (and you too) probably aren’t a sound judge of whether improvements are occurring unless you take the time to gather data. From my own knowledge, I know most AI based improvements are still in the early phases of adoption. But the average person shouldn’t expect to have an intuitive grasp on this. We’re bad intuitive judges of background effects like this, where we have to compare changes over time of non-continuous events. We can recall the last event, and the last change, but we are just as likely to draw a pattern from a recent reaction, than from an accurate history.
Why “tech-debt” comes first
You’ll find an interesting pattern that I’ll get into later. The first work to be done is the “shovel-ready” work. It’s easy to generate a prototype for some random idea, but rarer to have a great idea that can go from ideation to production quickly. AI does speed that up. But it doesn’t speed all work up.
With that in mind, provide a tool to a software developer, and they’ll have a long list of things they wanted to do, but haven’t had time for. Our general term for this is “tech-debt”, but realistically, it also includes half-baked feature ideas, or features that were sound but never made the cost-effectiveness cut. This list predictably contains a lot of non-amazing things. If they were amazing, they would have made the cost-effectiveness cut the first time. But AI does give you a reason to go deeper into that marginal backlog.
Security as a priority
I should also mention security here. Security is extremely important to the operation of software. Failures of security are nearly the worst thing you can imagine. This applies to all phases of software: development, deployment, and operations. It’s tempting to think of security as something you simply develop. But in reality that’s just the first step. A significant failure in development is likely to lead to a significant failure later, but it’s not destiny. You can layer protections to mitigate a development failure during operations. You have to do this because there are development failures you don’t know about. And more importantly, even a soundly designed and developed system can fail if not operated properly.
A lot of time and money is already spent on security. It’s never been the case that it hasn’t been a priority. You can find cases where it wasn’t a high enough priority. But it’d be a stretch to suggest there was a case no one cared. Whatever the priority, there is a limit, a cost-effectiveness barrier where one of the stages of development could have achieved more with more inputs. The introduction of AI changes the math on that barrier and makes many things practical that were impractical.
Security has another dimension too, which is that in addition to AI altering the developer’s cost-effectiveness equation, it does so for attackers too. This creates another incentive to burn down the security backlog. Security can’t wait. And so a lot with good cause, a lot of AI based productivity is going into security efforts.
This isn’t an effort that’s particularly visible to the outside world. What the outside world knows about it comes mostly from stories, not direct experience. When developers patch security holes, their intent almost always is to not change the user-experience. When that is the intent, it’s a slower process, because it requires educating users about new security mechanisms they need to participate in. Because that’s such a difficult thing to do, security teams have a very strong preference toward solving problems themselves without involving the users. It’s not always possible, but 90% of security efforts are invisible to users, and the next 9% are delivered as patches users see installed, but don’t pay any attention to.
The ingenuity matrix
I said in the opening that token usage is like effort: it tells you activity, not progress. To get from effort to expected outcome, you have to ask what the effort is for. Two questions do most of the work, and together they form a grid.
The first question is social alignment. Does the work create value the world didn’t have (positive-sum, pro-social)? Does it merely move value from one party to another (zero-sum, non-social)? Or does it destroy value — burn resources, or actively harm (negative-sum, anti-social)?
Alignment can be informed by our guesses of actors’ intent, but it’s not dependent on it. Our best bet is to act as an outside observer, guessing at outcomes. I don’t want to overcomplicate this though, this is estimation after all. Some significant pro-social value sometimes arrives from someone tinkering purely for fun. The social alignment is still recognizable from the outside, even when the actor wasn’t aiming at it.
The second question is scope, how far the work is reaching. Significant work aims at a real leap. Simple work aims at something bounded and modest. Naive work isn’t aimed at a productive outcome at all. Here “naive” describes the absence of a useful target, not the absence of a motive. Intentional waste is naive in this sense, it produces nothing of value, even though the person doing it has a very clear motive.
What you’ve just toured, security patches, reliability work, performance and cost tuning, is real value, almost all of it invisible to the people who benefit. Nearly all of it lands in a single cell: simple, positive-sum. It’s illustrative that so much of what is immediate is within simple or naive ingenuity. The first things individuals use AI for aren’t the significant ones. It’s the modest, shovel-ready, often-unseen things.
Map the rest against those two axes and you get an ingenuity matrix:
Negative-sum is not hypothetical, it connects back to security. The same drop in cost-of-effort that lets defenders finally burn down the security backlog also lowers the attacker’s cost. AI-assisted cybercrime is simple, negative-sum ingenuity, and the prospect of AI-scale biological or infrastructure attacks is the significant version. A large share of the invisible defensive work isn’t optional improvement, it’s the response to an adversary. The result is effort that is no longer avoidable, but also hidden, which delays the visible gains we’re watching for.
Significant non-social ingenuity ends empty. Non-social work can seem significant when under development. But one of two things happens. Either the work ends up leaking into pro-social, or anti-social accidentally, or it is copied and becomes trivial. Significance and neutrality are generally unstable.
Naive ingenuity is where the most visible tokens are burning right now and the least is being built. Failed experiments and aimless prototypes aren’t worthless — they build skills and occasionally surface something real, which is the option value the leaderboard bet was buying — but as a category they go nowhere by design. Because naive usage is so voluminous, and personal, it’s the most visible to the simplest forms of observation. That helps it dominate the “is this all a sham?” narrative.
Simple ingenuity has significant usage, but is quickly forgotten. The high volume usage is generally operationalized, contributing to security, reliability or operational efficiency. It’s soon forgotten, as it becomes a background effect. It doesn’t have the humorous, villainous story of tokenmaxxing. It doesn’t receive the personal promotion of the latest experiment.
One of the hallmarks of simple ingenuity, is it could be described as a backlog. The work may have been identified as desirable a long time ago, but with other competing priorities, it wasn’t prioritized. It may also not have been cost effective. One of the changes that AI brings is a change in cost-effectiveness. This activates this backlog, and you should expect early effects to burn this backlog down.
Simple ingenuity comes early and makes existing work more efficient. Sometimes this will show up as measurable revenues, but much is internal to companies. In that case it’s the token usage, the lower labor costs, or the higher quality that are the observations.
When AI enabled workers have a clear backlog, efficiency gains will flow into simple ingenuity to burn down the backlog. If the backlog results in priced or measured output, you’ll know.
Significant ingenuity will take longer to be identified, developed and deployed, especially the pro-social variety. The economy will reuse freed labor to create more value. That won’t happen immediately, as it may wait on hiring processes, training processes, or even the formation of new companies pursuing new products or business models.
Timing
While the development process is accelerated, the identification process retains most of its bottlenecks. Optimism may accelerate it. Idleness may accelerate it. But optimism and idleness may also flow into naive ingenuity, pursuing trivial goals without positive utility. There is a blurry area where naive ingenuity is experimentation. It may fail, but its failure may be necessary to build skills or discover significant opportunities.
At some point, a few things start to coincide. Naive and simple ingenuity will have built skills, ready to be exploited for realizing significant ingenuity. The backlog’s distraction fades as it burns down, and a new equilibrium raises the incentive to chase significant work — significance always carried more reward, but also more risk. But as cost-effectiveness decreases deeper into the backlog, avoiding risk becomes less attractive. All of these, in addition to the passage of time, predict a future wave of significant ingenuity that direct observation of measurements would fail to predict.
New output
Most of what we’d recognize as new output is significant, pro-social, and lagged. These are the life-changing things, and they’re the hardest to forecast. Your best guide might be a science-fiction novel, but of all the futures sci-fi writers have imagined, which do you bet on? Like flying cars, some things that look a step away stay out of reach far longer than expected.
It would be a mistake, though, to generalize from the failed predictions to all predictions. In many ways today’s information world already outruns older sci-fi imagination — the 1987 Star Trek: TNG depicted computers far beyond the 1966 version, and on the information front we’ve roughly met the standard it set for the 24th century already. The significant wave is hard to time and easy to underestimate at the same time.
What to expect
We should expect the AI industry to experience some pullbacks, then continue on. Whether this ever meets the bubble narrative is uncertain. I’m skeptical. Many pullbacks will be met by other accelerations. One experiment fails, another scales.
There isn’t one bubble, ready to pop, but multiple domes. Each industry, each set of users finds their value. While software remains so dominant, a failure in the software use case could be dramatic, but much of it is boring simple work that will continue to be automated for some time yet.
Much of the immediate term work is going to focus on the simplest, most invisible aspects. We shouldn’t discount the value there. Where it’s defensive, answering the negative-sum, like security, it has to be done. Where it’s part of more normal systems, it’s freeing resources, and developing skills and experience that will fuel more significant ingenuity in the future.
You do have to wait to see world changing effects. Software, as a model for implementing a workflow, will remain, and the general skills of software developers will be critical to this. Lines will blur, people will cross-over the lines, but ultimately the concept of software will continue to exist.
If the software dome does collapse, it will create structural damage, like all such events. Failed companies, layoffs, abandoned projects. Resources for naive experimentation would evaporate, and companies would proceed more cautiously. But a structure will remain. The burned-down backlogs that don’t un-burn, the skills that accumulated, the efficiency that keeps paying out, and the significant work just beginning to grow.
So, are we in a bubble? Will users and companies pull back on token usage, looking for value, discouraging wasteful tokenmaxxing? Will they react to pricing changes from model providers that close subscription loopholes that allow token usage in excess of what the same money would have bought per token via API? Yes, they will, but will that cause revenue drops that deflate the dome?
I don’t think so, there’s enough pending and developing work to fill the gap. Even if the significant ingenuity is still developing, the simple work is sufficiently valuable and important. But maybe those dynamics will return next year. If compute providers continue yet more expansions, they still might find them getting ahead of demand. There’s a lot of history to be written here. I’d just be careful about writing the ending first.
Sources
Representation of professions in entertainment media: Insights into frequency and sentiment trends through computational text analysis; Baruah S, Somandepalli K, Narayanan S..
State of AI, An Empirical 100 Trillion Token Study with OpenRouter; Malika Aubakirova, Alex Atallah, Chris Clark, Justin Summerville, Anjney Midha
The AI Boom Has Entered Its 'Wait, Is This Worth It?' Era; Derek Thompson
How much more software do we really need?; Noah Smith
Related Articles
As an example, Noah Smith quotes a commonly quoted study on tokenmaxing that claims diminishing returns to token usage, but presents data that should be interpreted as the opposite. In their description, they compare the number of tokens used to create PRs, and the costs of those tokens.
To evaluate whether that spend is worth it, we joined token usage data with actual developer output, measured in merged pull requests.
Over the course of Q1 2026, developers in the bottom 20% of token spend used only about three dollars’ worth of tokens for the entire quarter and shipped an average of 11 merged PRs. By comparison, developers in the top 20% spent $1,822 over the same period and shipped 23 merged PRs on average.
In other words, significantly higher token usage does lead to more output, but not proportionally. The cost per merged PR increases from just $0.28 in the lowest usage tier to $89.32 in the highest.
More tokens means more output, but at a much higher price per unit.
But if you’re comparing costs, the correct comparison would include developer time. If we take a conservative cost of $10,000 / month for a developer the calculation we get is:
Low token group: ($30,000 + $3.08) / 11 PRs ≈ $2,727/PR
High token group: ($30,000 + $2,054) / 23 PRs ≈ $1,393/PR
There is a sense in which you could use this data to describe diminishing returns, but it’s not in the realm of cost effectiveness. If someone proposed that development was accelerating exponentially in the way that token usage is, they’d be wrong. You cannot scale development at the speed of tokens because it is still dependent on developers.




