The Real Cost of AI Inference: Subsidies, Chips, and Whether the "Golden Age" Will Last

One company recently audited a massive legacy codebase for a client. The last time it did this kind of work, the entire workflow, including heavy use of AI coding assistants, fit comfortably inside a $200 monthly subscription.

This time, with usage-based pricing (like GitHub Copilot charging per token or similar shifts), the same work is projected to cost around $2,000. What changed? Not the code, but the economics behind the models.

This isn't an isolated anecdote. It's a window into one of the most important (and least discussed) dynamics in AI right now: the gap between what consumers and power users actually pay and what it truly costs to run inference at scale. Heavy users are getting enormous subsidies, and the sustainability of that model is questionable as companies eye IPOs and profitability.

The Subsidy Experiment: How Much Are You Really Getting?

SemiAnalysis recently bought every major subscription tier from Anthropic (Claude) and OpenAI (ChatGPT) and stress-tested them with long-horizon coding and agentic tasks until weekly limits were hit.

The results were eye-opening:

Claude plans: A $20/month plan delivered roughly $400 worth of tokens at API-equivalent pricing. The $100 plan hit around $2,000. The top $200 plan reached about $8,000.
OpenAI/ChatGPT plans: Even more generous for heavy users, at up to $700, $3,500, and as high as $14,000 for the respective tiers. At full utilization, the top tiers work out to roughly 40x (Claude) and 70x (ChatGPT) the subscription price.
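
The multiples implied by these figures can be checked with a few lines of arithmetic. The sketch below uses only the dollar amounts quoted above. It assumes the ChatGPT tiers sit at the same $20, $100, and $200 price points as the Claude plans, which the text implies but does not state; the function name is illustrative.

```python
# API-equivalent token value delivered per plan, as quoted in the text.
# Keys are monthly subscription prices in USD.
CLAUDE = {20: 400, 100: 2_000, 200: 8_000}
# Assumption: the ChatGPT tiers share the same three price points.
CHATGPT = {20: 700, 100: 3_500, 200: 14_000}


def subsidy_multiples(plans: dict[int, int]) -> dict[int, float]:
    """Return API-equivalent value divided by subscription price, per tier."""
    return {price: value / price for price, value in plans.items()}


if __name__ == "__main__":
    for name, plans in (("Claude", CLAUDE), ("ChatGPT", CHATGPT)):
        for price, multiple in subsidy_multiples(plans).items():
            print(f"{name} ${price}/month: {multiple:.0f}x")
```

Under that assumption, the lower tiers come out at 20x (Claude) and 35x (ChatGPT), and the top tiers at 40x and 70x.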

These aren't theoretical numbers; they come from real, sustained usage that mimics professional developer workflows. Most casual users never come close to these limits, which is why the economics work on average (like a buffet where light eaters subsidize heavy ones). But for power users, exactly the people and companies driving real productivity gains, the effective discount is massive.

Assuming high gross margins on API usage (around 75% as a benchmark in the analysis), subscription margins look far worse at high utilization. The labs are effectively giving away compute to retain users and build habits while the technology matures.
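
To make the margin claim concrete, here is a rough sketch under the 75% API gross-margin benchmark mentioned above. It treats the remaining 25% of API-equivalent value as the lab's serving cost, which is a simplification, and the function names are illustrative rather than taken from the analysis.

```python
API_GROSS_MARGIN = 0.75  # benchmark from the analysis; not a reported figure


def serving_cost(api_value: float, api_margin: float = API_GROSS_MARGIN) -> float:
    """Estimated compute cost behind a given API-equivalent token value."""
    return api_value * (1.0 - api_margin)


def subscription_margin(price: float, api_value: float,
                        api_margin: float = API_GROSS_MARGIN) -> float:
    """Gross margin on a subscription whose user consumes `api_value` of tokens."""
    return (price - serving_cost(api_value, api_margin)) / price


def break_even_value(price: float, api_margin: float = API_GROSS_MARGIN) -> float:
    """API-equivalent usage at which the subscription margin reaches zero."""
    return price / (1.0 - api_margin)


if __name__ == "__main__":
    # The $200 plan at its measured ceiling of $8,000 in API-equivalent tokens.
    print(f"serving cost: ${serving_cost(8_000):,.0f}")
    print(f"margin at the cap: {subscription_margin(200, 8_000):.0%}")
    print(f"break-even usage: ${break_even_value(200):,.0f}")
```

Under these assumptions, a maxed-out $200 plan costs about $2,000 to serve, a margin of roughly -900%, and the plan stops being profitable once a user passes about $800 of API-equivalent usage, a tenth of the measured ceiling.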

What Actually Drives Inference Costs?

Inference, the process of running a trained model to generate outputs, is the dominant ongoing expense in AI, often 55-80% of total GPU spend in production environments.

Key cost drivers include:

Hardware (AI Chips): NVIDIA still dominates with GPUs like H100, H200, and Blackwell-series chips optimized for inference. These deliver massive improvements in tokens per watt and cost per token. Prices per equivalent performance have plummeted, but absolute hardware costs remain high (tens to hundreds of thousands of dollars per server). Newer architectures and competitors are pushing efficiency gains, but supply, power delivery, and cooling add complexity.
Energy and Data Centers: Running frontier models at scale consumes enormous electricity. Data center buildouts are capital-intensive.
Model Size and Optimization: Larger models cost more per token, but techniques like quantization (reducing precision), speculative decoding, caching, and better token efficiency dramatically lower real-world costs. Frontier models that cost $20+ per million tokens a couple of years ago now have equivalents at fractions of that price.
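
A back-of-the-envelope way to see how these drivers combine is to divide the hourly cost of a server by the tokens it produces in an hour. The sketch below does that. Every number in the usage example is a hypothetical placeholder chosen for round arithmetic, not a measured or vendor figure, and the function name is illustrative.

```python
def cost_per_million_tokens(hourly_cost_usd: float,
                            tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Serving cost in USD per one million generated tokens.

    hourly_cost_usd   -- amortized hardware, power, and hosting cost per hour
    tokens_per_second -- aggregate throughput across all batched requests
    utilization       -- fraction of each hour spent serving traffic, in (0, 1]
    """
    if tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("throughput must be positive and utilization in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show how the levers interact.
    baseline = cost_per_million_tokens(hourly_cost_usd=36.0, tokens_per_second=5_000)
    # A hypothetical optimization that doubles throughput on the same hardware.
    optimized = cost_per_million_tokens(hourly_cost_usd=36.0, tokens_per_second=10_000)
    # The same server sitting idle half the time.
    half_idle = cost_per_million_tokens(36.0, 5_000, utilization=0.5)
    print(f"baseline:  ${baseline:.2f} per million tokens")
    print(f"optimized: ${optimized:.2f} per million tokens")
    print(f"half idle: ${half_idle:.2f} per million tokens")
```

The point of the sketch is the shape of the relationship: anything that raises throughput on the same hardware lowers cost per token proportionally, while idle capacity raises it.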

Overall, per-token costs for equivalent intelligence have collapsed dramatically (hundreds of times cheaper in some cases), but absolute spending by labs remains huge because usage is exploding. Companies like OpenAI have reported massive inference-related losses and are projecting continued heavy burn (e.g., billions annually) even as revenue grows.

Margins: Consumer vs. Enterprise Reality

Consumer subscriptions are heavily subsidized, especially for power users. This is a deliberate strategy to acquire users, gather data/feedback, and maintain mindshare. It's funded by enormous venture capital and strategic investments (Microsoft for OpenAI, Amazon/Google for Anthropic, etc.).

Enterprise and API customers get no comparable subsidy. Large commitments often come with volume discounts, reserved capacity, or dedicated infrastructure, but pricing stays at or above the true marginal cost of compute. This is where the real margins live for the labs.

The math explains the tension: If a lab has ~75% gross margins on API tokens, maxed-out subscriptions can flip to deeply negative margins. Average utilization across all subscribers keeps the overall business afloat for now.
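
The buffet effect can be sketched as a weighted average. The subscriber mix below is entirely hypothetical, chosen only to illustrate how a small share of heavy users moves the blended margin; the 75% API margin is the benchmark from the analysis, and the remaining 25% is treated as serving cost.

```python
API_GROSS_MARGIN = 0.75  # benchmark from the analysis


def blended_margin(price: float, mix: list[tuple[float, float]],
                   api_margin: float = API_GROSS_MARGIN) -> float:
    """Gross margin across all subscribers on one plan.

    mix -- (share_of_subscribers, api_equivalent_usage_usd) pairs;
           shares must sum to 1.
    """
    total_share = sum(share for share, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("subscriber shares must sum to 1")
    avg_usage = sum(share * usage for share, usage in mix)
    avg_cost = avg_usage * (1.0 - api_margin)
    return (price - avg_cost) / price


if __name__ == "__main__":
    # Hypothetical mix for a $200 plan: mostly light users, a few at the cap.
    mix = [(0.80, 200.0), (0.15, 1_000.0), (0.05, 8_000.0)]
    print(f"blended margin: {blended_margin(200.0, mix):.1%}")
    # Same plan, with maxed-out users doubling at the expense of light users.
    heavier = [(0.75, 200.0), (0.15, 1_000.0), (0.10, 8_000.0)]
    print(f"with more heavy users: {blended_margin(200.0, heavier):.1%}")
```

With this made-up mix, the plan earns a blended margin of about 11%. Moving just five percentage points of subscribers from light use to the cap pushes it to roughly -38%, which is why utilization limits matter so much to the labs.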

The IPO Pressure and the End of Easy Subsidies?

Both OpenAI and Anthropic are reportedly preparing for public market debuts. Investors eventually demand profits, not just growth and market share.

This creates strong incentives to:

Gate the newest, most capable models behind higher-priced API tiers or enterprise plans.
Introduce stricter limits or "nerfing" on consumer subscriptions (though backlash risk is high).
Raise prices or shift more usage to pay-per-token models.
Focus on high-margin enterprise deals.

We've already seen early signs: moves toward usage-based pricing in tools like Copilot, quota adjustments, and experiments with feature gating. The "free" or ultra-cheap intelligence era for heavy users may be peaking.

That said, the underlying trend of falling inference costs (thanks to better chips, software optimizations, and scale) continues. Labs can profitably serve increasingly powerful models at lower prices over time, just not necessarily at the current subsidy levels for unlimited heavy use.

Historical Parallels: Railroads, Fiber, and AI Infrastructure

This situation echoes past infrastructure booms. During the railroad expansion and the dot-com fiber optic buildout, companies overbuilt capacity, many went bankrupt, and investors lost fortunes. Yet society ended up with durable, transformative infrastructure that enabled decades of growth.

In AI, we're in a similar phase of massive capital deployment into chips, data centers, and models. There will likely be consolidation, failures, and shakeouts among providers.

But the "rails" (compute infrastructure, efficient models, and tooling) will remain, and improve.

Open-source models, self-hosting options, and specialized inference providers are already offering cheaper alternatives for many workloads.

Will We Keep the Current Level of Access?

Probably not exactly as it is today for heavy professional use. Expect more tiering: generous but capped consumer plans, premium "unlimited" options at higher prices, and robust enterprise offerings.

The best frontier models may become relatively more expensive or restricted for individual use, whether casual or heavy, while overall intelligence gets cheaper and more accessible through efficiency gains and competition (including from open models and non-Western providers).

Power users and companies will adapt by optimizing workflows (caching, smaller models for simpler tasks, agent orchestration, self-hosting where it makes sense), or paying more for guaranteed access. The $200 "all-you-can-eat" golden era for intensive coding/agentic work is likely transitional.
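
As a rough illustration of two of these tactics, the sketch below routes simple tasks to a cheaper model and caches repeated prompts. The model names and the length-based complexity rule are hypothetical placeholders, and `call_model` stands in for whatever client a team actually uses; none of it reflects a specific vendor API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CostAwareRouter:
    """Toy router: cache first, then pick a model tier by prompt length."""
    call_model: Callable[[str, str], str]  # (model_name, prompt) -> response
    simple_max_chars: int = 500            # crude stand-in for task complexity
    cache: dict[str, str] = field(default_factory=dict)
    calls: dict[str, int] = field(default_factory=dict)

    def pick_model(self, prompt: str) -> str:
        """Send short prompts to the cheap model, everything else to the large one."""
        if len(prompt) <= self.simple_max_chars:
            return "small-model"
        return "frontier-model"

    def ask(self, prompt: str) -> str:
        """Return a cached response if one exists; otherwise call a model once."""
        if prompt in self.cache:
            return self.cache[prompt]
        model = self.pick_model(prompt)
        self.calls[model] = self.calls.get(model, 0) + 1
        response = self.call_model(model, prompt)
        self.cache[prompt] = response
        return response


if __name__ == "__main__":
    # Fake backend so the sketch runs without any network access.
    router = CostAwareRouter(call_model=lambda model, prompt: f"[{model}] ok")
    router.ask("rename this variable")
    router.ask("rename this variable")  # served from cache, no second call
    router.ask("x" * 2_000)             # long prompt goes to the larger model
    print(router.calls)
```

A real system would use a better complexity signal than prompt length and would need a cache-invalidation policy, but the cost logic is the same: avoid paying twice for the same answer, and avoid paying frontier prices for routine work.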

We're living through an extraordinary period of subsidized intelligence that accelerates experimentation and adoption.

It won't last in its current form, but the infrastructure being built will power productivity gains for years to come. The question isn't whether AI gets more expensive; it's how quickly costs fall relative to capabilities, and who captures the value. Priced right and used wisely, this technology still represents one of the biggest leverage opportunities in history. The subsidies bought us time to figure it out. Now the real economics are coming into focus.