In a field obsessed with leaderboard supremacy and billion-parameter behemoths, the true pulse of artificial intelligence often beats quietly in the shadows of everyday interactions.
A groundbreaking report from Andreessen Horowitz (a16z) and OpenRouter, released in December 2025, peels back those shadows by dissecting over 100 trillion tokens - prompts and completions - from billions of real-world LLM sessions on the OpenRouter platform. Spanning roughly two years up to November 2025, this dataset underpins the largest empirical study of AI usage to date, drawn from anonymized metadata across a 13-month rolling window. What emerges isn't just data; it's a mirror to human creativity, ambition, and restlessness.
The biggest shocker? More than half of all open-source model activity - around 52% of OSS tokens - fuels roleplay and creative dialogues, where users craft immersive stories, embody fictional characters, and explore uncharted narratives. While the industry chases mathematical mastery, everyday innovators are scripting the next great saga with AI as their co-author.
The Creative Undercurrent: Roleplay Reigns Supreme in Open-Source Worlds
Forget the sterile math puzzles dominating conference keynotes. This analysis spotlights a vibrant underbelly: users aren't just querying facts; they're building worlds. In the open-source segment, roleplay commands 52% of token volume, with sub-themes like games and roleplaying gobbling up 58% of that slice, followed by writers' resources at 15.6% and even adult content at 15.4%.
These aren't fringe pursuits — they're the lifeblood of engagement, where flexible OSS models shine by sidestepping the heavy-handed moderation that plagues proprietary counterparts. Imagine a novelist brainstorming plot twists with an AI dungeon master, or gamers prototyping branching quests: such interactions triple average completion lengths to 400 tokens, fostering deeper immersion.
This creative surge underscores a broader truth: AI's mass appeal lies in augmentation, not automation. Proprietary models, while polished, often enforce content guardrails that stifle experimentation - think rejected prompts for "edgy" fanfiction. OSS liberates users, turning models into collaborative muses.
As the data's mid-2025 categorization suggests (based on a 0.25% sample of sessions tagged by advanced classifiers), these dialogues aren't ephemeral; they drive sustained usage, with average prompt lengths ballooning fourfold to 6,000 tokens. It's a reminder that benchmarks measure peaks, but real adoption thrives in the valleys of imagination.
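For readers who want that methodology in concrete terms, here is a minimal sketch of what a sample-and-tag pipeline of this kind might look like in Python. The topic labels, function names, session shape, and the `call_llm` classifier hook are illustrative assumptions, not the report's actual tooling.

```python
import random

# Illustrative topic labels, loosely mirroring the report's categories (assumption).
TOPICS = ["programming", "roleplay/creative", "translation", "science", "other"]

def sample_sessions(sessions, rate=0.0025, seed=42):
    """Draw a ~0.25% random sample of sessions for tagging."""
    rng = random.Random(seed)
    return [s for s in sessions if rng.random() < rate]

def classify_prompt(prompt_text, call_llm):
    """Tag one prompt with a topic via an LLM classifier.

    `call_llm` is a hypothetical callable that sends the instruction to
    whatever classifier model is available and returns its text reply.
    """
    instruction = (
        "Classify the following user prompt into exactly one of these topics: "
        + ", ".join(TOPICS) + ".\n\nPrompt:\n" + prompt_text
    )
    label = call_llm(instruction).strip().lower()
    return label if label in TOPICS else "other"

def topic_shares(sessions, call_llm):
    """Return the share of sampled sessions per topic.

    Assumes each session is a dict with a "prompt" field (illustrative).
    """
    sample = sample_sessions(sessions)
    counts = {t: 0 for t in TOPICS}
    for s in sample:
        counts[classify_prompt(s["prompt"], call_llm)] += 1
    total = max(len(sample), 1)
    return {t: c / total for t, c in counts.items()}
```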
Code in the Driver's Seat: Programming's Explosive Growth and Claude's Iron Grip
Amid the storytelling frenzy, programming emerges as the undisputed growth engine, surging from 11% of total token volume in early 2025 to over 50% by November. Developers aren't dabbling; they're diving deep, with prompts routinely exceeding 20,000 input tokens for tasks like code generation, debugging, and full-stack scripting.
Sub-tags reveal the grit: 66% falls under general "Programming/Other," while 26% involves development tools, reflecting AI's role as a tireless pair programmer.
Here, Anthropic's Claude family asserts unchallenged dominance, processing more than 60% of all coding requests. This isn't luck - it's ecosystem lock-in. Claude's nuanced reasoning, especially in Sonnet variants, handles intricate logic flows that trip up rivals, earning loyalty in high-stakes workflows.
Token-wise, programming sessions now run past 5,400 tokens end-to-end, roughly triple prior norms, as users feed entire repositories into models for holistic refactoring.
Yet OSS is nipping at Claude's heels: models like Qwen 2.5 Coder (32B parameters) have sparked a "medium model" renaissance (15-70B params), shifting usage from tiny tinkerers to scalable workhorses. The result? A coding boom that's not just quantitative but qualitative, accelerating software velocity in ways that echo the early days of GitHub Copilot - but on steroids.
China's Quiet Conquest: From Fringe to Force in a Fragmented Market
If roleplay humanizes AI, the ascent of Chinese models globalizes it. Their market share rocketed from a mere 1.2% in late 2024 to nearly 30% by year's end, propelled by relentless innovation from labs like DeepSeek, Alibaba's Qwen, and Moonshot AI's Kimi K2.
DeepSeek alone devoured 14.37 trillion tokens, outpacing Meta's LLaMA (3.96T) and Mistral (2.92T), while Qwen clocked 5.59T. These aren't copycats; they're disruptors, blending frontier capabilities with ruthless efficiency - think Kimi K2's spikes in multimodal tasks, eroding Western strongholds in translation (51% foreign language focus) and science (80.4% machine learning queries).
This surge flips the script on market structure. A year ago, DeepSeek monopolized over 50% of the OSS segment; today, fragmentation reigns supreme, with no single model claiming more than 25%. The top 10 OSS players - spanning DeepSeek, Qwen, LLaMA, Mistral, and others - collectively handle the lion's share, fostering a pluralistic ecosystem where users mix and match.
Chinese OSS, averaging 13% of total usage (peaking at 30% weekly), exemplifies this: rapid releases like DeepSeek V3 out-iterate competitors, capturing niches from creative writing to agentic tools. It's a tale of democratization - open weights meet open ambition, diluting the old guard's grip.
Geographic Rebalancing: Asia's Double-Digit Leap in AI Dollars
Money talks, and in AI, it's whispering of a multipolar world. Asia's slice of global spending more than doubled from 13% to 31% over the study period, with Singapore leading at 9.21%, China at 6.01%, South Korea at 2.88%, and India at 1.62%. English still rules prompts at 82.87%, but Chinese (Simplified) climbed to 4.95%, Russian to 2.47%, and Spanish to 1.43%, mirroring usage flows.
North America, long the hegemon, slips to 47.22% - still dominant but under half for the first time. Europe holds steady at 21.32% (Germany 7.51%), while Oceania (1.18%), South America (1.21%), and Africa (0.46%) lag, hampered by infrastructure gaps. Billing-location proxies reveal a diffusion: enterprise accounts and third-party payments blur lines, but the trend is clear - Asia's compute investments and developer density are fueling a spending renaissance, from Bangkok's Thai-language bursts (1.03%) to Seoul's tech queries.
The Crystal Slipper Effect: Why First Impressions Forge Lasting Bonds
Enter the "crystal slipper effect," a coined phenomenon that elegantly captures AI's psychological quirks. Early "foundational cohorts"—users discovering a model's perfect fit for their workflow—boast 40% retention at month five, a stark contrast to the nomadic churn of later adopters. Take Gemini 2.5 Pro's June 2025 launch cohort: it locked in users for reasoning-heavy tasks, much like Cinderella's slipper sealing destiny. Claude 4 Sonnet's May 2025 group similarly thrives in coding marathons.
The flip side stings: models without an inaugural niche bleed users, even post-upgrades. Gemini 2.0 Flash and Llama 4 Maverick, lacking that spark, show flat, low retention across waves. DeepSeek bucks the trend with a "boomerang effect"—churn dips, then rebounds as word-of-mouth revives interest (e.g., R1's April 2025 cohort hitting peaks at month three). Lesson? Primacy trumps perfection: solving a user's pain point first creates indelible loyalty, turning fleeting trials into ecosystems.
Dollars and Paradoxes: Inelastic Cravings in a Bargain Basement
AI hunger defies economics' gravity. Demand proves wildly inelastic: a 10% price slash yields just 0.5-0.7% more usage, with premium flagships like Claude 3.7 Sonnet ($1.963 per million tokens) commanding volumes rivaling budget beasts like Gemini 2.0 Flash ($0.147 per million). Users splurge for quality in clutch moments - debugging a deadline-crushing bug or ideating a blockbuster scene - while log-log plots show scant correlation between cost and consumption.
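To make the inelasticity figure concrete: elasticity is the slope of log(usage) against log(price), and a magnitude around 0.05-0.07 means a 10% price cut nudges usage by well under 1%. Below is a minimal sketch of that estimate; the price-volume pairs are invented for illustration and are not data from the report.

```python
import numpy as np

# Hypothetical (price per million tokens, weekly token volume) pairs.
# These numbers are made up to roughly mirror the reported elasticity.
prices = np.array([0.147, 0.40, 0.75, 1.10, 1.963])
volumes = np.array([9.6e9, 9.3e9, 9.0e9, 8.8e9, 8.3e9])

# Fit log(volume) = elasticity * log(price) + intercept.
elasticity, intercept = np.polyfit(np.log(prices), np.log(volumes), 1)

print(f"estimated price elasticity of demand: {elasticity:.3f}")
# A magnitude far below 1 (here roughly -0.05) is what "inelastic" means:
# a 10% price cut moves usage by only about half a percent.
```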
Cue the Jevons Paradox, that counterintuitive twist where cheaper resources spur greater overall use. Efficient models under $0.40 per million tokens don't curb spending; they ignite it, weaving AI into ever more tasks and contexts. OpenRouter's exclusions (bring-your-own-key setups) likely understate true savings via caching, but the pattern holds: as barriers drop, total tokens balloon, from average prompts of 1.5K tokens to 6K-token behemoths. It's efficiency begetting excess, much like cheaper coal once fueled industrial booms.
Agents and Pluralism: The Multi-Model Horizon Dawning
Synthesizing these threads, the report heralds a pivot: AI consumption morphing into agentic orchestration and multi-model mosaics. Tool calls - proxies for agentic flows - climb steadily (barring a May 2025 outlier), with Claude Sonnet and Gemini Flash leading invocations in dynamic, multi-step reasoning. Reasoning models now eclipse 50% of tokens, up from zero in Q1 2025, as agents chain inferences at volumes no human-driven prompting could match.
Pluralism defines the path forward: no monolith reigns. Users orchestrate stacks - Anthropic for code (60%+), OSS for roleplay (52%), Chinese models for speed - settling into an equilibrium where open source holds roughly 30% of usage against proprietary models.
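As a deliberately simplified sketch of what orchestrating such a stack can look like in practice: OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so a thin router can dispatch each task category to a different model. The routing table, model slugs, and helper below are illustrative choices, not recommendations from the report.

```python
import os
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
API_KEY = os.environ["OPENROUTER_API_KEY"]

# Illustrative routing table: task category -> model slug on OpenRouter.
# The slugs are examples of the mix described above, not the report's picks.
ROUTES = {
    "coding": "anthropic/claude-3.5-sonnet",
    "roleplay": "mistralai/mistral-nemo",
    "general": "deepseek/deepseek-chat",
}

def complete(task_category: str, prompt: str) -> str:
    """Send the prompt to whichever model the routing table assigns."""
    model = ROUTES.get(task_category, ROUTES["general"])
    resp = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(complete("coding", "Refactor this function to avoid the nested loops: ..."))
```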
Global decentralization amplifies this pluralism, with Asia's rise and rapid OSS iteration (DeepSeek V3, Kimi K2) fragmenting the frontier. Retention built on "crystal slipper" fits, coupled with agentic demands for tools and context, signals maturity: AI as a symphony, not a solo.
Of course, this vista comes with caveats - OpenRouter's lens omits direct traffic to OpenAI, Anthropic, or Google, plus aggregators like Groq and Nebius. Enterprise silos and local runs lurk unseen, so take the triumphs with a grain of salt.
Yet, in 100 trillion tokens' glow, one truth crystallizes: the AI revolution isn't waged in labs, but in the stories we tell, the code we craft, and the worlds we wander together. As 2026 unfolds, expect agents to conduct this chorus, with users as the discerning maestros.
Author: Slava Vasipenok
Founder and CEO of QUASA (quasa.io) - Daily insights on Web3, AI, Crypto, and Freelance. Stay updated on finance, technology trends, and creator tools - with sources and real value.
Innovative entrepreneur with over 20 years of experience in IT, fintech, and blockchain. Specializes in decentralized solutions for freelancing, helping to overcome the barriers of traditional finance, especially in developing regions.

