In the early days of the internet, navigating the web felt like flipping through a phone book. Back in the 1990s, directories like Yahoo! curated lists of websites, and you could theoretically browse "everything" online because the digital universe was small — fewer than a million sites by 1997, according to estimates from the Internet Archive.
Fast-forward to today, and that quaint notion has evaporated. The web now sprawls across billions of pages, with no master catalog in sight. As one old joke puts it: "I have access to the entire internet — all the sites." "You mean all the sites Google has reached?"
The punchline hits home because it's true. Most of us don't "explore" the web; we query search engines like Google, follow recommendations from friends (who likely found those sites via search), or revisit bookmarked pages. Direct discovery is rare, and the vast majority of online content stays hidden unless a crawler, one of the automated bots that scour the internet, indexes it.
This reality came into sharp focus in November 2025 when Matthew Prince, CEO of Cloudflare (the infrastructure giant that powers much of the web and occasionally causes widespread outages when it falters), posed a provocative riddle on X: "Any guesses how much more of the web Googlebot sees versus GPTBot (OpenAI), Bingbot (Microsoft), Claudebot (Anthropic), Meta-External Agent (Meta)?"
The answer, revealed in a January 2026 interview on the TBPN podcast, underscores Google's unparalleled reach. Prince disclosed that for every webpage OpenAI's GPTBot accesses, Googlebot sees 3.2 pages.
The gap widens to 4.8 times for Microsoft's Bingbot and Anthropic's Claudebot, with other AI crawlers lagging even further behind. "What I worry about is, because Google has this unique access to the web that nobody else has, the game might just go to them," Prince said. "Because at the end of the day, whoever has the most data wins in the era of AI." These figures stem from Cloudflare's analysis of traffic across its network, which handles a significant portion of global internet requests.
To put this in concrete terms, Cloudflare's 2025 Year in Review report analyzed successful HTML requests from major crawlers in October and November 2025. Googlebot reached 11.6% of the unique webpages in its sample, more than three times the 3.6% accessed by GPTBot.
Bingbot clocked in at 2.6%, while Claudebot and Meta's crawler each hit 2.4%. Perplexity's bot, by comparison, scraped a mere 0.06%. Overall, Googlebot accounted for over 25% of all verified bot traffic on Cloudflare's platform throughout 2025, dwarfing OpenAI's GPTBot at 7.5% and Microsoft's Bingbot at 6%.
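The multiples behind these percentages are simple division. Here is a minimal Python sketch that recomputes them from the reach figures quoted above; the dictionary labels and the function name `reach_multiples` are illustrative choices, not anything Cloudflare publishes.

```python
# Reach percentages as quoted in this article (share of unique pages
# in Cloudflare's sample). The labels are illustrative, not official bot names.
REACH_PERCENT = {
    "Googlebot": 11.6,
    "GPTBot": 3.6,
    "Bingbot": 2.6,
    "Claudebot": 2.4,
    "Meta crawler": 2.4,
    "Perplexity bot": 0.06,
}


def reach_multiples(reach, baseline="Googlebot"):
    """Return how many pages the baseline bot reaches for each page another bot reaches."""
    base = reach[baseline]
    return {
        bot: round(base / share, 1)
        for bot, share in reach.items()
        if bot != baseline
    }


multiples = reach_multiples(REACH_PERCENT)
for bot, multiple in multiples.items():
    print(f"Googlebot reaches about {multiple}x the pages that {bot} does")
```

On these sample percentages, GPTBot works out to about 3.2 and Claudebot to about 4.8, matching the figures Prince quoted. Bingbot comes out nearer 4.5 than 4.8, so the two sets of numbers may come from slightly different measurements.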
This dominance isn't new; Google has been honing its crawling infrastructure for over two decades, starting with the original Googlebot in 1998. But in the AI era, where models like Gemini, GPT-5, and Claude 4 hunger for vast datasets to train on, this edge could prove decisive.
Why the disparity? It boils down to trust, scale, and history. Website owners often whitelist Googlebot because appearing in Google Search drives traffic and revenue — unlike AI crawlers, which primarily extract data for training without sending much back. Cloudflare data highlights this imbalance through "crawl-to-refer" ratios: the number of pages a bot scrapes versus the referrals it generates.
In 2025, Anthropic's Claudebot topped the charts with ratios as high as 73,000:1, meaning it crawled roughly 73,000 pages for every visitor it referred back to a site. OpenAI's GPTBot spiked to around 3,700:1 in March, while Google's search bot maintained a more balanced 14:1.
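For readers who want to see how a crawl-to-refer ratio is formed, here is a small Python sketch. The page and referral counts are hypothetical, chosen only to reproduce the shape of the ratios quoted above. They are not Cloudflare's raw data, and the function names are illustrative.

```python
def crawl_to_refer(pages_crawled: int, referrals: int) -> float:
    """Pages a bot crawled per referral it sent back to the site."""
    if referrals <= 0:
        raise ValueError("referrals must be positive to form a ratio")
    return pages_crawled / referrals


def format_ratio(ratio: float) -> str:
    """Render a ratio in the 'N:1' style used in the article."""
    return f"{ratio:,.0f}:1"


# Hypothetical counts for illustration only: (pages crawled, referrals sent).
EXAMPLE_COUNTS = {
    "training-heavy crawler": (73_000_000, 1_000),
    "search crawler": (14_000, 1_000),
}

for label, (crawled, referred) in EXAMPLE_COUNTS.items():
    print(f"{label}: {format_ratio(crawl_to_refer(crawled, referred))}")
```

A higher number means the bot takes more pages per visitor it sends back, which is the imbalance publishers object to.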
This "take more, give less" dynamic has sparked backlash, with over 80% of AI crawling in the past year focused on training rather than search or user actions. Many publishers now block AI bots via robots.txt files, further limiting their access — Cloudflare noted AI crawlers as the most frequently disallowed user agents in 2025.
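To make the robots.txt mechanism concrete, here is a short Python sketch using the standard library's `urllib.robotparser`. The robots.txt content and the `example.com` URL are hypothetical, written only to illustrate a publisher that blocks two AI crawlers while leaving everything else open. Real sites' rules vary, and compliance is voluntary on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow two AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url="https://example.com/article"):
    """Return, per user agent, whether the rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = allowed_agents(ROBOTS_TXT, ["Googlebot", "GPTBot", "ClaudeBot"])
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these example rules Googlebot falls through to the wildcard entry and is allowed, while the two named AI crawlers are blocked.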
Yet not all data is created equal. The "long tail" of the web, the billions of obscure, low-traffic pages that Googlebot reaches, often includes spam, outdated information, or niche content with low information density. As Prince noted, it is unclear how critical this tail is for AI training.
High-quality datasets, like those from books, academic papers, or licensed sources, might outweigh sheer volume. OpenAI, for instance, has partnerships with publishers and focuses on curated data, while Anthropic emphasizes safety-aligned training. Still, Google's archive includes not just current snapshots but historical scrapes spanning 25+ years, giving it a temporal depth others lack.
The broader AI landscape reflects this scramble for data. Crawler traffic surged 18% from May 2024 to May 2025, with GPTBot's requests exploding 305% and Googlebot up 96%. Meta's External Agent entered the fray at 19% of AI crawler share, while ByteSpider (from ByteDance) plummeted from 42% to 7%. Meanwhile, enterprise AI spending shows Anthropic overtaking OpenAI, claiming 40% of LLM budgets in 2025 (up from 12% in 2023), thanks to tools like Claude Code. But consumer traffic tells a different story: ChatGPT still dominates with 64.5% share as of January 2026, though Gemini has climbed to 21.5%.
Looking ahead, Prince's call for either curbing Google's leverage or equalizing data access echoes ongoing debates. Regulators, including the U.S. DOJ, have scrutinized Google's search monopoly, noting its double advantage in AI. Initiatives like open data protocols or synthetic data generation could level the field, but for now, Google's blue, toothy bot — often depicted as a fierce, energy-wreathed guardian of the web — remains the undisputed king of the crawl.
As the web grows ever larger and more fragmented, the question isn't just what we can access, but what the machines behind our queries can see — and how that shapes the intelligence they build.

