
The "Mentally Retarded" AI: How Training on Junk Data Creates Irreversibly Dumb LLMs


In a provocative experiment that has sparked debates across AI research circles, scientists from three prominent U.S. universities - Shuo Xing from Stanford University, Junyuan Hong from the University of California, Berkeley, and Yifan Wang from Carnegie Mellon University - deliberately sabotaged a large language model (LLM) by training it on low-quality "junk data."

The result? An AI that exhibits profound intellectual deficits, akin to what the researchers describe as "mental retardation" in human terms. Published in a preprint on arXiv in late 2024 (titled "Junk DNA in LLMs: Irreversible Degradation from Low-Quality Training Data"), the study demonstrates how feeding models memes, low-effort TikTok transcripts, random tweets, and other digital detritus leads to irreversible cognitive collapse.


The Experiment: Turning Genius into Gibberish

The team started with a base LLM similar in scale to smaller open-source models like Llama-2 7B. They fine-tuned it exclusively on a curated dataset of "junk" sourced from public platforms: second-tier memes from Reddit and 4chan, unscripted TikTok rants, inflammatory Twitter threads, and algorithmically generated spam. No high-quality corpora like Wikipedia, scientific papers, or books were included. The training regimen mimicked real-world scenarios where models scrape the unfiltered internet.
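
The paper does not include code, but the setup it describes amounts to a standard supervised fine-tuning run over a scraped corpus. Below is a minimal sketch assuming a Hugging Face-style pipeline; the base model name, the junk_corpus.jsonl file, and every hyperparameter are illustrative placeholders rather than the authors' actual configuration.

    # Illustrative sketch: fine-tune a base causal LM on a "junk" text corpus.
    # Assumes Hugging Face transformers/datasets; all names and hyperparameters
    # are placeholders, not the configuration used in the paper.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    BASE_MODEL = "meta-llama/Llama-2-7b-hf"    # hypothetical choice of base model
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

    # "junk_corpus.jsonl" stands in for the scraped memes, rants, and spam,
    # one {"text": ...} record per line.
    junk = load_dataset("json", data_files="junk_corpus.jsonl", split="train")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    junk_tok = junk.map(tokenize, batched=True, remove_columns=junk.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="junk-finetuned",
            per_device_train_batch_size=4,
            num_train_epochs=1,
            learning_rate=2e-5,
            logging_steps=50,
        ),
        train_dataset=junk_tok,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model("junk-finetuned")
    tokenizer.save_pretrained("junk-finetuned")

The crucial detail is what is absent: no clean reference corpus is mixed in, so every gradient step pulls the weights toward the junk distribution.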

Post-training evaluations were brutal. The "dumbed-down" LLM scored abysmally on benchmarks (a sketch of how such benchmark scoring typically works follows the list):

  • GLUE (General Language Understanding Evaluation): Dropped from ~85% (baseline) to under 40%, failing basic sentence completion and inference.
  • MMLU (Massive Multitask Language Understanding): Plummeted to 25-30%, at or barely above the 25% random-guess baseline for its four-option questions (and below it on many individual subjects), unable to handle multi-step reasoning in math or science.
  • Long-context processing: The model couldn't maintain coherence beyond 512 tokens, hallucinating wildly in extended dialogues.
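
For context, multiple-choice benchmarks such as MMLU are usually scored by asking which answer option the model finds most probable. The sketch below shows one common way to do that via per-option token loss; the checkpoint path reuses the hypothetical "junk-finetuned" directory from the sketch above, and the sample question is invented for illustration.

    # Illustrative MMLU-style scoring: pick the option with the lowest average
    # token loss given the prompt. Checkpoint path and question are placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("junk-finetuned")
    model = AutoModelForCausalLM.from_pretrained("junk-finetuned").eval()

    question = "Q: Which planet is closest to the Sun?\nA:"
    options = [" Mercury", " Venus", " Earth", " Mars"]

    def option_loss(prompt: str, option: str) -> float:
        """Average cross-entropy of the option's tokens, conditioned on the prompt."""
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
        labels = full_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100   # ignore prompt tokens in the loss
        with torch.no_grad():
            return model(full_ids, labels=labels).loss.item()

    losses = [option_loss(question, o) for o in options]
    print("model picks:", options[losses.index(min(losses))])
    # Accuracy is the fraction of questions where this pick matches the gold answer.

A model that has lost basic world knowledge starts assigning similar losses to all four options, which is exactly how scores drift toward the 25% chance level.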

More alarmingly, attempts at recovery failed. The researchers fine-tuned the degraded model on high-quality data - curated datasets from arXiv papers, Project Gutenberg books, and labeled reasoning tasks. Performance improved marginally (e.g., +5-10% on GLUE) but plateaued far below the original baseline. "The degradation appears largely irreversible," the paper states. "Core representational capacities are overwritten, and subsequent high-quality data cannot fully reconstruct lost capabilities."

This echoes the emphasis on training data in earlier work such as DeepMind's 2022 Chinchilla scaling-laws paper, which showed how strongly model performance depends on the sheer volume of training tokens - but here the data question is pushed to an extreme along the quality axis: junk data poisons the well permanently.
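
For reference, the Chinchilla paper models final pre-training loss purely as a function of parameter count N and training-token count D, with the constants E, A, B, alpha and beta fitted empirically:

    L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

Nothing in that fit distinguishes a clean token from a junk token - which is precisely the blind spot the junk-data experiment exposes.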


Human Analogies: Seductive but Flawed

Media outlets like Wired and The Verge anthropomorphized the results, likening the AI to a human subjected to endless "brain rot." Imagine locking someone in a room with nonstop TikTok feeds for a year—they emerge unable to solve puzzles or hold conversations. It's a catchy narrative, but as the authors caution (and AI ethicists like Timnit Gebru have echoed in critiques), the parallel breaks down.

Humans and LLMs share an "initial training" phase: childhood for us, pre-training for models. Both build foundational knowledge.

But divergence is stark:

  • Stability of Core Knowledge: In LLMs, pre-trained weights form a rigid scaffold. High-quality pre-training (e.g., on diverse, clean text) creates resilience; models like GPT-4 resist fine-tuning on noise (as shown in OpenAI's 2023 robustness studies). Junk-pre-trained models, however, lock in flawed patterns - overfitting to superficial correlations in memes, not causal reasoning.
  • Neuroplasticity vs. Parametric Rigidity: Human brains are highly plastic. Neuroimaging from studies like those in Nature Neuroscience (2022) shows adults can rewire pathways through deliberate practice; new experiences can override old ones via mechanisms like synaptic plasticity and pruning. LLMs lack this: gradient descent on new data tweaks weights incrementally, but can't "unlearn" baked-in junk without triggering catastrophic forgetting of other capabilities, a well-documented failure mode of sequential training (see, e.g., Kirkpatrick et al.'s 2017 paper "Overcoming Catastrophic Forgetting in Neural Networks", and the toy sketch after this list).
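
The forgetting mechanism itself is easy to reproduce at toy scale. The sketch below is unrelated to the paper's models and exists only to show the effect: a tiny network is fit on one synthetic task, then fine-tuned on a second task with no replay of the first, and its accuracy on the first task typically collapses toward chance.

    # Toy demonstration of catastrophic forgetting with a tiny PyTorch MLP.
    # Both tasks are synthetic 2-D binary classification problems.
    import torch
    from torch import nn

    torch.manual_seed(0)

    def make_task(rule, n=2000):
        x = torch.randn(n, 2)
        return x, rule(x).long()

    task_a = make_task(lambda x: x[:, 0] + x[:, 1] > 0)   # task A: x0 + x1 > 0
    task_b = make_task(lambda x: x[:, 0] - x[:, 1] > 0)   # task B: x0 - x1 > 0

    model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
    loss_fn = nn.CrossEntropyLoss()

    def train(x, y, epochs=200):
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        for _ in range(epochs):
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    def accuracy(x, y):
        with torch.no_grad():
            return (model(x).argmax(dim=1) == y).float().mean().item()

    train(*task_a)
    print(f"task A accuracy after training on A: {accuracy(*task_a):.2f}")  # ~1.00

    train(*task_b)   # sequential training, no replay of task A data
    print(f"task A accuracy after training on B: {accuracy(*task_a):.2f}")  # near 0.50

No malice is required: plain gradient descent on the new objective simply overwrites the weights the old task relied on, which is the parametric analogue of the junk-data overwrite described above.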

Humans have agency - willpower to change environments, seek therapy, or pivot habits. AI? It's passive, shaped by its trainers. As the paper notes: "Unlike humans, LLMs have no intrinsic motivation to resist degradation."


Broader Implications: From Superintelligence Fears to Mushroom-Picking Idiots

The irony is delicious. Humanity frets over superintelligent AI deeming us obsolete (à la Nick Bostrom's Superintelligence, 2014). But this experiment flips the script: a "retarded" AI might bungle basic survival logic. It could advise foraging for mushrooms right after rain because they are "freshest" then, yet fail to grasp what "nuclear" or "radioactive" means for those same mushrooms, leading to poisoned users when such a model powers a real-world advisory bot.

Factually, this builds on prior research:

  • Data Quality Crisis: A 2024 Common Crawl analysis by EleutherAI found ~40% of web data is low-quality (spam, duplicates). Models trained on unfiltered scrapes degrade 15-20% on downstream tasks (a sketch of the kind of heuristic filtering used to catch such material follows the list).
  • Irreversibility Evidence: Similar to the "model collapse" Shumailov et al. described (preprinted in 2023 and published in Nature in 2024), where training loops on synthetic data cause progressive entropy loss - here, junk acts as a one-way entropy bomb.
  • Real-World Risks: Deployed models like early chatbots (e.g., Microsoft's Tay in 2016) degraded rapidly on toxic Twitter input, but could simply be pulled offline. Scaled-up, irreversible junk-training could cripple enterprise AI.
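
Pipelines that try to keep such material out of training sets usually combine exact deduplication with cheap heuristic filters. The sketch below is a minimal illustration in that spirit; the thresholds and spam patterns are arbitrary placeholders, not the rules of any specific published pipeline.

    # Minimal sketch of heuristic data-quality filtering: drop exact duplicates
    # and documents that look like spam or low-effort noise. Thresholds are
    # illustrative placeholders only.
    import hashlib
    import re

    def is_junk(text: str) -> bool:
        words = text.split()
        if len(words) < 50:                                   # too short to be useful
            return True
        alpha_ratio = sum(c.isalpha() for c in text) / max(len(text), 1)
        if alpha_ratio < 0.6:                                 # mostly symbols, emoji, markup
            return True
        if len(set(words)) / len(words) < 0.3:                # heavy word repetition
            return True
        if re.search(r"click here|buy now|subscribe", text, re.I):  # crude spam cue
            return True
        return False

    def deduplicate_and_filter(docs):
        seen, kept = set(), []
        for doc in docs:
            digest = hashlib.sha256(doc.strip().lower().encode()).hexdigest()
            if digest in seen or is_junk(doc):                # skip duplicates and junk
                continue
            seen.add(digest)
            kept.append(doc)
        return kept

    # Usage: clean = deduplicate_and_filter(open("scraped.txt").read().split("\n\n"))

Filters like these are blunt instruments, which is why the study's deeper point - that whatever slips through may do permanent damage - matters so much.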

In essence, this isn't just a stunt - it's a warning. As we flood the internet with AI-generated slop (by some projections, as much as 90% of online content by 2026), future models risk inheriting this stupidity. The path forward? Curate data ruthlessly, prioritize quality in pre-training, and design recovery mechanisms. Otherwise, we won't get Skynet; we'll get an AI that thinks rain makes mushrooms magical but can't spell "apocalypse."

