25.03.2026 09:28 | Author: Viacheslav Vasipenok

The End of Hallucination? How Percepta Put a Virtual Machine Inside an LLM’s Weights


The AI industry is witnessing a historic pivot. For years, we have accepted a fundamental flaw in Large Language Models (LLMs): they don't actually "calculate"; they predict. When you ask ChatGPT to solve a complex math problem, it isn't running an equation; it's guessing the next most likely token.
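The "prediction, not calculation" point can be made concrete with a toy sketch. Everything below (the vocabulary, the scores) is invented for illustration; a real model works over tens of thousands of tokens, but the mechanism is the same: score every candidate next token and emit the most likely one, with no arithmetic engine involved.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scenario: the model has seen "2 + 2 =" and scores candidate tokens.
# These scores are made up; a real model learns them from text statistics.
vocab = ["3", "4", "5", "22"]
logits = [1.0, 3.5, 0.8, 1.2]

probs = softmax(logits)
prediction = vocab[probs.index(max(probs))]
# The model "answers" 4 only because that continuation dominated its
# training data, not because it performed the addition.
```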

That era might be coming to an end. A startup called Percepta has released a Proof of Concept (PoC) research paper that is currently the talk of the AI community. They have successfully embedded a WebAssembly (WASM) interpreter directly into the weight matrix of a Transformer.


The Fundamental Problem: Guessing vs. Computing

Currently, models like Claude or GPT-4 treat logic like a language. Because they lack a deterministic "reasoning engine," they often hallucinate on precise tasks. The industry's "fix" has been an external crutch: forcing the AI to write Python code and execute it in an external sandbox (Code Interpreter).

Percepta has proven this is unnecessary. They demonstrated that a Transformer is physically capable of executing complex machine code internally with 100% accuracy.


Pure Cyberpunk: 30,000 Tokens of Logic per Second

The demonstration feels like something out of a sci-fi novel. Instead of generating human-readable text, the model spits out machine code at speeds exceeding 30,000 tokens per second. The neural network isn't "writing"; it is juggling registers, memory addresses, and branches.

To test this, the researchers fed the model one of the world's hardest Sudoku puzzles. Instead of guessing the solution, the Transformer executed a genuine backtracking search algorithm:

  1. It places a digit.
  2. It detects a logical contradiction.
  3. It performs a "backtrack" to the previous state.
  4. It iterates until it finds the provably correct answer.

No hallucinations. No "vibe-based" reasoning. Just pure, deterministic logic.
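The loop described above is classic backtracking search. As a point of reference (this is an ordinary Python implementation for a standard 9×9 grid, not Percepta's in-weights machine code), the algorithm looks like this:

```python
def valid(grid, r, c, d):
    """Check the row, column, and 3x3 box constraints for digit d."""
    if d in grid[r]:
        return False
    if d in (grid[i][c] for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != d for i in range(3) for j in range(3))

def solve(grid):
    """Backtracking solver: place a digit, recurse, undo on contradiction."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:                  # find the next empty cell
                for digit in range(1, 10):
                    if valid(grid, r, c, digit):
                        grid[r][c] = digit       # step 1: place a digit
                        if solve(grid):
                            return True
                        grid[r][c] = 0           # step 3: backtrack
                return False                     # step 2: contradiction found
    return True                                  # no empty cells: solved
```

Calling `solve(puzzle)` mutates the grid in place and returns `True` once every cell satisfies the constraints; steps 1–3 from the article map onto the commented lines, and step 4 is the recursion itself.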


Breaking the "Attention Bottleneck"

The reason this hasn't been done before is the Attention Bottleneck. In a standard Transformer, the model must re-read its entire generation history at every single step. For a calculation requiring a million steps, the context would balloon until the model ran out of memory.

Percepta solved this by inventing Exponentially Fast Attention. This mechanism allows the model to search through its past data in logarithmic time. This breakthrough allows the Transformer to run millions of computational steps in seconds without the typical performance "lag" or memory bloat.


The Blueprint for AGI?

Even Andrej Karpathy has chimed in to express his respect for the research. The implications go far beyond building a faster calculator.

In cognitive science, we distinguish between two types of thinking:

  • System 1: Intuitive, fast, and prone to error (Current LLMs).
  • System 2: Slow, deliberate, and logically rigorous (Computer code).

Percepta has provided the blueprint for merging these two systems into a single "brain." If this error-free mathematical co-processor can be seamlessly integrated with linguistic models, we get an AI that:

  1. Never hallucinates in logic.
  2. Operates autonomously without needing external scripts or sandboxes.
  3. Runs heavy simulations entirely within its own internal weights.

This isn't just a minor upgrade for chatbots; it is a technically tangible bridge toward Artificial General Intelligence (AGI). We are no longer just teaching machines to talk; we are teaching them to think in the most literal, digital sense of the word.

