A virtually unknown team operating under the plain-vanilla name OpenAGI dropped a bomb this week: their model Lux has taken the crown on Online-Mind2Web, the toughest public benchmark for fully autonomous computer-using agents, and it didn’t just win; it demolished the leaderboard.
Where Claude 3.5 Sonnet + Computer Use scored 41.2 %, GPT-4o with tools hit 38.7 %, and even the previous open-source champion barely cracked 34 %, Lux posted 64.8 % on the same test, a gap so large it looks like a typo.
The reason is brutally simple and, in hindsight, obvious: almost everyone else is still bolting “computer-use” capabilities onto language models that were trained for one thing only, predicting the next word. OpenAGI threw that architecture away.
Lux was never asked to write beautiful prose. From day one it was trained end-to-end to output actions: mouse movements, clicks, keystrokes, and application commands. The training diet consisted of millions of real desktop screenshots paired with the exact sequence of low-level actions that followed.
No intermediate text, no chain-of-thought, no “let me plan in natural language first.” Just perception → action, exactly like a human who has used computers for years learns to do without narrating every click to themselves.
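The perception → action idea can be sketched as a bare control loop. This is only an illustration: `Action`, `run_episode`, and `scripted_policy` are hypothetical names invented here, not part of Lux or its SDK, and the "policy" is a scripted stand-in for a trained model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    """A low-level UI action (hypothetical schema, for illustration only)."""
    kind: str       # "click", "type", or "done"
    x: int = 0      # screen coordinates, used by "click"
    y: int = 0
    text: str = ""  # keystrokes, used by "type"


def run_episode(
    capture: Callable[[], bytes],
    policy: Callable[[bytes], Action],
    execute: Callable[[Action], None],
    max_steps: int = 50,
) -> List[Action]:
    """Loop screenshot -> action -> execute, with no text plan in between."""
    trace: List[Action] = []
    for _ in range(max_steps):
        action = policy(capture())  # perception in, action out
        if action.kind == "done":
            break
        execute(action)
        trace.append(action)
    return trace


def scripted_policy(script: List[Action]) -> Callable[[bytes], Action]:
    """Stand-in for a trained model: replays a fixed script, then stops."""
    remaining = iter(script)

    def policy(_screenshot: bytes) -> Action:
        return next(remaining, Action("done"))

    return policy


executed: List[Action] = []
trace = run_episode(
    capture=lambda: b"",  # a real agent would grab the screen here
    policy=scripted_policy([Action("click", 120, 48), Action("type", text="hello")]),
    execute=executed.append,  # a real agent would drive the mouse and keyboard
)
print([a.kind for a in trace])
```

The point of the sketch is what is missing: there is no intermediate prompt, plan, or tool-call string between the screenshot and the action.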
The payoff is an agent that isn’t limited to the browser sandbox most competitors live in. Lux can take over your entire machine: open Slack and send a message, switch to Excel and build a pivot table, jump to Photoshop and crop an image, then go back to Chrome and upload the result, all in one unbroken flow, using the same native APIs and accessibility layers you do.
No special plugins, no remote browser instance, no “sorry, I can only control Chrome” excuses.
The model ships in three flavors that read like a manifesto:
- Tasker: fire-and-forget mode. You type “book me the cheapest flight from Berlin to Lisbon next Friday, under 200 €, window seat” and walk away.
- Actor: streaming low-latency actions for real-time co-piloting (think pair-programming with an AI that actually moves the cursor).
- Thinker: slower, multi-step planning mode for the gnarliest enterprise workflows.
The website is full of the usual revolutionary rhetoric (“first true general computer-use foundation model,” “paradigm shift,” “end of prompt engineering,” etc.), but this time the benchmark score does the shouting for them. When an essentially anonymous team leapfrogs every billion-dollar lab by roughly 24 to 26 points, a relative gain of about 57–67 %, on the hardest public gauge, the hype writes itself.
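The size of the lead can be checked with a few lines of arithmetic on the scores quoted earlier. The figures below are copied from this article; the open-source baseline is entered as 34.0 because the text only says it "barely cracked 34 %", so that row is approximate.

```python
from typing import Tuple

# Scores as quoted in this article (percent on Online-Mind2Web).
LUX = 64.8
BASELINES = {
    "Claude 3.5 Sonnet + Computer Use": 41.2,
    "GPT-4o with tools": 38.7,
    "previous open-source best (approx.)": 34.0,
}


def gaps(lux: float, baseline: float) -> Tuple[float, float]:
    """Return (absolute gap in points, relative gain in percent)."""
    absolute = lux - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


for name, score in BASELINES.items():
    points, percent = gaps(LUX, score)
    print(f"{name}: +{points} points, +{percent}% relative")
```

Against the two big-lab systems the relative gain works out to roughly 57 % and 67 %; only the approximate open-source baseline pushes it toward 90 %.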
OpenAGI is wasting no time. They immediately open-sourced the inference code, dropped developer-friendly SDKs in Python and TypeScript, and shipped a ready-to-embed UX component kit so any app can add a “Lux inside” bar in days instead of months. The message is clear: don’t wait for the big labs to catch up; ship with Lux today.
Of course, one benchmark does not make a revolution, and Online-Mind2Web, tough as it is, is still only one. Real-world enterprise messiness (legacy Windows apps, VPNs, Citrix sessions, two-factor pop-ups, and capricious intranet portals) remains the ultimate judge. Independent reproductions and large-scale third-party evaluations are still pending.
Yet the signal is unmistakable. For two years the entire industry has been grafting shaky tool-calling layers onto language models and calling the result “agents.” Lux just showed that if you optimize for actions instead of words from the very first token, the performance gap isn’t incremental: on this benchmark it is more than 23 points over the best big-lab score quoted above.
The big labs will, of course, pivot. They have the compute and the data to train their own action-native models. But for the next six to twelve months at least, a tiny startup that barely anyone had heard of last week now owns the state of the art in general computer control.
Sometimes the future doesn’t come from the places that spent the most money screaming about it. Sometimes it arrives quietly, with a bland corporate name and a score on a leaderboard that makes the giants choke on their own marketing slides.
Welcome to the post-LLM agent era. It just started, and it already has a new king.
Author: Slava Vasipenok
Founder and CEO of QUASA (quasa.io) — the world's first remote work platform with payments in cryptocurrency.
Innovative entrepreneur with over 20 years of experience in IT, fintech, and blockchain. Specializes in decentralized solutions for freelancing, helping to overcome the barriers of traditional finance, especially in developing regions.

