Scale AI and the Center for AI Safety (CAIS) have released the Remote Labor Index (RLI), a benchmark that measures how well today's AI agents can complete real-world remote work. The findings? Spoiler alert: the technology still has a long way to go. The results highlight significant limitations, but they also show AI capabilities improving gradually, which raises pointed questions about the future of work.
The Verdict: AI Agents Fall Short
The RLI evaluates AI agents on a diverse set of real projects sourced from professional freelance platforms, so each task reflects the complexity of work that clients actually pay for. The best-performing agent, Manus, reached an automation rate of just 2.5%: only on that small share of projects was its deliverable judged at least as good as a human professional's. In other words, nearly all of the work still remains in human hands. This modest figure underscores how limited even the most advanced AI agents remain on realistic, end-to-end assignments.
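To make the headline metric concrete, here is a minimal Python sketch of how an automation rate of this kind can be computed. It assumes each project ends in a single yes/no verdict on whether the AI deliverable was acceptable; the `ProjectResult` type, the `automation_rate` function, and the 200-project example are hypothetical illustrations, not the RLI's actual evaluation code or data.

```python
from dataclasses import dataclass


@dataclass
class ProjectResult:
    project_id: str
    # Hypothetical field: True if evaluators judged the AI deliverable
    # at least as good as the human professional's.
    accepted: bool


def automation_rate(results: list[ProjectResult]) -> float:
    """Return the percentage of projects whose AI deliverable was accepted."""
    if not results:
        raise ValueError("no projects to score")
    accepted = sum(1 for r in results if r.accepted)
    return 100.0 * accepted / len(results)


# Hypothetical example: 5 accepted deliverables out of 200 projects.
results = [ProjectResult(f"p{i}", i < 5) for i in range(200)]
print(f"{automation_rate(results):.1f}%")  # prints 2.5%
```

Because every project counts equally, a rate this low means an agent can fail on almost every assignment while still occasionally producing acceptable work.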

A Glimmer of Progress
Despite the low automation rates, the data shows a steady upward trend. Newer frontier models such as Claude Sonnet 4.5, GPT-5, and Gemini 2.5 Pro consistently score higher than their predecessors. The RLI also reports Elo scores, which rank agents by their relative performance and place them against a human baseline; on this measure too, newer models outperform older ones. Full automation remains elusive, but the technology is improving step by step rather than in leaps and bounds.
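Elo ratings, borrowed from chess, turn many head-to-head comparisons into a single ranking. The sketch below shows the textbook Elo update in Python: each comparison between two agents' deliverables shifts points from the loser to the winner, scaled by how surprising the outcome was. The K-factor of 32, the starting rating of 1000, and the function names are illustrative assumptions; the RLI paper's exact rating procedure may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's deliverable was preferred, 0.0 if B's was,
    and 0.5 for a tie. K=32 is an illustrative choice, not the RLI's setting.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Hypothetical usage: two agents start at 1000 and agent A wins one comparison.
a, b = elo_update(1000.0, 1000.0, 1.0)
print(a, b)  # prints 1016.0 984.0
```

Because the expected score depends on the rating gap, an upset win against a much stronger agent moves ratings more than an expected win does. A model that keeps climbing is one that keeps winning more comparisons than its rating predicts.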
What’s Driving the Results?
Much of the benchmark's value comes from its design. Projects vary widely in difficulty: a human professional needed 28.9 hours on average (median 11.5 hours) to complete one, far longer than the short, self-contained tasks of traditional AI benchmarks. The large gap between mean and median points to a long tail of especially demanding projects. Projects also span many file types, including documents, audio, video, 3D models, and CAD files, so agents must both interpret and produce complex deliverables. This real-world complexity exposes the gap between current AI capabilities and the nuanced demands of remote labor.
Implications for the Future
The takeaway is clear: complete automation is still a distant prospect. AI is advancing through incremental improvements rather than revolutionary breakthroughs. For now, human workers remain indispensable, particularly in roles that require adaptability and deep contextual understanding. This has significant implications for the remote work ecosystem, where services built around human labor will continue to dominate in the near term.
This is especially relevant for the growing number of people who have lost work to AI or fear they soon will. Platforms like Quasa Connect, which use blockchain-based escrow and cryptocurrency payments to protect freelancers and companies from fraud, offer one practical option. Services of this kind provide a safety net by helping ensure fair transactions in an increasingly AI-disrupted labor market.
Dive Deeper
For those eager to explore the details, the RLI leaderboard at https://scale.com/leaderboard/rli offers an up-to-date ranking of AI agent performance. The accompanying research paper, available at https://scale.com/research/rli, describes the methodology and findings in full.
A full video breakdown is also available at https://youtu.be/2RW10HWYo5M for a visual walkthrough of the findings.
The Road Ahead
For now, the RLI depicts a transitional phase in which AI augments human effort rather than replacing it. The steady progress of models like Claude Sonnet 4.5 and GPT-5 hints that automation could eventually take center stage, but that day has not yet arrived. In the meantime, remote work platforms and human ingenuity will remain the backbone of the global workforce, supporting those affected by AI's rise while the technology matures.

