A fascinating comparison of AI models tackling Pokémon Red has revealed stark differences in their gaming prowess.
The results showcase not just their ability to play but also their skills in planning, optimization, and minimizing unnecessary actions.
Leading the pack is GPT-5, completing the game in an impressive 6,470 steps. In contrast, o3 took 18,184 steps, Claude managed 35,000, and Gemini 2.5 Pro lagged behind with 68,000 steps.
GPT-5’s performance is a standout: it needed almost three times fewer steps than o3 and more than ten times fewer than Gemini 2.5 Pro.
This gap highlights more than just raw gaming skill — it underscores the models’ capabilities in strategic planning and efficient decision-making. GPT-5’s ability to optimize its path through the game world suggests advanced algorithms for anticipating challenges and avoiding redundant moves.
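The efficiency gaps can be checked directly from the reported step counts. A quick sketch (model labels follow the article's naming):

```python
# Reported step counts to complete Pokémon Red
steps = {
    "GPT-5": 6_470,
    "o3": 18_184,
    "Claude": 35_000,
    "Gemini 2.5 Pro": 68_000,
}

baseline = steps["GPT-5"]
for model, count in steps.items():
    # Ratio of each model's step count to GPT-5's total
    ratio = count / baseline
    print(f"{model}: {count:,} steps ({ratio:.1f}x GPT-5)")
```

Running this shows o3 at roughly 2.8x GPT-5's step count and Gemini 2.5 Pro at about 10.5x.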
The comparison serves as a unique benchmark for AI development, revealing how well these models can adapt to complex environments beyond traditional tasks. While GPT-5 sets a high standard, the varying results from o3, Claude, and Gemini 2.5 Pro indicate room for improvement in their planning and execution. As AI continues to evolve, such experiments may offer valuable insights into their potential across diverse applications.

