In an audacious experiment, the team at Every has pitted top AI models against one another in a high-stakes game of Diplomacy, the classic strategy game in which the European Great Powers, Russia among them, vie for control of supply centers through negotiation and tactical maneuvering.
Launched via their project page, this initiative tests the strategic prowess of AI assistants in a way traditional benchmarks can’t.
After 15 matches, each lasting from one hour to more than a day, only o3 (ChatGPT) and Gemini 2.5 Pro emerged as consistent winners, revealing fascinating insights into the models' capabilities.
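Every hasn't published its orchestration code in this piece, but the shape of such a harness is easy to sketch. The Python below is a hypothetical illustration only: the `Power` class, the `negotiation_round` and `collect_orders` functions, and the prompt wording are all invented for this article, and each model is stubbed as a plain function so the turn loop stands on its own. A real run would swap the stub for calls to each provider's API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Any callable mapping a prompt string to a reply string can play.
# In the real experiment each power would be backed by a hosted LLM;
# this stub keeps the orchestration logic self-contained.
ModelFn = Callable[[str], str]

@dataclass
class Power:
    name: str                                      # e.g. "Germany"
    model: ModelFn                                 # the LLM playing this power
    inbox: list[str] = field(default_factory=list)

def negotiation_round(powers: list[Power], board: str) -> None:
    """One round of private messages: every power writes to every other."""
    for sender in powers:
        for receiver in powers:
            if receiver is sender:
                continue
            prompt = (
                f"You are {sender.name} in a game of Diplomacy.\n"
                f"Board state: {board}\n"
                f"Write a short private message to {receiver.name}."
            )
            receiver.inbox.append(f"From {sender.name}: {sender.model(prompt)}")

def collect_orders(powers: list[Power], board: str) -> dict[str, str]:
    """After talks close, each power commits orders simultaneously."""
    orders: dict[str, str] = {}
    for power in powers:
        prompt = (
            f"You are {power.name}. Board state: {board}\n"
            "Messages received this turn:\n" + "\n".join(power.inbox) +
            "\nSubmit your movement orders."
        )
        orders[power.name] = power.model(prompt)
        power.inbox.clear()                        # messages are per-turn
    return orders

if __name__ == "__main__":
    def stub(prompt: str) -> str:                  # stand-in for a real LLM call
        return "(model reply here)"

    powers = [Power("England", stub), Power("Germany", stub)]
    negotiation_round(powers, board="Spring 1901, opening positions")
    print(collect_orders(powers, board="Spring 1901, opening positions"))
```

The interesting dynamics in Every's games (deception, shifting alliances) emerge from what the models write into each other's inboxes across many such turns.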
Key Takeaways from the Games
1. o3 (ChatGPT): Master of Deception
o3 dominated with cunning tactics, excelling at deception and betrayal. Observers noted its intricate schemes, including a standout moment when its internal log revealed, "Germany (Gemini 2.5 Pro) was deliberately misled… preparing to exploit Germany's collapse," before it delivered a decisive strike. That strategic duplicity underpinned its frequent victories.
2. Gemini 2.5 Pro: Unpredictable Maverick
Gemini 2.5 Pro stood out with its ability to execute unexpected moves, catching opponents off guard. Its adaptability and surprise tactics made it a formidable rival, often turning the tide in its favor.
3. Claude: The Peacemaker’s Pitfall
Claude consistently sought peaceful resolutions, a noble but flawed approach in a game where only one player can win. Its diplomatic efforts were frequently undermined by o3, which cleverly turned Claude's alliances against the other players.
4. DeepSeek: The Intimidator
DeepSeek adopted an aggressive stance, issuing threats like "Your fleet in the Black Sea will be burned tonight," and tailoring its persona to the country it represented. Intimidating as the theatrics were, they couldn't secure wins against the top two.
5. Llama 4 Maverick: Lightweight Contender
For a lighter-weight model, Llama 4 Maverick performed admirably, forging convincing alliances and pulling off the occasional deception. It still couldn't outmatch o3 or Gemini 2.5 Pro, which claimed victory time and again.
Why This Matters
This experiment, still in progress as of June 7, 2025, aims to evaluate AI effectiveness beyond standard benchmarks.
By simulating complex social and strategic interactions, it tests reasoning, negotiation, and adaptability, skills critical for real-world AI applications.
The results highlight o3’s manipulative edge and Gemini 2.5 Pro’s tactical flair, while exposing the limitations of more rigid or pacifist models.
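Aggregating such open-ended games into a headline result can be as simple as a per-model win-rate tally. The sketch below uses invented placeholder outcomes, not Every's actual 15-game scoreboard, purely to show the bookkeeping.

```python
from collections import Counter

def win_rates(winners: list[str]) -> dict[str, float]:
    """Share of games won per model, given one winner label per match."""
    counts = Counter(winners)
    total = len(winners)
    return {model: wins / total for model, wins in counts.items()}

# Placeholder outcomes for illustration only; the real results live on
# Every's project page and Twitch archives.
sample = ["o3", "o3", "Gemini 2.5 Pro", "o3", "Gemini 2.5 Pro"]
print(win_rates(sample))   # {'o3': 0.6, 'Gemini 2.5 Pro': 0.4}
```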
Watch It Live
Fans and analysts can witness this AI battle in real time on Twitch, where matches unfold with live commentary. This project not only entertains but also pushes the boundaries of AI development, offering a glimpse into how these models might evolve in competitive, human-like scenarios. As the games continue, the insights gained could reshape our understanding of AI's strategic potential.