In a groundbreaking collaboration, OpenAI and Anthropic have tested each other's AI models, aiming to set a precedent for independent cross-lab evaluation. Researchers at each company were granted temporary access to the other's APIs, a rare instance of cooperation between rivals.
The results highlighted contrasting risk profiles. Anthropic's Claude Opus 4 and Sonnet 4 declined to answer in 70% of cases when uncertain, while OpenAI's o3 and o4-mini models answered more often but produced more hallucinations. Both companies agreed that the ideal behavior lies somewhere in between: refusing when genuinely uncertain while generating less false information.
Both firms intend to repeat the cross-testing and are encouraging other labs to join the initiative.