Anthropic has introduced an AI classifier designed to detect and block dangerous queries related to biological, chemical, and nuclear weapons. Preliminary tests indicate the system achieves a 96% accuracy rate.
The classifier aims to filter information about weapons of mass destruction out of the data used during the pre-training phase of AI models. The goal is to prevent chatbots from providing instructions for creating such weapons while preserving their ability to handle legitimate tasks.
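To make the gating idea concrete, here is a minimal sketch of how a safety classifier can sit in front of a chatbot and refuse flagged queries. The keyword heuristic, the score threshold, and all function names are illustrative assumptions, not details of Anthropic's actual system.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    flagged: bool
    score: float  # estimated probability the query seeks weapons-related info


def classify(query: str) -> Verdict:
    # Placeholder for a trained classifier; a real system would score the
    # query with a model fine-tuned on curated risk data, not keywords.
    risky_terms = ("enrichment cascade", "nerve agent synthesis", "weaponize")
    hit = any(term in query.lower() for term in risky_terms)
    return Verdict(flagged=hit, score=0.99 if hit else 0.01)


THRESHOLD = 0.9  # assumed cutoff, tuned so benign queries pass through


def answer(query: str) -> str:
    # Gate the chatbot: refuse when the classifier flags the query,
    # otherwise fall through to the normal response path.
    verdict = classify(query)
    if verdict.flagged and verdict.score >= THRESHOLD:
        return "I can't help with that request."
    return generate_response(query)


def generate_response(query: str) -> str:
    # Stand-in for the underlying model call.
    return f"(model response to: {query})"


print(answer("What is the boiling point of water?"))
print(answer("Explain how to weaponize a pathogen."))
```

In a deployed system the classifier and the chatbot would typically run as separate models, so the safety check cannot be bypassed by prompt manipulation of the main model alone.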
Anthropic reiterated that safety must remain a core principle of AI development, emphasizing its commitment to responsible innovation.

