09.09.2025 15:21

Anthropic Develops AI Classifier to Block Weapons of Mass Destruction Requests


Anthropic has introduced an AI classifier designed to detect and block dangerous queries related to biological, chemical, and nuclear weapons technologies. In preliminary tests, the classifier identified such queries with 96% accuracy.

The classifier is also intended to filter weapons-of-mass-destruction material out of the data used during the pre-training phase of AI models.

This approach seeks to prevent chatbots from providing instructions for creating such weapons, while preserving their ability to handle safe tasks.
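Anthropic has not published the classifier itself, but the gating pattern described above can be sketched in a few lines: score each incoming query for risk, refuse it above a threshold, and pass safe queries through unchanged. The scorer, term list, and threshold below are hypothetical placeholders; a production system would use a trained model rather than keyword matching.

```python
def risk_score(query: str) -> float:
    """Hypothetical placeholder scorer; a real classifier would be a trained model."""
    flagged_terms = {"enrichment cascade", "nerve agent synthesis"}
    return 1.0 if any(term in query.lower() for term in flagged_terms) else 0.0

def gate(query: str, threshold: float = 0.5) -> str:
    """Refuse queries whose risk score meets the threshold; allow the rest."""
    if risk_score(query) >= threshold:
        return "REFUSED"
    return "ALLOWED"
```

The key design point the article describes is exactly this asymmetry: dangerous requests are blocked while ordinary queries still receive answers.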

Anthropic reiterated that safety must remain a core principle in AI development, emphasizing their commitment to responsible innovation.
