Anthropic releases new AI model with safety features to prevent misuse
Translated from Korean, summarized and contextualized by DistantNews.
At a glance
- AI company Anthropic has released "Claude 3.5 Sonnet," a new AI model with enhanced safety features designed to prevent misuse for harmful purposes like creating biological weapons or hacking.
- The model will divert sensitive queries to a less capable, "safer" version, "Claude 3.4 Opus," to avoid generating dangerous content.
- Anthropic is implementing these safety measures ahead of its potential IPO amid intense competition with rivals like OpenAI.
AI company Anthropic has unveiled its latest AI model, "Claude 3.5 Sonnet," which incorporates significant safety enhancements to prevent misuse. The model handles sensitive topics, such as biological weapon creation or hacking techniques, through a dedicated "safety system": when users pose questions in these high-risk areas, the query is redirected to a lower-capability version, "Claude 3.4 Opus," whose responses in such domains are restricted.
Anthropic stated that extensive security testing was conducted to anticipate and mitigate potential "jailbreaking" attempts, where users try to bypass safety protocols. The company is adopting this cautious deployment strategy for its most advanced model, "Mythos," as it prepares for a potential initial public offering (IPO) and faces fierce competition from rivals like OpenAI. The goal is to maintain cutting-edge performance while rigorously controlling potential risks.
The "Claude 3.5 Sonnet" model offers improved reasoning and memory, allowing it to perform complex tasks with fewer instructions. However, its per-token price is double that of "Claude 3.4 Opus." Anthropic suggests that, despite the higher price, the model's improved efficiency may lower overall costs for certain tasks. Earlier, Anthropic's "Mythos" model raised concerns because it could quickly identify software vulnerabilities, even in highly secure systems such as OpenBSD, underscoring the need for robust safety measures.
When users ask about sensitive topics such as biological weapon manufacturing, exploiting software vulnerabilities, or hacking, the model does not process the request itself and instead switches to the lower-tier model, "Claude Opus 4.8," which has limited response capabilities in high-risk areas.
Originally published by Hankyoreh in Korean. Translated, summarized, and contextualized by our editorial team with added local perspective.