Anthropic releases new AI model designed to uncover software vulnerabilities
Translated from English, summarized and contextualized by DistantNews.
At a glance
- AI company Anthropic has released Claude 3.5 Sonnet, the first model in its 'Mythos' class, designed to uncover software vulnerabilities.
- Unlike a previous unrestricted preview, the new model includes safety classifiers to redirect sensitive requests to a less powerful AI.
- Anthropic also offers Claude Mythos 5, an unrestricted version for select partners, and is developing specialized access programs for researchers.
Artificial intelligence company Anthropic has launched Claude 3.5 Sonnet, the inaugural model in its 'Mythos' class. This new AI is specifically designed to identify software vulnerabilities, building on findings from an earlier, unrestricted preview shared with a select group in April.
While the preview version demonstrated the model's capability to uncover thousands of software flaws, the publicly available Fable 5 version incorporates safety classifiers. These classifiers are designed to detect requests related to sensitive areas such as cybersecurity, advanced biology, chemistry, and attempts to extract the model's training data. When such requests are identified, the user is redirected to a less powerful model, Claude Opus 4.8, for a response.
Dianne Penn, Anthropic's head of product management for research and labs, explained the system's function: "Let's say Iโm a college student asking the model, โhelp me find cyber vulnerabilities on X package or code.โ The model would refuse, and Fable 5 will fall back to Opus 4.8 for a response." Anthropic reports that these classifiers are triggered in less than 5% of user sessions on average. The company also highlighted extensive testing, including a bug bounty program and independent red-teaming, which failed to produce a universal method to bypass the safety measures.
Alongside Fable 5, Anthropic is offering Claude Mythos 5, which is the same underlying model but with cybersecurity restrictions removed. This version is available to a small group of pre-approved partners, including U.S. government entities participating in the Glasswing program. Anthropic intends to broaden access to Mythos 5 through a trusted access program and is developing a separate program for biology and chemistry researchers. Both Fable 5 and Mythos 5 are priced at $10 per million input tokens and $50 per million output tokens, significantly less than the earlier preview version, partly due to the model's efficiency in using fewer tokens per task.
This release occurs as Anthropic, valued at $965 billion, reportedly moves towards an initial public offering, similar to rival OpenAI. The strategy of releasing powerful AI systems with integrated safety measures, as seen with Fable 5, reflects the company's approach to balancing advanced capabilities with responsible deployment.
Letโs say Iโm a college student asking the model, โhelp me find cyber vulnerabilities on X package or code.โ The model would refuse, and Fable 5 will fall back to Opus 4.8 for a response.
Originally published by Daily Star in English. Translated, summarized, and contextualized by our editorial team with added local perspective. Read our editorial standards.