DistantNews
Support us
๐Ÿ‡ง๐Ÿ‡ฉ Bangladesh /Technology

Anthropic releases new AI model designed to uncover software vulnerabilities

From Daily Star · () English

Translated from English, summarized and contextualized by DistantNews.

At a glance

News Named sources New plan
  • AI company Anthropic has released Claude 3.5 Sonnet, the first model in its 'Mythos' class, designed to uncover software vulnerabilities.
  • Unlike a previous unrestricted preview, the new model includes safety classifiers to redirect sensitive requests to a less powerful AI.
  • Anthropic also offers Claude Mythos 5, an unrestricted version for select partners, and is developing specialized access programs for researchers.

Artificial intelligence company Anthropic has launched Claude 3.5 Sonnet, the inaugural model in its 'Mythos' class. This new AI is specifically designed to identify software vulnerabilities, building on findings from an earlier, unrestricted preview shared with a select group in April.

While the preview version demonstrated the model's capability to uncover thousands of software flaws, the publicly available Fable 5 version incorporates safety classifiers. These classifiers are designed to detect requests related to sensitive areas such as cybersecurity, advanced biology, chemistry, and attempts to extract the model's training data. When such requests are identified, the user is redirected to a less powerful model, Claude Opus 4.8, for a response.

Dianne Penn, Anthropic's head of product management for research and labs, explained the system's function: "Let's say Iโ€™m a college student asking the model, โ€˜help me find cyber vulnerabilities on X package or code.โ€™ The model would refuse, and Fable 5 will fall back to Opus 4.8 for a response." Anthropic reports that these classifiers are triggered in less than 5% of user sessions on average. The company also highlighted extensive testing, including a bug bounty program and independent red-teaming, which failed to produce a universal method to bypass the safety measures.

Alongside Fable 5, Anthropic is offering Claude Mythos 5, which is the same underlying model but with cybersecurity restrictions removed. This version is available to a small group of pre-approved partners, including U.S. government entities participating in the Glasswing program. Anthropic intends to broaden access to Mythos 5 through a trusted access program and is developing a separate program for biology and chemistry researchers. Both Fable 5 and Mythos 5 are priced at $10 per million input tokens and $50 per million output tokens, significantly less than the earlier preview version, partly due to the model's efficiency in using fewer tokens per task.

This release occurs as Anthropic, valued at $965 billion, reportedly moves towards an initial public offering, similar to rival OpenAI. The strategy of releasing powerful AI systems with integrated safety measures, as seen with Fable 5, reflects the company's approach to balancing advanced capabilities with responsible deployment.

Letโ€™s say Iโ€™m a college student asking the model, โ€˜help me find cyber vulnerabilities on X package or code.โ€™ The model would refuse, and Fable 5 will fall back to Opus 4.8 for a response.

โ€” Dianne PennExplaining how the safety classifiers redirect sensitive requests.
DistantNews Editorial

Originally published by Daily Star in English. Translated, summarized, and contextualized by our editorial team with added local perspective. Read our editorial standards.