The US government just hit the brakes on Anthropic’s most powerful AI models


Anthropic’s troubles with the US government do not seem to be easing. The company has now been ordered to suspend access to Fable 5 and Mythos 5 for all foreign nationals, including foreign national Anthropic employees working inside the United States.

Anthropic said it received the directive on June 12 and is disabling the two models for all customers to comply. Other Anthropic models are not affected. The government has not publicly explained the full national security concern, but Anthropic says it understands the order is linked to a reported method for bypassing, or jailbreaking, Fable 5’s safeguards.

A fresh clash after the Pentagon fight

This is not Anthropic’s first serious standoff with Washington. Earlier this year, the company was caught in a dispute with the Pentagon after it refused to remove restrictions preventing Claude from being used for fully autonomous weapons and mass domestic surveillance. That fight led to claims of blacklisting and legal action, putting Anthropic’s safety-first position directly at odds with parts of the US government.

The latest directive puts Anthropic back in a familiar position. Officials are worried about access to powerful AI systems, while Anthropic argues that its safeguards are being misunderstood or judged by an unrealistic standard.

Why Fable 5 became a concern

The concern around Fable 5 is tied to Mythos 5’s advanced cybersecurity capabilities. Anthropic has said Mythos-class models can discover and exploit software vulnerabilities, and Mythos 5 was reportedly tested by the NSA and other government-linked evaluators before wider release. While those capabilities can help security teams identify and fix weaknesses, they also create national security concerns if they are used for offensive or malicious purposes.

Fable 5 was released only a few days ago as a public version of Mythos 5 with stricter guardrails. Anthropic said it was designed to block sensitive cybersecurity and biology-related queries or redirect them to Opus 4.8.

Anthropic says the reported bypass only surfaced minor, already known vulnerabilities and that other public models can do similar things. Still, with a topic as sensitive as cybersecurity, caution is not unreasonable. If Mythos 5 is capable of identifying software vulnerabilities at a high level, then its guardrails cannot be merely good enough. They need to be airtight. Anthropic may argue that the reported jailbreak was narrow, but the government’s concern this time is easier to understand. In this case, “better safe than sorry” may be the government’s most defensible position.




“It was severely downgraded,” Gilbert confirms. “I never would have found it if I was just looking through Google results.” (I tried the same prompt in Gemini earlier this month, and after an initial denial, the tool also gave me Eiger’s number.)

After this experience, Eiger, Gilbert, and another UW PhD student, Anna-Maria Gueorguieva, decided to test ChatGPT to see what it would surface about a professor. 

At first, OpenAI’s guardrails kicked in, and ChatGPT responded that the information was unavailable. But in the same response, the chatbot suggested, “if you want to go deeper, I can still try a more ‘investigative-style’ approach.” The students just had to help “narrow things down,” ChatGPT said, by providing “a neighborhood guess” for where the professor might live, or “a possible co-owner name” for the professor’s home. ChatGPT continued: “That’s usually the only way to surface newer or intentionally less-visible property records.”

The students provided this information, leading ChatGPT to produce the professor’s home address, home purchase price, and spouse’s name from city property records. 

(Taya Christianson, an OpenAI representative, said she was not able to comment on what happened in this case without seeing screenshots or knowing which model the students had tested, even after we pointed out that many users may not know which model they were using in the ChatGPT interface. She also declined to comment generally about the exposure of PII by the chatbot, instead providing links to documents describing how OpenAI handles privacy, including filtering out PII, and other tools.) 

This reveals one of the fundamental problems with chatbots, says DeleteMe’s Shavell. AI companies “can build in guardrails, but [their chatbots] are also designed to be effective and to answer customer questions.”

The exposure issue is not limited to Gemini or ChatGPT. Last year, Futurism found that if you prompted xAI’s chatbot Grok with “[name] address,” in almost all cases, it provided not only residential addresses but also often the person’s phone numbers, work addresses, and addresses for people with similar-sounding names. (xAI did not respond to a request for comment.) 

No clear answers

There aren’t straightforward solutions to this problem: there’s no easy way either to verify whether someone’s personal information is in a given model’s training set or to compel the models to remove PII.


