Anthropic’s latest AI model has kick-started a new debate.
Foreign Policy
It sounds like the beginning of a nightmare scenario that artificial intelligence doomsayers have been warning about: This month, Silicon Valley AI company Anthropic said it had developed a model so dangerous that the company had decided against releasing it to the public.
The model, known as Claude Mythos Preview, is a general-purpose language model like Anthropic’s Claude or OpenAI’s ChatGPT. But during testing, it showed an ability to find and exploit so-called “zero day” vulnerabilities—an industry term that refers to previously undiscovered holes in a system’s software. The model “could reshape cybersecurity” because it found “thousands of high-severity vulnerabilities” in “every major operating system and web browser,” Anthropic said. It made those claims in a blog post announcing that it would open up Mythos only to a few dozen companies and critical infrastructure operators. That collective, which Anthropic named Project Glasswing, includes Amazon Web Services, Apple, Google, JPMorganChase, Microsoft, and Nvidia as companies that will receive early access to the model to patch vulnerabilities in their systems.
The model’s capabilities also include escaping from a contained digital environment (albeit after being specifically ordered to try to do so). Anthropic also said that in a few rare cases, the model attempted to cover its tracks after violating prescribed rules.
In an independent evaluation, the U.K. AI Security Institute said Mythos was the first AI model able to complete the institute’s test simulating an attack that takes over a full network, though it did show “some cyber capability limitations.” However, the institute did qualify that success by noting that its test environments did not have the same security features as many real-world systems. “This means we cannot say for sure whether Mythos Preview would be able to attack well-defended systems,” the institute wrote in a blog post.
The Mythos announcement has sparked a major debate in the cybersecurity world, with some critics questioning how much of Mythos’s ability is simply clever marketing on Anthropic’s part. Technology companies have a long history of warning about the dangers of their own products, with OpenAI warning as far back as 2019 that its GPT-2 language model was too powerful and declining to fully release it to the public. (Anthropic CEO Dario Amodei and two of his co-founders were part of the OpenAI team that made that decision.) Even as far back as 1999, Apple used similar warnings about the capability of its Power Mac G4 personal computer in a marketing campaign that showed it being defended by a ring of military tanks.
Anthropic has built its brand as a particularly safety-conscious AI company, having locked itself into a high-stakes legal fight with the U.S. Defense Department this year over concerns about the military potentially misusing its technology. Amodei visited the White House last week for what both sides described as a “productive” meeting to try to find a broader compromise between the company and the government, and Axios reported on Sunday that the National Security Agency (NSA) is now using Mythos.
Foreign Policy spoke to several former U.S. government officials and cybersecurity experts, all of whom advocated a balance between treating Mythos as a five-alarm fire and dismissing its advances altogether.
“There is an element of marketing charm with it—it’s certainly got a lot of attention, and creating a limited release is one way to really get people charged up and excited about something,” said Joe Saunders, the CEO of the cybersecurity company RunSafe Security and founder of the nonprofit International Resilience Institute. “But I think it’s with the right intent. I don’t think it’s simply a ploy.”
Anthropic co-founder Jack Clark simultaneously downplayed and played up fears of Mythos’s implications in Washington last week. “This is not a special model,” he said at the Semafor World Economy summit. “There will be other systems just like this in a few months from other companies, and then a year to a year and a half later, there will be open-weight models from China that have these capabilities, so the world is going to have to get ready for more powerful systems that are going to exist within it.” (Open-weight models are those that make their learning parameters, or “weights,” publicly available.)
Rivals are already following suit—a week after Anthropic announced Mythos and Project Glasswing, OpenAI announced a similarly limited rollout of its latest cybersecurity-focused model.
In some ways, Mythos simply represents the latest evolution in a yearslong growing trend of cyberattackers using AI to enhance different aspects of their intrusions. “We’ve been talking a long time about how AI has made initial access a lot easier for adversaries to accomplish,” said Cynthia Kaiser, a former deputy assistant director of the FBI’s Cyber Division who now leads ransomware research at the cybersecurity firm Halcyon. Being able to autonomously find hidden vulnerabilities to exploit further streamlines that process for adversarial hackers, Kaiser said, but there are also relatively straightforward ways to wall off an organization’s most sensitive data to prevent a catastrophe.
“Just because an actor gets in doesn’t mean they get everything,” she added. “Yeah, it’s worrisome. It’s part of a trend we’ve already seen. But it’s not hopeless.”
What has Anthropic particularly worried and motivated to loop in the U.S. government is the potential for more sophisticated nation-state actors to get their hands on the capabilities that Mythos enables to more easily compromise U.S. systems. That scenario, experts say, has been somewhat accelerated by the Mythos announcement. “Anthropic is holding back access to this particular model, but I don’t think they’re really going to be able to defend that line,” said Jeff Williams, the chief technology officer of the cybersecurity firm Contrast Security. “Within six to nine months, other countries are either going to make their own models or figure out ways to get access to this model or bypass the controls that Anthropic says it’s trying to put into this model, but it seems to me that the genie is out of the bottle here.”
China in particular has shown a fast-follower capability to re-create advanced U.S. technology, Kaiser pointed out, noting that “if you know something’s possible, it’s a lot easier to get to that point.”
But Chinese state-backed hackers already possess a significant degree of sophistication in their own right as well as enormous resources to attack U.S. systems and an army of cyberintelligence professionals and military hackers who can manually achieve the same compromises that Mythos might enable (see Volt Typhoon and Salt Typhoon).
“I don’t doubt that China has developed or is very close to developing something like this today,” said Adam Maruyama, an independent security consultant who previously worked at the Defense Department and NSA. “They probably have exquisite tools and exquisite access that they’re using that we don’t know about,” he added. “[Mythos] is a breakthrough in velocity, not in sophistication.”
The cyberadversaries that could be helped most by a capability like Mythos, Maruyama said, are Iran and North Korea, which have shown an ability to compromise U.S. systems, steal data, and cause disruption but lack the wherewithal that China has to develop their own AI capabilities. “Those are nation-states that we have not traditionally categorized as near peers, largely because of the lack of ability to execute complex kill chains, develop zero-day attacks, and weaponize them effectively against us,” he said. “As we see Mythos moving out of private preview, rolling out to the rest of the world, if they are able to override guardrails [and] jailbreak this, that is when [those countries] can better leverage cyber-operations as an instrument of national power.”
Potential marketing benefits aside, experts said Anthropic broadly did the right thing by limiting Mythos’s release to a few companies before opening it up to the public. Allowing critical system providers to use the model to find and patch vulnerabilities will eventually make those systems more secure—the cyber equivalent of changing the locks before adversaries can copy the keys. But Mythos also reveals how fraught the current moment is.
“I really still think that the benefits of AI ultimately will be weighted toward defenders, but there’s a long time until that happens—adversaries can just integrate something right now and start using and exploiting,” Kaiser said. “The technology that Anthropic has developed will enable us to [build] self-healing software, self-healing hardware … but that’s a decade,” she added. “There’s this period where things are going to be more vulnerable.”