The technology could eventually favour the defenders—but expect a bumpy ride

A padlock in the centre with AI tentacles heading towards it and avoiding various barriers

Illustration: Carolina Moscoso

Listen to this story

Your browser does not support the <audio> element.

T ECH FIRMS usually create buzz around products they plan to release. Anthropic, an American artificial-intelligence lab, has managed to create excitement—and a good deal of worry—around something it plans not to. On April 7th the firm announced that a new AI model it had developed, dubbed Mythos, would not be released to the general public. Instead, under an initiative called Project Glasswing, whose 12 founder members include Apple, Google and Nvidia, access would be strictly controlled.

The problem is not that Mythos is buggy or unreliable. Allegedly, it is that it works so well that releasing it would put the world’s digital infrastructure at risk. According to Anthropic, the model has surpassed “all but the most skilled humans” when it comes to finding and exploiting security holes in everything from popular operating systems to the cryptographic software that secures e-commerce and financial networks. And it can find those vulnerabilities with only the bare minimum of human help. Not to be outdone, a few days later Open AI, one of Anthropic’s competitors, announced a closed version of its own hacking-friendly model, named GPT 5.4 Cyber.

A world of “vibe hacking”, in which amateurs can use AI models to find flaws in software—and perhaps even write the “exploits” needed to crash them, hold them to ransom or even take control of them remotely—sounds terrifying. Shortly after Anthropic’s announcement Scott Bessent, America’s treasury secretary, hosted a meeting of bank bosses to discuss what AI -enabled hacking might mean for their businesses. Financial regulators in Britain organised a similar meeting a few days later. But security researchers themselves seem guardedly optimistic. “In the medium term I think this will be a mess,” says Bruce Schneier, an American computer-security expert. “But in the long run I think it will actually be good for the defenders.”

Chart: The Economist

Since Anthropic has released only limited information about Mythos, the degree to which the new model really is revolutionary rather than evolutionary is hard to judge (what might politely be termed a “vigorous debate” is raging online). Testing by the AI Security Institute, a British government agency, found that Mythos was neck-and-neck with other models on relatively simple cyber-security tests, but noticeably ahead in a more advanced one that requires a model to complete dozens of steps before successfully taking over a target machine (see chart).

The chief thing Anthropic’s researchers investigated was Mythos’s ability to unearth bugs that hackers could use to attack or gain control of other computers. They looked specifically for bugs that had never been found before (known as “zero-days” in the jargon). Finding those would prove the model was doing novel work, and not simply regurgitating known bugs it had come across in its training data.

Zero-days lurk everywhere, says Jeff Williams, a co-founder of Contrast Security, a software firm, and of the Open Worldwide Application Security Project Foundation, a non-profit dedicated to improving the security of software. Although Mythos is said to have found “thousands” of high- or critical-severity flaws, Anthropic is keeping most secret until they can be fixed. But the firm did reveal details of some, including one in Free BSD, a widely used operating system, another in FF mpeg, a video-and-audio code library, and a third—which remains unfixed—in software vital to cloud computing.

Many of the bugs reported by Anthropic are, if not simple, then at least comprehensible. They are the sorts of things a human could plausibly have found. They seem to be the sort of thing other AI models could have found, too. In a blog post published shortly after Anthropic’s announcement, Stanislav Fort, a founder of AISLE, an AI -focused cyber-security company, described using several smaller, older models to find the same bug in Free BSD. Citing his own firm’s experience with AI -powered bug-hunting, Dr Fort reckons the AI cyber-security frontier is “jagged”, with no model having a clear edge.

Everyone agrees that the state of the art is advancing quickly. Until recently AI bug-hunting was prone to generating false positives or trivial results. “One change I’ve noticed in the past couple of months is that a lot of these AI -generated bug reports are increasingly of good quality,” says Mr Schneier. An update in January to Open SSL, which helps ensure secure connections between websites, fixed a dozen security flaws found by AI models employed by Dr Fort’s firm. In March Anthropic itself announced that an older, pre-Mythos version of Claude had found almost a fifth of all the high-severity bugs fixed in Firefox, a web browser, in 2025.

As the growing power of AI models makes finding bugs easier, says Mr Schneier, the question becomes whether attackers can exploit them more quickly than defenders can fix them. This is where Project Glasswing comes in. Anthropic says it is expanding Glasswing to another 40 digital-infrastructure organisations, so they can use Mythos to harden the software on which the internet depends. Anthropic hopes that giving them access now, before similarly powerful models become widely available, will leave them time to find and fix as many bugs as possible.

All the researchers The Economist spoke to thought that, in the long run, AI -enabled hacking would probably help defenders more than attackers, by allowing companies to more thoroughly check their software before it is published. But there is plenty of short term to worry about. For one thing, AI checking is not cheap: Anthropic says one of the bugs it found cost the AI lab nearly $20,000-worth of tokens to find. For software such as Linux, a family of widely used operating systems which are at least partly maintained by volunteers, that would be a steep price. And much of the code out in the world—running on home routers, smart gadgets like TV s or fridges and industrial machinery—has nobody maintaining it at all. In such cases, attackers could have a field day. ■


논증 분석

유형: predictive

핵심 주장

AI 기반 해킹 기술은 단기적으로 심각한 사이버 보안 위협을 야기하지만, 장기적으로는 방어자에게 더 유리하게 작용할 가능성이 높다.

논리구조

  1. 전제: Anthropic이 개발한 AI 모델 Mythos는 운영체제부터 암호화 소프트웨어까지 보안 취약점을 찾고 악용하는 능력에서 ‘가장 숙련된 인간을 제외한 모든 전문가’를 능가했으며, 이로 인해 일반 공개를 보류하고 Project Glasswing이라는 통제된 접근 이니셔티브를 출범시켰다.
  2. 진단: ‘바이브 해킹(vibe hacking)’ 시대, 즉 아마추어도 AI를 이용해 소프트웨어 취약점을 찾고 익스플로잇을 작성할 수 있는 환경이 도래함에 따라 Scott Bessent 미국 재무장관 주재 은행장 회의 등 금융권과 정부 차원의 위기 인식이 고조되고 있다.
  3. 논거: AI Security Institute의 테스트에 따르면 Mythos는 단순한 사이버 보안 테스트에서는 다른 모델과 비슷한 수준이나, 수십 단계를 거쳐 대상 시스템을 장악하는 고급 테스트에서는 눈에 띄게 앞서는 ‘들쭉날쭉한(jagged)’ 성능 프로파일을 보인다.
  4. 논거: Mythos는 FreeBSD, FFmpeg, 클라우드 컴퓨팅 핵심 소프트웨어 등에서 이전에 알려지지 않은 ‘제로데이(zero-day)’ 취약점 수천 개를 발견했으며, 이는 모델이 단순히 학습 데이터를 반복하는 것이 아닌 새로운 취약점을 독자적으로 발굴함을 의미한다.
  5. 반론: AISLE의 창립자 Stanislav Fort 박사는 더 작고 오래된 모델들로도 FreeBSD의 동일 버그를 발견할 수 있었다며, AI 사이버 보안 최전선은 ‘들쭉날쭉’하며 어떤 모델도 명확한 우위를 갖지 못한다고 주장한다.
  6. 논거: AI 버그 탐지의 질은 급격히 향상되고 있으며, OpenSSL 보안 패치 및 Firefox의 고위험 버그 상당수가 AI 모델에 의해 발견되는 등 AI 기반 버그 헌팅이 실질적 성과를 내기 시작했다.
  7. 처방: Anthropic은 Project Glasswing을 40개 디지털 인프라 조직으로 확대하여, 유사하게 강력한 모델이 광범위하게 보급되기 전에 방어자들이 먼저 취약점을 발견·수정할 시간을 확보하도록 한다.
  8. 진단: 단기적 위험 요소로는 AI 취약점 탐지의 높은 비용(버그 하나당 최대 2만 달러 상당의 토큰 소모), 자원봉사자가 유지하는 Linux 같은 오픈소스 소프트웨어의 재정 한계, 그리고 홈 라우터·스마트 기기·산업 기계 등 유지관리 주체가 없는 레거시 코드의 광범위한 취약성이 존재한다.

결론

AI 해킹 기술은 장기적으로 소프트웨어 출시 전 더 철저한 검증을 가능하게 해 방어자에게 유리하게 작용할 것이나, 비용 장벽과 관리되지 않는 레거시 코드 문제로 인해 단기적으로는 공격자가 유리한 혼란스러운 과도기가 불가피하다.

Curious about the world? To enjoy our mind-expanding science coverage, sign up to Simply Science, our weekly subscriber-only newsletter.

Explore more

→See the latest from topics you follow