Anthropic warns AI agents could soon improve themselves without human intervention

Editorial illustration for: Anthropic warns AI agents could soon improve themselves without human intervention

In brief

  • Claude now authors roughly 80% of code merged into Anthropic's codebase
  • AI model improvements are doubling every four months, accelerating toward autonomous agents
  • Anthropic withheld Claude Mythos from release due to cybersecurity exploit risks
  • Favaro and Clark recommend slowing frontier AI development for safety alignment

The acceleration problem

AI model improvement has been roughly doubling every four months, rather than every seven months. That pace is compressing the window for safety research and oversight. Anthropic's agents can already run code themselves and delegate hours of work to other agents, demonstrating capabilities that were theoretical just months ago.

The company took a decisive step to manage the risks. Anthropic ruled out releasing its AI model Claude Mythos to the public over concerns about the threat to global cybersecurity, after finding that Claude Mythos was able to easily create software exploits. The decision underscores a hard reality: capability and risk are advancing in lockstep.

Industry-wide safety scramble

Anthropic isn't alone in flagging the danger. OpenAI is researching how to safely develop and deploy increasingly capable AI, including AI capable of recursive self-improvement. The company is hiring a researcher for recursive self-improvement preparedness as part of its Safety Research team, signaling how seriously the problem is being taken at the frontier labs.

Beyond the labs, a group of tech leaders released an open letter urging lawmakers to enact stronger guardrails around AI technology over concerns it could be used to overcome knowledge barriers that have historically prevented bad actors from creating biological weapons.

The case for a slowdown

Favaro and Clark's argument is straightforward: the pace of AI development is outrunning our ability to align it safely. They recommend slowing frontier AI development to allow societal structures and alignment research to keep up with technological advance. This isn't a call to halt progress entirely, but to create breathing room for governance and safety work to mature.

The stakes are already visible. Circle CEO Jeremy Allaire predicted in January that billions of AI agents would operate on users' behalf within five years. That timeline is no longer speculative — Keyrock reported that AI agents settling payments went from concept to reality in the past 12 months, with $73 million settled across 176 million transactions. Autonomous agents are already moving capital. The question is whether governance can keep pace.

Frequently asked questions

What does recursive self-improvement mean in AI?

Recursive self-improvement refers to an AI system's ability to autonomously design, train, and improve its own successor without human intervention. Anthropic warns this could happen if AI development continues at current acceleration rates, with improvements doubling every four months.

Why did Anthropic withhold Claude Mythos from public release?

Anthropic ruled out releasing Claude Mythos because the model could easily create software exploits, posing a significant threat to global cybersecurity. The decision reflects the company's assessment that the risks outweighed the benefits of public deployment.

What are Favaro and Clark recommending?

Marina Favaro and Jack Clark recommend a slowdown in frontier AI development to give alignment research and societal structures time to catch up with technological advances. They argue this breathing room is necessary before AI systems become capable of fully autonomous self-improvement.