Claude writes 80% of Anthropic’s code as company calls for AI pause



TL;DR

Anthropic reveals that Claude now writes more than 80% of the code merged into the company’s production codebase, with engineers merging eight times as much code per day in Q2 2026 as in 2024. A new Anthropic Institute paper maps the path to recursive self-improvement and calls for a verifiable global pause mechanism.

One of Anthropic’s engineers hasn’t written a line of code in five months. Not because the work dried up, but because Claude does it now. As of May 2026, more than 80% of the code merged into Anthropic’s production codebase was authored by Claude, up from low single digits when Claude Code launched in February 2025.

That figure, published Wednesday in a new Anthropic Institute paper titled “When AI builds itself,” is not the headline the company wants you to focus on. The headline is what comes next: AI that can design and train its own successor. Anthropic says it isn’t there yet, but it might be closer than most institutions are prepared for.

The numbers behind the shift

The productivity gains are stark. In Q2 2026, the typical Anthropic engineer merged eight times as much code per day as in 2024. In an internal poll of 130 research staff, the median respondent estimated that they produced roughly four times as much output with Anthropic’s latest model, Mythos Preview, as they would without AI.

On the most complex, open-ended engineering problems, Claude’s success rate climbed to 76% in May 2026, a 50-percentage-point increase in six months. Anthropic gives a concrete example: when a routine upgrade began crashing tens of thousands of training jobs, an engineer pointed Claude at the live incident with little more than some text context and cluster access. Claude isolated an obscure debugging flag, reproduced the crash, and confirmed a fix in about two hours. That would normally take two to three days.

The code quality gap is closing, too. Anthropic staff say that Claude-written code was “somewhat worse” than human-written code in late 2025, is at rough parity today, and is expected to be strictly better within the year. An automated Claude reviewer now checks every proposed change to Anthropic’s codebase before it can merge. A retrospective analysis found it would have caught roughly a third of the bugs behind past claude.ai incidents before they reached production.

From coding to research

Writing code is the easy part. The harder question is whether Claude can do research, the kind of open-ended scientific reasoning that drives AI forward.

Anthropic’s evidence here is more preliminary but still striking. In April 2026, the company published a demonstration of Claude running an open-ended AI safety research project end to end. Nine parallel agents were given a problem, left to propose hypotheses, run experiments, share findings through a common forum, and iterate. Over 800 cumulative hours and roughly $18,000 in compute, the agents recovered 97% of the performance gap on the task. Two human researchers, working for a week, recovered 23%.

Another internal experiment measured whether Claude could pick a better “next step” than a human researcher at difficult junctures during real research sessions. In November 2025, Claude matched the human’s judgment 51% of the time. By April 2026, that rose to 64%. The day-to-day work of research is largely a chain of these next-step decisions. If that trend continues, the gap between AI-as-assistant and AI-as-researcher narrows fast.

The task horizon curve

Anthropic’s internal data aligns with a broader pattern tracked by METR, a non-profit that benchmarks AI capabilities. The length of tasks AI can reliably complete on its own has been doubling roughly every four months, accelerating from an earlier pace of every seven months.

In March 2024, Claude Opus 3 could handle tasks that take a human about four minutes. By early 2025, Claude Sonnet 3.7 managed hour-and-a-half tasks. Today, Claude Opus 4.6 handles 12-hour tasks, and METR found that Mythos Preview could sustain work for at least 16 hours, at the upper end of what the current benchmark suite can measure. If the trend holds, tasks requiring days of skilled human work come into range this year. Weeks-long tasks could follow in 2027.
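
The extrapolation in the paragraph above is simple compounding. As a rough sketch, assuming a constant four-month doubling time, a 16-hour starting point, and a 40-hour working week (the function names and the working-week conversion are illustrative assumptions, not taken from METR or Anthropic), the projection looks like this:

```python
import math

DOUBLING_MONTHS = 4.0      # doubling time quoted in the article
CURRENT_HORIZON_H = 16.0   # Mythos Preview figure quoted in the article
WORK_WEEK_H = 40.0         # assumption: one working week of human effort


def projected_horizon_hours(current_hours: float, months_ahead: float,
                            doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task horizon after `months_ahead`, assuming steady exponential growth."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_until(current_hours: float, target_hours: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed for the horizon to grow from current to target."""
    return doubling_months * math.log2(target_hours / current_hours)


if __name__ == "__main__":
    for weeks in (1, 2, 4):
        target = weeks * WORK_WEEK_H
        months = months_until(CURRENT_HORIZON_H, target)
        print(f"{weeks} working week(s) ({target:.0f} h): ~{months:.1f} months away")
```

Under these assumptions, a one-week task (40 hours) is roughly 5.3 months away and a two-week task (80 hours) roughly 9.3 months away, which is broadly in line with the timeline the article describes. The result is only as good as the assumption that the doubling time stays constant.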

The infrastructure is buckling

The downstream effects are already visible. GitHub, the platform most of the world’s software is built on, saw roughly one billion code commits in all of 2025. By mid-2026, the platform was processing 275 million commits per week, on pace for 14 billion over the year. Claude Code alone accounts for 4.5% of all public commits on GitHub, generating 2.6 million weekly.
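
The figures in this paragraph can be sanity-checked with a few lines of arithmetic. The sketch below annualizes the weekly rate and, as a purely illustrative inference that the source does not state, backs out the weekly volume of public commits implied by Claude Code’s share. The 52-week year and all names are assumptions.

```python
WEEKS_PER_YEAR = 52                  # assumption: simple 52-week year

COMMITS_2025 = 1_000_000_000         # "roughly one billion" in all of 2025
WEEKLY_COMMITS_2026 = 275_000_000    # mid-2026 weekly rate
CLAUDE_CODE_WEEKLY = 2_600_000       # Claude Code public commits per week
CLAUDE_CODE_PUBLIC_SHARE = 0.045     # 4.5% of public commits


def annualized(weekly: int, weeks: int = WEEKS_PER_YEAR) -> int:
    """Yearly total if the weekly rate held for the whole year."""
    return weekly * weeks


def implied_total(part: float, share: float) -> float:
    """Total implied by a part and its fractional share of that total."""
    return part / share


yearly = annualized(WEEKLY_COMMITS_2026)
print(f"Annualized pace: {yearly / 1e9:.1f} billion commits")
print(f"Growth vs 2025:  {yearly / COMMITS_2025:.1f}x")

# Inference only: the article does not give a public-commit total.
public_weekly = implied_total(CLAUDE_CODE_WEEKLY, CLAUDE_CODE_PUBLIC_SHARE)
print(f"Implied public commits per week: ~{public_weekly / 1e6:.0f} million")
```

The annualized pace comes to 14.3 billion, matching the article’s “14 billion”. The implied public total is far below 275 million per week, which suggests the 4.5% figure is measured against public commits only rather than all platform activity.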

GitHub’s COO has said the company is “pushing incredibly hard” on capacity just to keep up. Inside Anthropic, the bottleneck has already shifted: as Claude generates more code, human code review has become the constraint. The company says it has encountered a textbook example of Amdahl’s law, where speeding up one part of a process simply reveals the next slowest link.
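
Amdahl’s law states that if a fraction p of a process is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch follows, using hypothetical numbers: the 60% share of engineering time spent writing code is an assumption chosen for illustration, not an Anthropic figure.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is accelerated s-fold."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical: writing code is 60% of the cycle, review and the rest is 40%.
WRITING_SHARE = 0.6

for factor in (2, 8, 100):
    overall = amdahl_speedup(WRITING_SHARE, factor)
    print(f"writing {factor:>3}x faster -> whole process {overall:.2f}x faster")

# Upper bound as the writing step becomes effectively instantaneous.
ceiling = 1.0 / (1.0 - WRITING_SHARE)
print(f"ceiling: {ceiling:.2f}x")
```

With these illustrative numbers, making code writing eight times faster speeds up the whole cycle by only about 2.1x, and no amount of further acceleration can exceed 2.5x. The untouched 40%, here standing in for human review, sets the limit.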

The pause question

The paper’s most significant section is not about productivity. It is a call for a verifiable global mechanism to slow or temporarily pause frontier AI development.

Anthropic is careful with the framing. A unilateral pause by one lab would simply change who leads, not create the deliberative process the company says is missing. What Anthropic proposes instead is a system where multiple frontier labs, in multiple countries, could agree to stop under the same conditions and verify that the others had actually done so. It draws a parallel to nuclear arms control but acknowledges the differences: training runs are far easier to conceal than missile silos, the inputs are general-purpose, and the incentive to defect quietly is enormous.

“If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing,” the paper states. The AI coding market is now worth tens of billions of dollars. Asking the industry to pause is asking it to leave money on the table while trusting that competitors, including those in China, will do the same.

What recursive self-improvement would mean

The paper lays out three possible futures. In the first, the trend stalls, but even today’s capabilities reshape the economy. In the second, AI development becomes substantially automated while humans still set research direction, meaning 100-person companies could do the work of 100,000-person organisations. In the third, AI systems achieve full recursive self-improvement and begin designing their own successors.

Anthropic says it does not have “good intuitions” for what that third scenario looks like. But it offers one observation: even recursive intelligence cannot speed up everything. It cannot learn what a drug does over decades of use, hold elections sooner than a constitution dictates, or turn a stranger into an old friend in a weekend. The felt pace of this future, for most people, would still be set by the bottlenecks.

The company’s growing enterprise push makes the timing of this paper notable. Anthropic is simultaneously selling Claude as a productivity revolution and warning that the trajectory it enables could require a global emergency brake. Whether that tension is principled transparency or strategic positioning depends on what happens next.







ZDNET’s key takeaways

  • Staff who use AI can end up with more to do, not less.
  • Think carefully about the tools you’re using and why.
  • Adopt a set of standards and refine your outputs.

The promise of productivity boosts from AI can come with an unwelcome side order of stress. Harvard Business Review found that AI doesn’t reduce work; it intensifies it, leading to cognitive fatigue and unsustainable hours.

While the common perception is that AI can help reduce workloads, allowing employees to focus more on higher-value and more engaging tasks, HBR’s research found that staff using AI worked more quickly and often ended up with more to do, not less.

While we’ve written about how some professionals are finding ways to turn AI’s time-saving magic into a productivity superpower, we’ve also recognized that some employees have started to tire of the low quality of AI outputs.

Ankur Anand, group CIO at tech recruiter Harvey Nash, said professionals who want to avoid cognitive fatigue must understand how to use AI effectively and its potential risks.

“That focus will help to reduce the noise around the workload that AI creates,” he told ZDNET, suggesting that many people have unrealistic expectations about the productivity boost that AI will provide.

“Many organizations are telling their people, ‘We want to understand how you’re making an impact with AI,’” he said. “But these professionals are not empowered, which means that using AI adds a lot of pressure, because they need to prove themselves on their own terms.”

If you’re going to make the most of AI at work, then you’re going to have to find an effective balance between completing tasks quickly and producing high-quality work. 

Here’s how experts believe professionals can reap the benefits of AI rather than its problems. They suggest focusing on three core areas: tools, guidelines, and outputs.

Limit your toolset

Alex Read, senior enterprise product manager for data at energy provider EDF UK, told ZDNET that the best way for professionals to reap the benefits of AI, rather than its challenges, is to focus tightly on tools that help them produce value in their roles.

While there are thousands of potential AI-enabled services on the market, Read said sensible professionals limit their horizons.

In his own role, for example, Read focuses on how AI can help him build a data platform and update information accurately, efficiently, and productively: “Anything outside of that scope is noise for me.”

That sentiment resonated with Nick Pearson, CIO at technology specialist Ricoh Europe, who told ZDNET it’s important to take a step back and think carefully about how an AI tool can help you produce value in your role.

“If you think about the phrase ‘gen AI,’ the tech is very good, by definition, at generating outputs,” he said. “I could go to bed in the evening, set the model to work, and we could have four new IT strategies produced overnight.”

However, quantity doesn’t necessarily mean quality. Pearson suggested it’s important to focus on AI’s blind spots, particularly as most models are trained on preexisting content.

“AI can’t inspire people, per se; it can’t naturally create something new, because it’s actually quite recursive,” he said.

“And the judgment you have to put in sometimes, on top of everything else, whether it be an ethical or a capability judgment, is not there automatically in the technology.”

It’s in this gap, said Pearson, that human experts play a critical role: “We’re toying with that concern as an organization and saying, ‘Where does AI really play an important role, versus where are we upskilling people in areas that AI probably won’t play for a long time?’”

Work to the guidelines

HBR’s research found that an initial productivity surge when AI is adopted can lead to lower-quality work, turnover, and other problems as people work harder rather than smarter.

To correct this issue, HBR said companies need to adopt an “AI practice,” or a set of norms and standards around AI use that help professionals ensure they use AI in a constrained but productive manner.

At EDF UK, Read is part of an internal AI Center of Excellence in enterprise IT, which enables policy for the effective use of AI across the wider organization. 

In addition to Read, who contributes input from a data-use perspective, the group includes other tech representatives, such as the firm’s senior manager of AI, principal software engineer, and principal solution architect.

“The remit of this center is to make sure that, when the federated business units are looking to build, develop, and deploy AI services, they have platforms, guidance, best practices, architectural assets, and materials to guide them on how to safely and efficiently adopt AI and operationalize it at scale,” he said.

Some of the key themes the center considers when assessing AI tools are scalability and reusability, ensuring a proposed service doesn’t replicate one already in use.

“All new tools and services related to AI will go through that hopper and funnel to understand scope and ensure the security, regulatory, and ethical side of things are understood,” he said, suggesting that all professionals should follow their organization’s existing guidelines to ensure emerging tech is used appropriately.

“The benefit that guided approach brings is that it allows us to be clear in our messaging around what AI services can be used, how they’re used from a use-case perspective, and ultimately, what personas are allowed to use them.”

Refine your outputs

Even when tools are assessed and considered acceptable, there can still be an overreliance on AI outputs. Worse, some professionals can drown in the insights they receive, leading to higher stress and fewer benefits.

Louise Newbury-Smith, head of UK&I at technology specialist Zoom, told ZDNET that one way to ensure your outputs are constrained is to focus on prompting.

“Use simple amendments to be specific, such as ‘Give me the top three things with the biggest impact.’ That approach should guide your prompt, rather than saying, ‘Give me everything you know about this topic.’”

Newbury-Smith said the successful use of AI is all about being smart about how it’s exploited, and that effectiveness comes down to enablement and engagement. If a prompt yields too much information, refine it until you get what you need. She said this should still be faster than trying to get answers without AI.

The basic message for professionals is that effective applications of AI are all about you staying in the loop, said Bernhard Seiser, vice president of digital, data, and IT at AOP Health.

Think before you use AI, and think again before you push your outputs around the organization.

“It doesn’t help the business if you get AI-generated emails that are many pages long, and then you need ChatGPT to summarize the text,” he told ZDNET.

Seiser said that while there are certain tasks generative AI is good at and worth using for, in the end, “you need to use your brain.”




