Anthropic launches Opus 4.8, with honesty as its killer feature




ZDNET’s key takeaways

  • Claude Opus 4.8 promises more honest AI answers.
  • Dynamic workflows can run hundreds of Claude subagents.
  • Fast mode gets cheaper, while regular Opus pricing stays put.

Diogenes was a fourth-century B.C. Greek philosopher known for his performance art. He is said to have roamed the streets of Athens in the middle of the day, carrying a lit lantern and crying out, “I am looking for an honest man.” If that story were retold today, we’d all be looking for an honest AI.

Anthropic is announcing and releasing Claude Opus 4.8 at once, and it believes the new large language model might satisfy Diogenes’ quest.

“One of the most prominent improvements in Opus 4.8 is its honesty,” the company said Thursday in a blog post.

Perhaps this new frontier model will behave itself better. Anthropic reports that Opus 4.8 is less likely to make unsupported claims and more likely to tell you when it’s uncertain about an answer.

“This is borne out in our evaluations, which show that Opus 4.8 is around 4x less likely than its predecessor to allow flaws in code it’s written to pass unremarked,” the company said.

In Claude Code, I found Opus 4.7 to be a substantial improvement over 4.6. While 4.6 would often misinterpret instructions or deliver erroneous results, Opus 4.7 regularly tells me when its first take on a problem didn’t work and that it’s trying a different tack. On recent projects, it has also shown a much greater degree of understanding than 4.6 did.

So, given the jump in quality from 4.6 to 4.7, which was subjectively quite noticeable over many sessions, I’m hoping we’ll see the same in the jump from 4.7 to 4.8.

That seems to be the case, at least according to Tom Pritchard, a staff engineer at Spotify who has already tested Opus 4.8.

“Claude Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, pushes back when a plan isn’t sound, and builds up confidence around complex, multi-service explorations before making big changes. It’s a great model to build with,” he said in the blog post.

That’ll be nice.

A matter of effort

Claude Code has let users set an effort level since at least Opus 4.7 (that’s when I first noticed it). Effort is essentially how much AI oomph the model throws at a problem, expressed in the number of tokens it spends.

In Opus 4.8, Claude Code’s default high effort setting produces what the company called “the best overall balance of quality and user experience.” In coding tasks, this default spends about as many tokens as Claude Code’s default level did with Opus 4.7, but delivers better performance.

This effort control is now coming to Claude.ai and Cowork. At higher effort settings, Claude will “think more frequently and more deeply.” At lower settings, Claude responds faster, and users will find their usage is throttled less.

Dynamic workflows

At launch, this feature hasn’t been fully defined, but it’s intriguing. Launching as a research preview, dynamic workflows let Opus 4.8 plan work, run hundreds of parallel subagents in one session, and verify outputs before reporting back. The feature is designed for very large-scale tasks. The example Anthropic gave was a codebase-wide migration spanning hundreds of thousands of lines of code.

Claude appears able to generate and manage the workflow as the task evolves. Rather than following a fixed plan, the agents can change their priorities and tasks based on what they find while working. This could be powerful.

Anthropic said that the subagents verify their results before reporting back to users. If Claude is coordinating hundreds of subagents, users need it to notice uncertainty, bad assumptions, and failed outputs.

Interestingly, this connects right back to the honesty claims discussed at the beginning of the article. If Claude is going to launch “thousands of agents,” getting back reliable and vetted results really matters, because there’s no way human oversight can keep up on its own.
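Anthropic hasn’t said how dynamic workflows are built under the hood, so what follows is only a generic Python sketch of the plan, fan-out, and verify pattern described above, not Anthropic’s implementation. The worker (`run_subagent`), the check (`verify`), and the coordinator (`coordinate`) are hypothetical stand-ins: a real subagent would be a model call, whereas here a simple string rewrite plays the part of a “migration” so the example runs on its own. The point it illustrates is that failed or unverified work comes back in a separate bucket rather than being silently dropped.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SubagentResult:
    task: str
    output: str
    verified: bool
    note: str = ""


def run_subagent(task: str) -> str:
    # Hypothetical worker standing in for one subagent. A real one would call
    # a model; this toy "migration" rewrites only the first old_api reference.
    if task.endswith(".bad"):
        raise ValueError(f"could not parse {task}")
    return task.replace("old_api", "new_api", 1)


def verify(output: str) -> bool:
    # Hypothetical check: migrated output must not still mention old_api.
    return "old_api" not in output


def check_one(task: str) -> SubagentResult:
    # Run one subagent, then verify its output before reporting back.
    try:
        output = run_subagent(task)
    except Exception as exc:
        # Surface the failure instead of hiding it.
        return SubagentResult(task, "", False, f"failed: {exc}")
    if verify(output):
        return SubagentResult(task, output, True)
    return SubagentResult(task, output, False, "verification failed")


def coordinate(tasks: list[str], max_workers: int = 8) -> dict[str, list[SubagentResult]]:
    # Fan the tasks out in parallel; pool.map keeps results in task order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check_one, tasks))
    # Report verified work and flagged work separately.
    return {
        "verified": [r for r in results if r.verified],
        "flagged": [r for r in results if not r.verified],
    }


if __name__ == "__main__":
    report = coordinate(["old_api_users.py", "old_api_old_api.py", "legacy.bad"])
    for r in report["verified"]:
        print("ok:", r.task, "->", r.output)
    for r in report["flagged"]:
        print("flagged:", r.task, "-", r.note)
```

With those three sample tasks, only the first comes back verified. The second fails its check because this toy worker rewrites only the first occurrence of the old name, and the third raises an error that is reported as a failure instead of disappearing.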

The dynamic workflows capability will be available to Claude Code users on Enterprise, Team, and Max plans.

Price and availability

Anthropic said Claude Opus 4.8 is available everywhere as of Thursday, including in Claude and through the Claude API as claude-opus-4-8.

In practice, especially if you’re using Claude Code, you might need to restart your session or wait a day or so for Claude Code to notice the new model. When Anthropic moved from Opus 4.6 to 4.7, I kept asking Claude Code which model it was using, and it wasn’t until the next morning that it stopped reporting Opus 4.6 and started reporting Opus 4.7.

Overall pricing hasn’t changed since Opus 4.7. Regular token-based pricing remains $5 per million input tokens and $25 per million output tokens.
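To put those list prices in concrete terms, here’s a minimal Python sketch that estimates the cost of a single request at regular pricing. It assumes only the two per-million-token rates quoted above and ignores fast mode, caching, batch discounts, or any other adjustments Anthropic may apply; the `request_cost` function and the token counts are hypothetical illustrations.

```python
# Regular Opus 4.8 list prices quoted by Anthropic, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request at regular (non-fast-mode) pricing.

    Hypothetical helper: ignores caching, batching, and any other discounts.
    """
    input_cost = input_tokens * INPUT_PRICE_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    print(f"${request_cost(200_000, 20_000):.2f}")
```

For example, a request with 200,000 input tokens and 20,000 output tokens works out to $1.00 for input plus $0.50 for output, or $1.50 in total. Because output tokens cost five times as much as input tokens, long responses drive the bill up faster than long prompts.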

The company said that fast mode, which lets the model work at 2.5 times the speed of normal mode, will be “three times cheaper than it was for previous models.” While I don’t pay for fast mode, I do see the appeal. I’ve watched a lot of YouTube, hour after hour, while waiting for Claude Code to respond to my prompts.

Would you rather have Claude respond faster with lower effort or think longer with higher effort? Let us know in the comments below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.





Recent Reviews


Samsung is facing a fresh legal challenge that could put a big red “Stop” sign on its foldable phones in the US. Lepton Computing LLC has just filed a lawsuit in a Texas federal court, accusing the South Korean tech giant and its US arm of infringing multiple patents related to foldable phone technology.

If the legal action escalates, it could impact sales of Samsung’s Galaxy Z lineup, which includes the Fold, Flip, and new TriFold models.

What the lawsuit claims

In the legal filing, which was later covered by The Biz, Lepton alleges that Samsung is using patented technologies for flexible display structures, hinge mechanisms, and user interface behaviors without authorization. The company claims it developed these ideas years before these foldable phones reached the market.

The patents in question cover how foldable displays operate and how software adapts to changing screen states. Both are central to modern foldable devices. Lepton is seeking damages, but more notably, it’s pushing for a potential ban on Samsung’s foldable phones in the US market.

What’s the verdict?

Keep in mind that claiming patent infringement is not the same as actually proving it. Patent disputes in the tech industry are often complex due to overlapping ideas, prior art, and competing claims. While Lepton does hold patents related to foldable technology, this doesn’t immediately prove that Samsung has violated them.

Samsung already has an extensive portfolio of foldable-tech patents built over years of research and development, and that portfolio will likely play a central role if the case moves forward.

Why does this matter, and what happens next?

Samsung is one of the largest brands in the foldable phone market, especially in the US, where its only real competition is Motorola’s Razr series. Any disruption could therefore have notable effects across the entire segment. In the extreme scenario that Samsung is barred from selling foldables in the US, Apple’s upcoming foldable iPhone could enter the market facing little competition beyond Motorola.

For now, this legal battle is still in its early stages. Cases like this can take years to resolve and often end in a hefty settlement. Until then, it remains a developing story.


