Claude Opus 4.7 leads on SWE-bench and agentic reasoning, beating GPT-5.4 and Gemini 3.1 Pro



In short: Anthropic has released Claude Opus 4.7, its most capable generally available model, with benchmark-leading scores on SWE-bench Pro (64.3% vs GPT-5.4’s 57.7%), multi-agent coordination for hours-long workflows, 3x higher image resolution, and a 14% improvement in multi-step agentic reasoning while making a third as many tool errors. Priced at $5/$25 per million input/output tokens, it is available across Claude plans and through Amazon Bedrock, Vertex AI, and Microsoft Foundry.

Anthropic has released Claude Opus 4.7, its most capable generally available model to date, with benchmark-leading performance in software engineering and agentic reasoning that widens the gap between Claude and both OpenAI’s GPT-5.4 and Google’s Gemini 3.1 Pro on the tasks that matter most to developers and enterprise users.

The release comes at a moment when Anthropic’s commercial momentum is difficult to overstate. The company is running at a $30 billion annualised revenue rate, has attracted investor offers at a valuation of roughly $800 billion, and is in early IPO talks. Opus 4.7 is the model that has to justify those numbers, not by winning every benchmark, but by being the model that enterprises and developers choose to build on.

Where it leads

The headline numbers are in software engineering. On SWE-bench Pro, the benchmark that tests a model’s ability to resolve real-world software issues from open-source repositories, Opus 4.7 scores 64.3%, up from 53.4% on Opus 4.6 and well ahead of GPT-5.4 at 57.7% and Gemini 3.1 Pro at 54.2%. On SWE-bench Verified, a curated subset, the score is 87.6%, compared with 80.8% for its predecessor and 80.6% for Gemini 3.1 Pro.

CursorBench, which measures autonomous coding performance in the popular AI code editor, shows a similar jump: 70%, up from 58% on Opus 4.6. For a model that is already the default choice in Cursor and Claude Code, the improvement on the benchmark most directly tied to how developers actually use it is significant. Claude Code alone hit $2.5 billion in annualised revenue in February, and AI-assisted coding has become one of the fastest-growing categories in software.

On graduate-level reasoning, measured by GPQA Diamond, the field has converged. Opus 4.7 scores 94.2%, GPT-5.4 Pro scores 94.4%, and Gemini 3.1 Pro scores 94.3%. The differences are within noise. The frontier models have effectively saturated this benchmark, which means the competitive differentiation is shifting away from raw reasoning scores and toward applied performance on complex, multi-step tasks.

The agentic step

Opus 4.7’s most consequential improvements may not be captured by any single benchmark. Anthropic says the model delivers a 14% improvement over Opus 4.6 on complex multi-step workflows while using fewer tokens and making a third as many tool errors. It is the first Claude model to pass what Anthropic calls “implicit-need tests,” tasks where the model must infer what tools or actions are required rather than being told explicitly.

The model also introduces multi-agent coordination, the ability to orchestrate parallel AI workstreams rather than processing tasks sequentially. For enterprise users running Claude across code review, document analysis, and data processing simultaneously, this is the kind of capability that translates directly into throughput. Anthropic says Opus 4.7 is engineered to sustain focus over hours-long workflows, a claim that, if it holds, addresses one of the most common complaints about frontier models: that they lose coherence and precision on extended agentic tasks.

Resilience is another emphasis. The model is designed to continue executing through tool failures that would have stopped Opus 4.6, recovering and adapting rather than halting. For automated pipelines where a single failure can cascade, this kind of robustness matters more than marginal benchmark gains.

Vision and context

Opus 4.7 processes images at resolutions up to 2,576 pixels on the long edge, more than three times the capacity of prior Claude models. The improvement is aimed at enterprise document analysis, where scanned contracts, technical drawings, and financial statements often contain fine print and detail that lower-resolution vision models miss or hallucinate.

The context window remains at one million tokens, half of Gemini 3.1 Pro’s two million but sufficient for most enterprise use cases. On long-context research benchmarks, Opus 4.7 tied for the top overall score at 0.715 across six research modules and delivered what evaluators described as the most consistent long-context performance of any model tested.

Anthropic notes that the model follows instructions more literally than its predecessors, a change that may require users to adjust existing prompts. This is a trade-off: tighter instruction-following reduces the ambiguity that sometimes produces creative or unexpected outputs, but it also reduces the hallucination and off-task behaviour that frustrates enterprise deployments.

Pricing and availability

Opus 4.7 is available immediately on Claude Pro, Max, Team, and Enterprise plans, and through the API at $5 per million input tokens and $25 per million output tokens. Prompt caching offers up to 90% cost savings, and the Batch API provides a 50% discount on both input and output. The model is also available through Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry.
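To see how those discounts compound, here is a back-of-the-envelope cost sketch based only on the rates stated above; the function name and the example token counts are illustrative, and real billing may differ in the details (e.g. cache-write surcharges are not modelled).

```python
# Rough per-request cost estimate at the published Opus 4.7 rates.
# Figures come from the article; request_cost() is an illustrative helper.

INPUT_PER_M = 5.00     # $ per million input tokens
OUTPUT_PER_M = 25.00   # $ per million output tokens
CACHE_SAVINGS = 0.90   # prompt caching: up to 90% off cached input
BATCH_DISCOUNT = 0.50  # Batch API: 50% off both input and output

def request_cost(input_tokens, output_tokens, cached_fraction=0.0, batch=False):
    """Estimate the dollar cost of one request under the stated pricing."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost = (fresh * INPUT_PER_M
            + cached * INPUT_PER_M * (1 - CACHE_SAVINGS)
            + output_tokens * OUTPUT_PER_M) / 1_000_000
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return cost

# 100k input tokens producing 4k output tokens:
print(request_cost(100_000, 4_000))                      # no discounts: $0.60
print(request_cost(100_000, 4_000, cached_fraction=0.8)) # 80% cache hit: $0.24
print(request_cost(100_000, 4_000, batch=True))          # via Batch API: $0.30
```

The takeaway is that for cache-friendly workloads (long shared system prompts, repeated document context), the effective input rate can fall well below the headline $5, which narrows the gap with cheaper rivals.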

The pricing is unchanged from Opus 4.6, which means Anthropic is delivering substantially better performance at the same cost. Gemini 3.1 Pro undercuts it at $2 and $12 per million tokens for input and output respectively, but Opus 4.7’s lead on the benchmarks that enterprise buyers care about, particularly SWE-bench and agentic reasoning, may justify the premium for customers whose workloads demand the highest capability.

Anthropic has also added cyber safeguards that automatically detect and block requests indicating prohibited or high-risk cybersecurity uses, a nod to the dual-use concerns that led the company to restrict its more powerful Mythos model to just 11 organisations under Project Glasswing.

What it means

Opus 4.7 is not a paradigm shift. It is a meaningful improvement across every dimension that matters to the people who pay for Claude: better coding, better agentic reasoning, better vision, better instruction-following, and better resilience on long tasks. The model does not win every benchmark against every competitor, but it wins convincingly on the ones most directly tied to real-world productivity.

For Anthropic, the release reinforces the position that has driven its extraordinary revenue growth. Claude is the model that developers and enterprises reach for when they need reliable, high-quality output on complex work. Opus 4.7 extends that lead at a moment when the company’s commercial trajectory depends on it. The competition is close, and closing. But for now, on the tasks that generate the most revenue, Anthropic has the best model on the market.





As I’m writing this, NVIDIA is the largest company in the world, with a market cap exceeding $4 trillion. Team Green is now the leader among the Magnificent Seven of the tech world, having surpassed them all in just a few short years.

The company has managed to reach these incredible heights with smart planning and by making the right moves for decades, the latest being the decision to sell shovels during the AI gold rush. Considering the current hardware landscape, there’s simply no reason for NVIDIA to rush a new gaming GPU generation for at least a few years. Here’s why.

Scarcity has become the new normal

Not even Nvidia is powerful enough to overcome market constraints

Global memory shortages have been a reality since late 2025, and they aren’t just affecting RAM and storage manufacturers. Rather, this impacts every company making any product that contains memory or storage—including graphics cards.

NVIDIA sells GPU and memory bundles to its partners, which solder them onto PCBs and add cooling to create full-blown graphics cards. That means NVIDIA doesn’t just have to battle other tech giants to secure a chunk of TSMC’s limited production capacity for its GPU chips; it also has to procure massive amounts of GPU memory, which has never been harder or more expensive to obtain.

While a company as large as NVIDIA certainly has long-term contracts that guarantee stable memory prices, those contracts won’t last forever. The company has likely had to sign new ones already, given the GPU price surge that began in early 2026 and has kept gaming graphics cards overpriced.

With GPU memory costing more than ever, NVIDIA has little reason to rush a new gaming GPU generation, because its gaming earnings are just a drop in the bucket compared to its total earnings.

NVIDIA is an AI company now

Gaming GPUs are taking a back seat

A graph showing NVIDIA revenue breakdown in the last few years. Credit: appeconomyinsights.com

NVIDIA’s gaming division had been its golden goose for decades, but come 2022, the company’s data center and AI division’s revenue started to balloon dramatically. By the beginning of fiscal year 2023, data center and AI revenue had surpassed that of the gaming division.

In fiscal year 2026 (which began in late January 2025 and ends in late January 2026), NVIDIA’s gaming revenue has contributed less than 8% of the company’s total earnings so far, while the data center division has made up almost 90%. What I’m trying to say is that NVIDIA is no longer a gaming company; it’s all about AI now.

Considering that we’re in the middle of the biggest memory shortage in history, and that its AI GPUs rake in almost ten times the revenue of gaming GPUs, there’s little reason for NVIDIA to funnel exorbitantly priced memory toward gaming GPUs. It’s much more profitable to put every memory chip they can get their hands on into AI GPU racks and continue receiving mountains of cash by selling them to AI behemoths.

The RTX 50 Super GPUs might never get released

A sign of times to come

NVIDIA’s RTX 50 Super series was supposed to increase the memory capacity of its most popular gaming GPUs. The 16GB RTX 5080 was to be superseded by a 24GB RTX 5080 Super, the same fate awaited the 16GB RTX 5070 Ti, and an 18GB RTX 5070 Super was to replace its 12GB non-Super sibling. But according to recent reports, NVIDIA has put the refresh on ice.

The RTX 50 Super launch had been slated for this year’s CES in January, but after missing the show, it now looks like NVIDIA has delayed the lineup indefinitely. According to a recent report, NVIDIA doesn’t plan to launch a single new gaming GPU in 2026. Worse still, the RTX 60 series, which had been expected to debut sometime in 2027, has also been delayed.

A report by The Information (via Tom’s Hardware) states that NVIDIA had finalized the design and specs of its RTX 50 Super refresh, but the RAM-pocalypse threw a wrench into the works, forcing the company to “deprioritize RTX 50 Super production.” In other words, it’s exactly what I said a few paragraphs ago: selling enterprise GPU racks to AI companies is far more lucrative than selling comparatively cheaper GPUs to gamers, especially now that memory prices have been skyrocketing.

Before putting the RTX 50 Super refresh on ice, NVIDIA had already slashed its gaming GPU supply by about a fifth and started prioritizing models with less VRAM, like the 8GB versions of the RTX 5060 and RTX 5060 Ti, so this news isn’t that surprising.

So when can we expect RTX 60 GPUs?

Late 2028-ish?

A GPU with a pile of money around it. Credit: Lucas Gouveia / How-To Geek

The good news is that the RTX 60 series is definitely in the pipeline, and we will see it sooner or later. The bad news is that its release date is up in the air, and it’s best not to even think about pricing. The word on the street around CES 2026 was that NVIDIA would release the RTX 60 series in mid-2027, give or take a few months. But as of this writing, it’s increasingly likely we won’t see RTX 60 GPUs until 2028.

If you’ve been following the discussion around memory shortages, this won’t be surprising. In late 2025, the prognosis was that we wouldn’t see the end of the RAM-pocalypse until 2027, maybe 2028. But a recent statement by the chairman of SK Hynix, one of the world’s three largest memory manufacturers, warns that the global memory shortage may last well into 2030.

If that turns out to be true, and if the global AI data center boom doesn’t slow down in the next few years, I wouldn’t be surprised if NVIDIA delays the RTX 60 GPUs as long as possible. There’s a good chance we won’t see them until the second half of 2028, and I wouldn’t be surprised if they miss that window as well if memory supply doesn’t recover by then. Data center GPUs are simply too profitable for NVIDIA to reserve a meaningful portion of memory for gaming graphics cards as long as shortages persist.


At least current-gen gaming GPUs are still a great option for any PC gamer

If there is a silver lining here, it is that current-gen gaming GPUs (NVIDIA’s RTX 50 series and AMD’s Radeon RX 9000 series) are still more than powerful enough for any current AAA title. Considering that Sony is reportedly delaying the PlayStation 6 and that global PC shipments are projected to see a sharp, double-digit decline in 2026, game developers have little incentive to push requirements beyond what current hardware can handle.

DLSS 5 may be the future of gaming, but it remains divisive, and it will take a few years (and likely the arrival of the RTX 60 lineup) for it to mature and become usable on anything that’s not a heckin’ RTX 5090.

If you’re open to buying used GPUs, even last-gen gaming graphics cards offer tons of performance and can handle any AAA game you throw at them. While we likely won’t get a new gaming GPU from NVIDIA for at least a few years, the ones we’ve got are great today and will continue to chew through any game for the foreseeable future.


