Anthropic launched Claude Opus 4.7, its new generally available flagship model, with improvements across software engineering, multidisciplinary reasoning, scaled tool use, and agentic computer use. Its pricing remains unchanged from Opus 4.6: $5 per million input tokens and $25 per million output tokens. Taken on its own, it’s a straightforward release. In context, though, it points directly to the elephant in the room: Mythos.
Just two weeks ago, Anthropic announced that its most capable model, Claude Mythos, would not be made publicly available anytime soon. Reportedly, its cybersecurity capabilities are simply too dangerous for broad distribution. The Mythos model card described autonomous zero-day vulnerability discovery and exploitation across both open-source and proprietary software, with minimal human steering. Access was restricted to roughly 50 major technology and critical-infrastructure vendors under Project Glasswing, including AWS, Apple, Microsoft, Google, and the Linux Foundation. That puts Opus 4.7 in a peculiar position: it's officially "Anthropic's most capable model you can use," but only because the one after it was deemed too dangerous to release.
Anthropic is explicit about this: Opus 4.7 serves as a testbed for the automated safeguards the company hopes to eventually apply to Mythos-class models. That includes deliberately reducing the model's cybersecurity capabilities during training. The result is a model that arrives on the market with a confessed, built-in limitation, unusual in an industry where press releases rarely stop to explain what was left out. That Anthropic says this openly is, in itself, an interesting signal about how the company is thinking through the relationship between capability and distribution.
On our aggregated Leaderboard, Opus 4.7 scores 0.568 against Opus 4.6's 0.562, only six thousandths of a point apart. In LMArena's text category, the lead is 1505 vs. 1503, a margin that could well be statistical noise. LiveBench also shows a small gap: 76.9 vs. 76.3. On Artificial Analysis, the difference is a larger four full points, but that only places Opus 4.7 on the same level as the older GPT-5.4 and Gemini 3.1 Pro. The modest improvement matches the overall perception on social media, where some users report that the difference between 4.6 and 4.7 is almost imperceptible.
The thing is, this isn't necessarily a problem. Anthropic appears to be running a dual-track strategy: frequent Opus iterations for the broad commercial market, while Mythos stays controlled for safety reasons. It's a coherent stance, if a politically uncomfortable one, because it means admitting that the "general availability" product and the actual frontier model are no longer the same thing. For now, Opus 4.7 is the best Anthropic model most users and developers can access. But knowing that something more capable exists and isn't coming anytime soon changes the feeling of "state of the art" just a little, right?