Claude Opus 4.7 Boosts SWE-bench to 87.6% - blockchain.news
Claude Opus 4.7, Gemini 3.1 Pro, and Others Score 0% on New SWE Benchmark - Analytics India Magazine
External review from METR of Anthropic's Sabotage Risk Report for Claude Opus 4.6
How Anthropic’s Claude Opus 4.6 Broke Its Own AI Benchmark - WinBuzzer