Introducing EVMbench
OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.
Topic feed
AI benchmarks, leaderboards, and comparative model testing.
Mathematicians contribute to AI benchmark (The University of Manchester)
1Password open sources a benchmark to stop AI agents from leaking credentials (Help Net Security)
Tether EVO Scores Top 5 In Global AI Benchmark for Brain-to-Text AI Challenge (Tether.io)
University of Manchester academics contribute to the toughest AI benchmark (The University of Manchester)
Joel Becker: Reconciling Impressive AI Benchmark Performance with Limited Developer Productivity Impacts (Stanford Digital Economy Lab)
NIST Seeks Public Input on Draft Best Practices for Automated AI Benchmark Testing (ExecutiveGov)
Google adopts Werewolf and Poker in AI benchmark 'Game Arena' (GIGAZINE)
New AI benchmark reveals UK agencies are ‘all in’ – but only 2% feel prepared (TheBusinessDesk.com)
A Blog post by IBM Research on Hugging Face
Spirit AI Open-Sources Spirit v1.5, Tops Global Embodied AI Benchmark (Pandaily)
OpenAI introduces FrontierScience, a benchmark testing AI reasoning in physics, chemistry, and biology to measure progress toward real scientific research.