NIST Seeks Public Input on Draft Best Practices for Automated AI Benchmark Testing - ExecutiveGov
Google adopts Werewolf and Poker in AI benchmark 'Game Arena' - GIGAZINE
New AI benchmark reveals UK agencies are ‘all in’ – but only 2% feel prepared - TheBusinessDesk.com
A blog post by IBM Research on Hugging Face
Spirit AI Open-Sources Spirit v1.5, Tops Global Embodied AI Benchmark - Pandaily
OpenAI introduces FrontierScience, a benchmark testing AI reasoning in physics, chemistry, and biology to measure progress toward real scientific research.
GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post shows how those gains translate into real research progress, including solving an open theoretical...
GPT-5.2 lands to top Google's Gemini 3 in the AI benchmark game just four weeks after GPT-5.1 - the-decoder.com
The FACTS Benchmark Suite provides a systematic evaluation of Large Language Models (LLMs) factuality across three areas: Parametric, Search, and Multimodal reasoning.
New Benchmark Shows AI Chatbots Are Easily Manipulated - Built In
AI Benchmark for Materials Science Research - anl.gov
Startup Minitap Tops DeepMind’s Mobile AI Benchmark, Raises $4.1 Million Seed Round - Forbes