GPT-5.2 lands to top Google's Gemini 3 in the AI benchmark game just four weeks after GPT-5.1 - the-decoder.com
Topic feed
AI benchmarks, leaderboards, and comparative model testing.
The FACTS Benchmark Suite systematically evaluates the factuality of large language models (LLMs) across three areas: Parametric, Search, and Multimodal reasoning.
New Benchmark Shows AI Chatbots Are Easily Manipulated Built In
AI Benchmark for Materials Science Research anl.gov
This page automatically loads score data from several LLM leaderboards and shows an interactive chart that tracks how top benchmark results have changed. The chart groups benchmarks by category, hi...
Startup Minitap Tops DeepMind’s Mobile AI Benchmark, Raises $4.1 Million Seed Round Forbes
LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation (arXiv:2511.06346)
A new AI benchmark tests whether chatbots protect human well-being TechCrunch
Revolutionary Google Gemini 3 Shatters Records with Unprecedented AI Benchmark Scores and Game-Changing Coding App CryptoRank
OpenAI introduces IndQA, a new benchmark for evaluating AI systems in Indian languages. Built with domain experts, IndQA tests cultural understanding and reasoning across 12 languages and 10 knowledge areas.
Polish emerges as top language in multilingual AI benchmark testing PPC Land