Benchmarks

Google News LLM Evaluation March 25, 2026 07:00

Exclusive: This new benchmark could expose AI’s biggest weakness - Fast Company

MLCommons Evaluation Filter March 24, 2026 14:47

A new GPT-OSS benchmark and DeepSeek R1 updates for latency-optimized reasoning - MLCommons

MLPerf Inference v6.0 introduces GPT-OSS 120B, a new open-weight LLM benchmark, plus a DeepSeek-R1 interactive scenario with support for speculative decoding.

Benchmarks

Mitchell Bryson AI Reliability Articles March 24, 2026 00:00

The land grab has gone financial - Mitchell Bryson

OpenAI's 17.5% guaranteed-return PE pitch, its 450,000 sq ft campus lease, and the Helion fusion deal all point to the same shift: the AI race is no longer about who has the best model — it's about who can lock in distribution, real estate, and energy...

Benchmarks

Anthropic Benchmarks OpenAI

Google News LLM Evaluation March 20, 2026 07:00

Insilico Medicine Highlights AI Benchmark Results in Cardiovascular Drug Target Discovery - TipRanks

Benchmarks

Benchmarks Target

MLCommons Evaluation Filter March 19, 2026 18:59

Standardizing Generative AI Service Evaluation: An API-Centric Benchmarking Approach - MLCommons

MLPerf® Endpoints brings API-native benchmarking, Pareto curve visualizations, and rolling submissions to generative AI infrastructure evaluation.

Benchmarks

Google News LLM Evaluation March 18, 2026 07:00

Arena Leaderboard: The Unbreakable Ranking System That’s Revolutionizing AI Model Evaluation - CryptoRank

Benchmarks LLM Evaluation

Google DeepMind Evaluation Filter March 17, 2026 16:03

Measuring progress toward AGI: A cognitive framework

Google DeepMind proposes a cognitive framework to evaluate AGI and launches a Kaggle hackathon to build capability benchmarks.

Benchmarks

Benchmarks Google Google DeepMind

Google News LLM Evaluation March 15, 2026 07:00

AI benchmark numbers are meaningless — here’s what to look for instead - MakeUseOf

Benchmarks

Google News LLM Evaluation March 14, 2026 07:00

MetaEval: Measuring the Discrimination of Benchmarks for Efficient LLM Evaluation - The Association for the Advancement of Artificial Intelligence

Benchmarks LLM Evaluation

MLCommons Evaluation Filter March 13, 2026 16:57

Global Standards, Local Ground Truths: Piloting Multilingual, Multimodal AI Safety Understanding in APAC - MLCommons

MLCommons is developing the AILuminate Culturally-Specific Multimodal Benchmark to close the AI performance and representation gap across APAC cultures, languages, and real-world use cases.

Benchmarks Safety Evals

MLCommons Evaluation Filter March 12, 2026 15:21

YOLO for the MLPerf Inference v6.0 Edge Suite - MLCommons

MLPerf Inference v6.0 upgrades its edge object detection benchmark from RetinaNet to YOLOv11, bringing modern real-time detection to standardized AI hardware evaluation.

Benchmarks

Google News LLM Evaluation March 12, 2026 07:00

Qwen3.5-9B tops every AI benchmark right now, but that's not how you should pick a model - XDA

Benchmarks