GPT-OSS 20B: A Sparse MoE Pretraining Benchmark for MLPerf Training v6.0
MLPerf Training v6.0 introduces GPT-OSS 20B, a new sparse Mixture-of-Experts (MoE) pretraining benchmark designed to be accessible on a single 8-GPU node.
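For readers unfamiliar with the architecture: in a sparse MoE layer, a lightweight router sends each token to only a small subset (top-k) of expert feed-forward networks, so a model with many total parameters activates only a fraction of them per token. Below is a minimal PyTorch sketch of top-k expert routing; the class name, dimensions, and expert structure are illustrative assumptions, not the benchmark's actual reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Minimal top-k sparse Mixture-of-Experts layer (illustrative sketch only)."""
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network scores each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.router(x)                  # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for k in range(self.top_k):              # each token runs through just top_k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e            # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

x = torch.randn(16, 512)
print(SparseMoE()(x).shape)  # torch.Size([16, 512])
```

With top_k=2 of 8 experts, each token activates only the router plus two expert MLPs per layer, which is how a sparse MoE keeps per-token compute far below its total parameter count.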