Elon Musk’s xAI sets AI benchmark records with new reasoning-optimized Grok 4 model - SiliconANGLE
Topic feed
AI benchmarks, leaderboards, and comparative model testing.
Elon Musk’s xAI sets AI benchmark records with new reasoning-optimized Grok 4 model - SiliconANGLE
Former Intel CEO’s New AI Benchmark Focuses on Human Flourishing - The New Stack
Topic: Artificial intelligence (AI) benchmark and training - Statista
HealthBench is a new evaluation benchmark for AI in healthcare which evaluates models in realistic scenarios. Built with input from 250+ physicians, it aims to provide a shared standard for model performance and safety in health.
We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.
Can frontier LLMs earn $1 million from real-world freelance software engineering?
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
What Makes a Good AI Benchmark? - Stanford HAI