FACTS Benchmark Suite: a new way to systematically evaluate LLM factuality

The FACTS Benchmark Suite systematically evaluates the factuality of Large Language Models (LLMs) across three areas: Parametric, Search, and Multimodal reasoning.

Google DeepMind · The FACTS team

Benchmarks
