METR Blog

Bounty: Diverse hard tasks for LLM agents

METR (formerly ARC Evals) is looking for (1) ideas, (2) detailed specifications, and (3) well-tested implementations for tasks to measure performance of autonomous LLM agents.

METR Blog

ARC Evals is now METR

ARC Evals is wrapping up our incubation period at ARC and spinning off into our own standalone nonprofit.

OpenAI

Frontier risk and preparedness

To support the safety of highly-capable AI systems, we are developing our approach to catastrophic risk preparedness, including building a Preparedness team and launching a challenge.

Safety Evals

METR Blog

Responsible Scaling Policies (RSPs)

We describe the basic components of Responsible Scaling Policies (RSPs) as well as why we find them promising for reducing catastrophic risks from AI.

METR Blog

ARC Evals is spinning out from ARC

ARC Evals plans to spin out from the Alignment Research Center (ARC) in the coming months, and become its own standalone organization.
