evald.ai Topics

METR Blog

Time Horizon 1.1

We’re releasing a new version of our time horizon estimates (TH1.1), using more tasks and a new eval infrastructure.

LLM Evaluation