METR Blog

Autonomy Evaluation Resources

A collection of resources for evaluating potentially dangerous autonomous capabilities of frontier models.

Example autonomy evaluation protocol

An example protocol for the whole evaluation process, based on our task suite, elicitation protocol, and scoring methods.

GitHub - METR/public-tasks

Contribute to METR/public-tasks development by creating an account on GitHub.
