METR Blog

Details about METR's preliminary evaluation of OpenAI o1-preview

We measured the performance of OpenAI's o1-mini and o1-preview models on our autonomy and AI R&D task suites, and found they did not exceed the capabilities of the best existing public model we've evaluated, though we could not confidently upper-bound...

OpenAI

METR Blog

ERROR: The request could not be satisfied

Suggestions for expanded guidance on capability elicitation and robust model safeguards in the U.S. AI Safety Institute’s draft document “Managing Misuse Risk for Dual-Use Foundation Models” (NIST AI 800-1).

Safety Evals

Safety Evals

METR Blog

Vivaria

Vivaria is METR's tool for running evaluations and conducting agent elicitation research. Vivaria is a web application with which users can interact using a web UI and a command-line interface.

More stories

More stories load automatically as you scroll.