METR Blog

CoT May Be Highly Informative Despite “Unfaithfulness”

Recent work from Anthropic and others claims that LLMs' chains of thoughts can be “unfaithful”. These papers make an important point: you can't take everything in the CoT at face value. As a result, people often use these results to conclude the CoT is...

METR Blog

Details about METR's evaluation of OpenAI GPT-5

We evaluate whether GPT-5 poses significant catastrophic risks via AI self-improvement, rogue replication, or sabotage of AI labs. We conclude that this seems unlikely. However, capability trends continue rapidly, and models display increasing eval awareness.

LLM Evaluation

OpenAI Evaluation Filter

ChatGPT agent System Card

ChatGPT agent System Card: OpenAI’s agentic model unites research, browser automation, and code tools with safeguards under the Preparedness Framework.

Safety Evals

More stories

More stories load automatically as you scroll.