Anthropic page 2

METR Blog October 28, 2025 09:47

Review of the Anthropic Summer 2025 Pilot Sabotage Risk Report

External review from METR of Anthropic's Summer 2025 Sabotage Risk Report

Safety Evals

Anthropic Safety Evals

OpenAI Evaluation Filter August 27, 2025 10:00

Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests

OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab...

Testing Tools

Anthropic Testing Tools OpenAI

METR Blog August 08, 2025 07:00

CoT May Be Highly Informative Despite “Unfaithfulness”

Recent work from Anthropic and others claims that LLMs' chains of thought can be “unfaithful”. These papers make an important point: you can't take everything in the CoT at face value. As a result, people often use these results to conclude the CoT is...

Anthropic

METR Blog November 22, 2024 08:00

Evaluating frontier AI R&D capabilities of language model agents against human experts

We’re releasing RE-Bench, a new benchmark for measuring the performance of humans and frontier model agents on ML research engineering tasks. We also share data from 71 human expert attempts and results for Anthropic’s Claude 3.5 Sonnet and OpenAI’s...

Benchmarks

Anthropic Claude Benchmarks OpenAI