METR Blog page 2

METR Blog February 24, 2026 08:00

We are Changing our Developer Productivity Experiment Design

Our second developer productivity study faces selection effects from wider AI adoption, prompting us to redesign our approach.

METR Blog February 19, 2026 08:00

Five lessons from having helped run an AI-Biology RCT

Luca Righetti shares takeaways on the role of randomized controlled trials in AI safety testing.

METR Blog February 18, 2026 00:00

How We Protect Confidential Information

Our high-level approach to protecting confidential access and information.

METR Blog February 17, 2026 08:00

Analyzing coding agent transcripts to upper bound productivity gains from AI agents

Amy Deng investigates whether coding agent transcripts could serve as an alternative for estimating AI productivity uplift, using 5305 Claude Code transcripts from METR technical staff.

METR Blog February 13, 2026 08:00

Measuring Time Horizon using Claude Code and Codex

Nikola Jurkovic describes our measurements of time horizon using Claude Code and Codex scaffolds.

METR Blog February 10, 2026 08:00

A simpler AI timelines model predicts 99% AI R&D automation in ~2032

Thomas Kwa describes a simple model for forecasting when AI will automate AI development, based on the AI Futures model but with only 8 parameters.

METR Blog January 29, 2026 22:12

Frontier AI safety regulations: A reference for lab staff

Miles Kodama and Michael Chen summarize key provisions from California's SB 53, the EU Code of Practice, and New York's RAISE Act covering frontier AI developers.

METR Blog January 29, 2026 08:00

Time Horizon 1.1

We’re releasing a new version of our time horizon estimates (TH1.1), using more tasks and a new eval infrastructure.

METR Blog January 22, 2026 08:00

Early work on monitorability evaluations

We show preliminary results on a prototype evaluation that tests monitors' ability to catch AI agents doing side tasks, and AI agents' ability to bypass this monitoring.

METR Blog January 22, 2026 08:00

Clarifying limitations of time horizon

Thomas Kwa responds to some misinterpretations of our time horizon work, and explains limitations and the core finding.

METR Blog December 09, 2025 08:00

Common Elements of Frontier AI Safety Policies (December 2025 Update)

Shared components of AI lab commitments to evaluate and mitigate severe risks.

METR Blog November 19, 2025 08:00

Details about METR's evaluation of OpenAI GPT-5.1-Codex-Max

We evaluate whether GPT-5.1-Codex-Max poses significant catastrophic risks via AI self-improvement, rogue replication, or sabotage of AI labs. We conclude that this seems unlikely.