Summary of our gpt-oss methodology review
Details on external recommendations from METR for gpt-oss Preparedness experiments and follow-up from OpenAI.
Topic feed: Safety evaluations, red teaming, preparedness, and model risk testing
ChatGPT agent System Card: OpenAI’s agentic model unites research, browser automation, and code tools with safeguards under the Preparedness Framework.
Our current views on the information needed for visibility into frontier AI risk.
Sharing our updated framework for measuring and protecting against severe harm from frontier AI capabilities.
We’re exploring the frontiers of AGI, prioritizing technical safety, proactive risk assessment, and collaboration with the AI community.
This report outlines the safety work carried out prior to releasing deep research, including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas.
This report outlines the safety work carried out for the OpenAI o3-mini model, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
Why pre-deployment testing is not an adequate framework for AI risk management
This report outlines the safety work carried out prior to releasing OpenAI o1 and o1-mini, including external red teaming and frontier risk evaluations according to our Preparedness Framework.
Suggestions for expanded guidance on capability elicitation and robust model safeguards in the U.S. AI Safety Institute’s draft document “Managing Misuse Risk for Dual-Use Foundation Models” (NIST AI 800-1).
Comments on NIST’s draft document “AI Risk Management Framework: Generative AI Profile.”