Gemma Scope 2: Helping the AI Safety Community Deepen Understanding of Complex Language Model Behavior
Announcing Gemma Scope 2, a comprehensive, open suite of interpretability tools for the entire Gemma 3 family to accelerate AI safety research.