OpenAI Evaluation Filter

gpt-oss-safeguard technical report

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are two open-weight reasoning models post-trained from the gpt-oss models and trained to reason from a provided policy in order to label content under that policy. In this report, we describe...

OpenAI Evaluation Filter

Addendum to GPT-5 System Card: Sensitive conversations

This system card details GPT-5’s improvements in handling sensitive conversations, including new benchmarks for emotional reliance, mental health, and jailbreak resistance.

Google DeepMind Evaluation Filter

Google DeepMind strengthens the Frontier Safety Framework

Today, we’re publishing the third iteration of our Frontier Safety Framework (FSF) — our most comprehensive approach yet to identifying and mitigating severe risks from advanced AI models. This updat…

More stories

More stories load automatically as you scroll.