Why language models hallucinate
OpenAI’s new research explains why language models hallucinate. The findings show how improved evaluations can enhance AI reliability, honesty, and safety.
OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab...
We evaluate whether GPT-5 poses significant catastrophic risks via AI self-improvement, rogue replication, or sabotage of AI labs. We conclude that this seems unlikely. However, capabilities continue to advance rapidly, and models display increasing evaluation awareness.
ChatGPT agent System Card: OpenAI’s agentic model unites research, browser automation, and code tools with safeguards under the Preparedness Framework.
METR conducted a preliminary evaluation of OpenAI's o3 and o4-mini. The two models displayed higher autonomous capabilities than other public models tested, and o3 appears somewhat prone to "reward hacking".
This report outlines the safety work carried out for the OpenAI o3-mini model, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
Drawing from OpenAI’s established safety frameworks, this document highlights our multi-layered approach, including model and product mitigations we’ve implemented to protect against prompt engineering and jailbreaks, protect privacy and security, as well...
This report outlines the safety work carried out prior to releasing OpenAI o1 and o1-mini, including external red teaming and frontier risk evaluations according to our Preparedness Framework.
We’re releasing RE-Bench, a new benchmark for measuring the performance of humans and frontier model agents on ML research engineering tasks. We also share data from 71 human expert attempts and results for Anthropic’s Claude 3.5 Sonnet and OpenAI’s...
We measured the performance of OpenAI's o1-mini and o1-preview models on our autonomy and AI R&D task suites, and found they did not exceed the capabilities of the best existing public model we've evaluated, though we could not confidently upper-bound...
OpenAI and Los Alamos National Laboratory are working to develop safety evaluations to assess and measure biological capabilities and risks associated with frontier models.
We’ve created GPT-4, the latest milestone in OpenAI’s effort to scale up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits...