OpenAI Evaluation Filter

OpenAI Evaluation Filter May 12, 2026 00:00

AutoScout24 scales engineering with AI-powered workflows

Learn how AutoScout24 Group uses Codex and ChatGPT to speed development cycles, improve code quality, and expand AI adoption.

LLM Evaluation ChatGPT

OpenAI Evaluation Filter May 11, 2026 10:00

How enterprises are scaling AI

How enterprises scale AI: from early experiments to compounding impact through trust, governance, workflow design, and quality at scale.

LLM Evaluation

OpenAI Evaluation Filter April 10, 2026 00:00

Creating images with ChatGPT

Learn how to create and refine images with ChatGPT using clear prompts, iterate on designs, and generate high-quality visuals in minutes.

LLM Evaluation ChatGPT

OpenAI Evaluation Filter April 10, 2026 00:00

Using skills

Learn how to create and use ChatGPT skills to build reusable workflows, automate recurring tasks, and ensure consistent, high-quality outputs.

LLM Evaluation ChatGPT

OpenAI Evaluation Filter April 09, 2026 00:00

CyberAgent moves faster with ChatGPT Enterprise and Codex

CyberAgent uses ChatGPT Enterprise and Codex to securely scale AI adoption, improve quality, and accelerate decisions across advertising, media, and gaming.

LLM Evaluation ChatGPT

OpenAI Evaluation Filter March 25, 2026 10:00

Inside our approach to the Model Spec

Learn how OpenAI’s Model Spec serves as a public framework for model behavior, balancing safety, user freedom, and accountability as AI systems advance.

OpenAI

OpenAI Evaluation Filter March 06, 2026 00:00

How Balyasny Asset Management built an AI research engine

By combining rigorous model evaluation, full-platform use of OpenAI, and agent workflows, Balyasny is reinventing investment research.

LLM Evaluation OpenAI

OpenAI Evaluation Filter February 27, 2026 05:30

Scaling AI for everyone

Today we’re announcing $110B in new investment at a $730B pre-money valuation. This includes $30B from SoftBank, $30B from NVIDIA, and $50B from Amazon.

NVIDIA

OpenAI Evaluation Filter February 26, 2026 10:00

Pacific Northwest National Laboratory and OpenAI partner to accelerate federal permitting

OpenAI and Pacific Northwest National Laboratory introduce DraftNEPABench, a new benchmark evaluating how AI coding agents can accelerate federal permitting—showing potential to reduce NEPA drafting time by up to 15% and modernize infrastructure reviews.

Benchmarks OpenAI

OpenAI Evaluation Filter February 18, 2026 00:00

Introducing EVMbench

OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.

Benchmarks OpenAI

OpenAI Evaluation Filter January 16, 2026 00:00

Our approach to advertising and expanding access to ChatGPT

OpenAI plans to test advertising in the U.S. for ChatGPT’s free and Go tiers to expand affordable access to AI worldwide, while protecting privacy, trust, and answer quality.

LLM Evaluation OpenAI ChatGPT

OpenAI Evaluation Filter December 18, 2025 12:00

Evaluating chain-of-thought monitorability

OpenAI introduces a new framework and evaluation suite for chain-of-thought monitorability, covering 13 evaluations across 24 environments. Our findings show that monitoring a model’s internal reasoning is far more effective than monitoring outputs alone,...

OpenAI