OpenAI

Mitchell Bryson AI Reliability Articles May 12, 2026 00:00

The attack that wrote itself

Analysis of Google's interception of an AI-generated zero-day exploit and what divergent responses from OpenAI, Anthropic, and Microsoft mean for builders.

Anthropic OpenAI Google Microsoft

Mitchell Bryson AI Reliability Articles March 30, 2026 00:00

OpenAI just showed everyone what an AI company looks like when the math stops working

In a single week, OpenAI killed Sora ($15M/day burn, $2.1M lifetime revenue), blindsided Disney on a $1B deal, shelved its adult chatbot, renamed its product org to 'AGI Deployment,' moved safety oversight away from the CEO, and bet everything on a model...

OpenAI

Mitchell Bryson AI Reliability Articles March 29, 2026 00:00

The real AI war is over your memory

Google launched tools to import your ChatGPT memories and chat histories. Apple is turning Siri into a marketplace where every AI assistant plugs in — for a 30% cut. OpenAI is wiring Codex into every work tool you touch. Shopify made every AI conversation a...

LLM Evaluation

LLM Evaluation OpenAI Google ChatGPT

OpenAI Evaluation Filter March 25, 2026 10:00

Inside our approach to the Model Spec

Learn how OpenAI’s Model Spec serves as a public framework for model behavior, balancing safety, user freedom, and accountability as AI systems advance.

OpenAI

Mitchell Bryson AI Reliability Articles March 24, 2026 00:00

The land grab has gone financial

OpenAI's 17.5% guaranteed-return PE pitch, its 450,000 sq ft campus lease, and the Helion fusion deal all point to the same shift: the AI race is no longer about who has the best model — it's about who can lock in distribution, real estate, and energy...

Benchmarks

Anthropic Benchmarks OpenAI

Mitchell Bryson AI Reliability Articles March 10, 2026 00:00

AI finally learns to secure the code it writes

OpenAI shipping Codex Security, Anthropic's Claude finding 22 CVEs in Firefox in two weeks, and Microsoft treating AI agents as governed security principals all point to the same inflection: the industry is racing to close the security gap that AI coding...

Anthropic Claude OpenAI Microsoft

OpenAI Evaluation Filter March 06, 2026 00:00

How Balyasny Asset Management built an AI research engine

By combining rigorous model evaluation, full-platform use of OpenAI, and agent workflows, Balyasny is reinventing investment research.

LLM Evaluation

LLM Evaluation OpenAI

Mitchell Bryson AI Reliability Articles March 03, 2026 00:00

The Pentagon values auction: AI safety gets its market test

OpenAI amends its Pentagon deal after Altman admits it looked 'opportunistic and sloppy', while Claude surges to number one on the App Store and hundreds of employees publicly back Anthropic's stance.

Safety Evals

Anthropic Safety Evals Claude OpenAI

OpenAI Evaluation Filter February 26, 2026 10:00

Pacific Northwest National Laboratory and OpenAI partner to accelerate federal permitting

OpenAI and Pacific Northwest National Laboratory introduce DraftNEPABench, a new benchmark evaluating how AI coding agents can accelerate federal permitting—showing potential to reduce NEPA drafting time by up to 15% and modernize infrastructure reviews.

Benchmarks

Benchmarks OpenAI

Google News LLM Evaluation February 18, 2026 19:42

OpenAI Unveils AI Benchmark Tool to Enhance Blockchain Security - thedefiant.io

Benchmarks

Benchmarks OpenAI

OpenAI Evaluation Filter February 18, 2026 00:00

Introducing EVMbench

OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.

Benchmarks

Benchmarks OpenAI

OpenAI Evaluation Filter January 16, 2026 00:00

Our approach to advertising and expanding access to ChatGPT

OpenAI plans to test advertising in the U.S. for ChatGPT’s free and Go tiers to expand affordable access to AI worldwide, while protecting privacy, trust, and answer quality.

LLM Evaluation

LLM Evaluation OpenAI ChatGPT