evald.ai Sources

OpenAI Evaluation Filter

Using skills

Learn how to create and use ChatGPT skills to build reusable workflows, automate recurring tasks, and ensure consistent, high-quality outputs.

LLM Evaluation, ChatGPT

Inside our approach to the Model Spec

Learn how OpenAI’s Model Spec serves as a public framework for model behavior, balancing safety, user freedom, and accountability as AI systems advance.

OpenAI

Scaling AI for everyone

Today we’re announcing $110B in new investment at a $730B pre-money valuation. This includes $30B from SoftBank, $30B from NVIDIA, and $50B from Amazon.

NVIDIA

Introducing EVMbench

OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.

Benchmarks, OpenAI

Evaluating chain-of-thought monitorability

OpenAI introduces a new framework and evaluation suite for chain-of-thought monitorability, covering 13 evaluations across 24 environments. Our findings show that monitoring a model’s internal reasoning is far more effective than monitoring outputs alone,...

OpenAI