evald.ai Entities

Mitchell Bryson AI Reliability Articles

The leak that repriced cybersecurity - Mitchell Bryson

Anthropic's accidental Mythos reveal crashed cybersecurity stocks, but the market was only catching up to a reality that was already here. On the same day, CISA warned of active exploitation of AI agent frameworks, researchers disclosed basic vulnerabilities in...

Anthropic Mythos

Hacker News LLM Evaluation

A Synthesis of LLM Evaluation | Arnab Roy

Over the past few weeks I have been reading a great deal about LLM evaluation practices, drawing on Anthropic’s engineering blog, Hamel Husain’s practitioner-focused guides, the Evals for AI Engineers book by Shreya Shankar and Hamel Husain, and several eval framework...

LLM Evaluation

Anthropic LLM Evaluation