Over the past few weeks I have been reading a ton about LLM evaluation practices. My sources were Anthropic’s engineering blog, Hamel Husain’s practitioner-focused guides, the Evals for AI Engineers book by Shreya Shankar and Hamel Husain, and the docs for several eval frameworks. I wanted to write down what I learned and pull it together into a synthesis of my understanding of the topic.