Stop “vibe testing” your LLMs. It's time for real evals. - Google Developers Blog
Explore Stax, an experimental developer tool that streamlines LLM evaluation with human labelling and scalable LLM-as-a-judge auto-raters for data-driven decisions.