Explore Stax, an experimental developer tool that streamlines LLM evaluation by combining human labelling with scalable LLM-as-a-judge auto-raters to support data-driven decisions.