Compare LLM models side by side with 3 lines of Python. Track evaluations across GPT, Claude, Llama and any model. Radar charts, scorecards, real-time streaming. Free forever.