evald.ai · OpenAI · Evaluation
Introducing the SWE-Lancer benchmark
February 18, 2025 10:00
Can frontier LLMs earn $1 million from real-world freelance software engineering?