METR Blog
An update on our general capability evaluations
August 06, 2024 17:00
More tasks, human baselines, and preliminary results for GPT-4 and Claude.