LLM Evaluation page 8

METR Blog March 17, 2023 15:22

Update on ARC's recent eval efforts

More information about ARC's evaluations of GPT-4 and Claude

OpenAI Evaluation Filter June 17, 2020 07:00

Image GPT

We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples. By establishing a correlation between sample quality and...
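The excerpt's central idea is that an image can be handled like text if its pixels are read as a sequence. Below is a minimal Python sketch of that framing. It assumes the image has already been quantized so that each pixel is a small integer token. The palette values and the helper names (`image_to_sequence`, `next_token_pairs`, `completion_prompt`) are hypothetical illustrations, not code from the Image GPT post.

```python
from typing import List, Tuple


def image_to_sequence(image: List[List[int]]) -> List[int]:
    """Flatten an H x W grid of palette indices in raster (row-major) order."""
    return [pixel for row in image for pixel in row]


def next_token_pairs(seq: List[int]) -> List[Tuple[List[int], int]]:
    """Pair each pixel with all pixels before it: the same next-token
    objective a language model uses for words."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]


def completion_prompt(image: List[List[int]], known_rows: int) -> List[int]:
    """Image completion: condition on the first `known_rows` rows and let
    the model continue the sequence to fill in the rest."""
    return image_to_sequence(image[:known_rows])


if __name__ == "__main__":
    # Hypothetical 2 x 3 image, already quantized to palette indices.
    image = [[3, 1, 4],
             [1, 5, 9]]
    seq = image_to_sequence(image)
    print(seq)                          # [3, 1, 4, 1, 5, 9]
    print(next_token_pairs(seq)[:2])    # [([3], 1), ([3, 1], 4)]
    print(completion_prompt(image, 1))  # [3, 1, 4]
```

Because the model sees only a flat token sequence, nothing in the training objective is specific to images. The pixel-to-sequence step is the only image-specific part of this sketch.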

OpenAI Evaluation Filter March 21, 2019 07:00

Implicit generation and generalization methods for energy-based models

We’ve made progress towards stable and scalable training of energy-based models (EBMs) resulting in better sample quality and generalization ability than existing models. Generation in EBMs spends more compute to continually refine its answers and doing so...
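One common way an EBM spends extra compute refining a sample is Langevin dynamics. This method runs noisy gradient descent on the energy, so each additional step nudges the sample toward lower energy. The sketch below is a minimal Python illustration under stated assumptions. The energy is a hypothetical one-dimensional quadratic standing in for a learned network, and the step size and step counts are arbitrary choices. It is not the training or sampling code from the post.

```python
import math
import random
from typing import Callable


def langevin_sample(
    grad_energy: Callable[[float], float],
    x0: float,
    steps: int,
    step_size: float,
    rng: random.Random,
) -> float:
    """Refine x by Langevin dynamics:
    x <- x - (step_size / 2) * dE/dx + sqrt(step_size) * N(0, 1).
    More steps means more compute spent refining the sample."""
    x = x0
    for _ in range(steps):
        noise = math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        x = x - 0.5 * step_size * grad_energy(x) + noise
    return x


# Hypothetical toy energy E(x) = (x - MU)^2 / 2, standing in for a learned network.
MU = 3.0


def energy(x: float) -> float:
    return 0.5 * (x - MU) ** 2


def grad_energy(x: float) -> float:
    return x - MU


if __name__ == "__main__":
    rng = random.Random(0)
    # Start every chain far from the low-energy region and compare step budgets.
    for steps in (5, 50, 200):
        samples = [langevin_sample(grad_energy, -10.0, steps, 0.1, rng) for _ in range(1000)]
        mean_energy = sum(energy(s) for s in samples) / len(samples)
        print(f"{steps:>3} steps: mean energy {mean_energy:.3f}")
```

Running it should show the mean energy falling as the step budget grows. This mirrors the trade-off the excerpt describes: extra sampling steps buy better samples.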

OpenAI Evaluation Filter August 29, 2016 07:00

Infrastructure for deep learning

Deep learning is an empirical science, and the quality of a group’s infrastructure is a multiplier on progress. Fortunately, today’s open-source ecosystem makes it possible for anyone to build great deep learning infrastructure.