evald.ai Sources

OpenAI Evaluation Filter

CLIP: Connecting text and images

We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized,...

Benchmarks

Benchmarks

OpenAI Evaluation Filter

Procgen and MineRL Competitions

We’re excited to announce that OpenAI is co-organizing two NeurIPS 2020 competitions with AIcrowd, Carnegie Mellon University, and DeepMind, using Procgen Benchmark and MineRL.

Benchmarks

Benchmarks OpenAI

OpenAI Evaluation Filter

Image GPT

We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples. By establishing a correlation between sample quality and...

LLM Evaluation

LLM Evaluation

OpenAI Evaluation Filter

Procgen Benchmark

We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills.

Benchmarks

Benchmarks

OpenAI Evaluation Filter

Better language models and their implications

We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question...

Benchmarks

Benchmarks

OpenAI Evaluation Filter

OpenAI Five Benchmark: Results

Yesterday, OpenAI Five won a best-of-three against a team of 99.95th percentile Dota players: Blitz, Cap, Fogged, Merlini, and MoonMeander—four of whom have played Dota professionally—in front of a live audience and 100,000 concurrent livestream viewers.

Benchmarks

Benchmarks OpenAI

OpenAI Evaluation Filter

Gotta Learn Fast: A new benchmark for generalization in RL

In this report, we present a new reinforcement learning (RL) benchmark based on the Sonic the Hedgehog™ video game franchise. This benchmark is intended to measure the performance of transfer learning and few-shot learning algorithms in the RL domain. We...

Benchmarks

Benchmarks

OpenAI Evaluation Filter

Infrastructure for deep learning

Deep learning is an empirical science, and the quality of a group’s infrastructure is a multiplier on progress. Fortunately, today’s open-source ecosystem makes it possible for anyone to build great deep learning infrastructure.

LLM Evaluation

LLM Evaluation