Gemini

Google News Eval Frameworks May 06, 2026 09:19

Claude Opus 4.7, Gemini 3.1 Pro, and Others Score 0% on New SWE Benchmark - Analytics India Magazine

Mitchell Bryson AI Reliability Articles February 27, 2026 00:00

Google DeepMind Unveils Nano Banana 2: Revolutionary AI Image Generator Democratizes Professional Creation - Mitchell Bryson

Google DeepMind launched Nano Banana 2 (Gemini 3.1 Flash Image), blending high-quality outputs with unprecedented speed to democratize professional-grade image creation across Google's product suite.

LLM Evaluation Gemini Google Google DeepMind

Google News LLM Evaluation December 11, 2025 08:00

GPT-5.2 lands to top Google's Gemini 3 in the AI benchmark game just four weeks after GPT-5.1 - the-decoder.com

Benchmarks Gemini Google

Google News LLM Evaluation November 18, 2025 08:00

Revolutionary Google Gemini 3 Shatters Records with Unprecedented AI Benchmark Scores and Game-Changing Coding App - CryptoRank

Benchmarks Gemini Google

METR Blog August 22, 2025 07:00

Claude, GPT, and Gemini All Struggle to Evade Monitors

Vincent Cheng and Thomas Kwa replicate a Google DeepMind paper on chain-of-thought monitoring, showing evidence that monitoring works on other companies' models.

Claude Gemini Google Google DeepMind