Gemma Scope 2: Helping the AI Safety Community Deepen Understanding of Complex Language Model Behavior
Announcing Gemma Scope 2, a comprehensive, open suite of interpretability tools for the entire Gemma 3 family to accelerate AI safety research.
Community feed
A focused stream of recent stories from the sources curated for this community. Latest: Gemma Scope 2: Helping the AI Safety Community Deepen Understanding of Complex Language Model Behavior, Evaluating AI’s ability to perform scientific research tasks, and Measuring AI’s capability to accelerate biological research in the wet lab.
OpenAI introduces FrontierScience, a benchmark testing AI reasoning in physics, chemistry, and biology to measure progress toward real scientific research.
OpenAI introduces a real-world evaluation framework to measure how AI can accelerate biological research in the wet lab. Using GPT-5 to optimize a molecular cloning protocol, the work explores both the promise and risks of AI-assisted experimentation.
GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post shows how those gains translate into real research progress, including solving an open theoretical...
GPT-5.2 lands to top Google's Gemini 3 in the AI benchmark game just four weeks after GPT-5.1 (the-decoder.com)
Google DeepMind and the UK AI Security Institute (AISI) strengthen collaboration through a new research partnership, focusing on critical safety research areas like monitoring AI reasoning and evalua…
The FACTS Benchmark Suite provides a systematic evaluation of the factuality of large language models (LLMs) across three areas: Parametric, Search, and Multimodal reasoning.
New Benchmark Shows AI Chatbots Are Easily Manipulated (Built In)
Seekr Introduces SeekrGuard for AI Model Evaluation (ExecutiveBiz)
Shared components of AI lab commitments to evaluate and mitigate severe risks.
AI Benchmark for Materials Science Research (anl.gov)
Top 5 Open-Source LLM Evaluation Platforms (KDnuggets)