Hacker News LLM Evaluation

Hacker News LLM Evaluation May 01, 2026 17:59

Hacker News LLM Evaluation March 17, 2026 19:23

A Synthesis of LLM Evaluation | Arnab Roy

I have been reading a ton about LLM evaluation practices over the past few weeks from Anthropic’s engineering blog, Hamel Husain’s practitioner-focused guides, the Evals for AI Engineers book by Shreya Shankar and Hamel Husain, and several eval framework...

Anthropic LLM Evaluation

Hacker News LLM Evaluation March 12, 2026 05:40

LLM Evaluation Metrics: The Ultimate LLM Evaluation Guide - Confident AI

In this article, I'll walk through everything you need to know about LLM evaluation metrics, with code samples.

LLM Evaluation

Hacker News LLM Evaluation February 19, 2026 13:18

LLM Evaluation Tool — Compare Models, Prompts & Configs | Valohai

Compare LLM models side by side with 3 lines of Python. Track evaluations across GPT, Claude, Llama and any model. Radar charts, scorecards, real-time streaming. Free forever.

Claude LLM Evaluation

Hacker News LLM Evaluation February 10, 2026 04:43

Dharma_Code/paper/vocab_priming_confound.pdf at main · Palmerschallon/Dharma_Code

Polyglot ontological activations for LLM systems. 68 terms from 20+ traditions mapped to computational patterns, plus 10 algorithms native to the ontology that have no equivalents in standard CS. Includes benchmark suite and a documented evaluation...

Benchmarks

Hacker News LLM Evaluation December 27, 2025 11:37

GitHub - dokimos-dev/dokimos: Evaluation Framework for LLM applications in Java and Kotlin

Evaluation Framework for LLM applications in Java and Kotlin - dokimos-dev/dokimos

Testing Tools

Hacker News LLM Evaluation December 19, 2025 20:25

Building an LLM evaluation framework: best practices | Datadog

Explore best practices for building an evaluation framework for production LLM applications.

LLM Evaluation Testing Tools

Hacker News LLM Evaluation December 16, 2025 13:28

GitHub - bassrehab/spark-llm-eval: Spark-native LLM evaluation framework with confidence intervals, significance testing, and Databricks integration

Spark-native LLM evaluation framework with confidence intervals, significance testing, and Databricks integration - bassrehab/spark-llm-eval

LLM Evaluation Testing Tools

Hacker News LLM Evaluation December 04, 2025 17:48

GitHub - mburaksayici/smallevals: smallevals — CPU-fast, GPU-blazing fast offline retrieval evaluation for RAG systems with tiny QA models.

smallevals — CPU-fast, GPU-blazing fast offline retrieval evaluation for RAG systems with tiny QA models. - mburaksayici/smallevals

Hacker News LLM Evaluation December 04, 2025 02:30

Evaluation Guidebook - a Hugging Face Space by OpenEvals

This page automatically loads score data from several LLM leaderboards and shows an interactive chart that tracks how top benchmark results have changed. The chart groups benchmarks by category, hi...

Benchmarks

Hacker News LLM Evaluation November 24, 2025 14:19

LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation

Abstract page for arXiv paper 2511.06346: LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation

Benchmarks LLM Evaluation

Hacker News LLM Evaluation October 05, 2025 15:55

Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)

Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples

Benchmarks LLM Evaluation