Evaluation framework for evolving agents
Fully local with SQLite. Auto-capture LLM calls with one import. 130+ built-in metrics. GEPA-powered calibration. One command to run it all.
Fully Local
All data stays on your machine. SQLite storage with zero cloud dependencies.
130+ Metrics
Built-in metric bank with 73 objective and 60 LLM judge metrics, plus 17 domain bundles.
Auto Calibration
Align LLM judges with human feedback through automatic prompt optimization.
One Command
Run the entire evaluation pipeline with a single CLI invocation.
Agents without evaluation are unsustainable
Debugging in the dark
No insight into what's happening inside your agents. When something fails, you're left guessing.
Privacy concerns with cloud tools
Sending prompts and customer data to third-party platforms creates compliance risks and data exposure.
Manual testing doesn't scale
You can't spot-check every response. As agent complexity grows, quality assurance falls behind.
Evalyn changes this.
Full local tracing. 130+ automated metrics. LLM judges calibrated to your standards. One command to run it all.
Works with your stack
Native support for popular LLM frameworks and providers.
+ OpenTelemetry auto-instrumentation for any LLM provider
Metric bundles ready to evaluate specific agent types.
The Evaluation Pipeline
Auto-capture traces with simple decorators and OpenTelemetry integration. Zero config needed.
evalyn build-dataset --project myapp
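To make the idea of decorator-based capture concrete, here is a minimal sketch of how a function call could be recorded into a local SQLite table. It uses only the Python standard library and is not Evalyn's actual API: `make_tracer`, the `traces` table and its columns, and `fake_agent` are all hypothetical names chosen for illustration.

```python
import functools
import json
import sqlite3
import time


def make_tracer(conn):
    """Return a decorator that records every call of the wrapped function.

    Hypothetical helper for illustration; the table layout is an assumption.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS traces ("
        "id INTEGER PRIMARY KEY, name TEXT, inputs TEXT, "
        "output TEXT, error TEXT, duration_ms REAL)"
    )

    def trace(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            output, error = None, None
            try:
                output = fn(*args, **kwargs)
                return output
            except Exception as exc:
                # Record the failure, then let it propagate unchanged.
                error = repr(exc)
                raise
            finally:
                duration_ms = (time.perf_counter() - start) * 1000.0
                inputs = json.dumps({"args": args, "kwargs": kwargs}, default=str)
                conn.execute(
                    "INSERT INTO traces (name, inputs, output, error, duration_ms) "
                    "VALUES (?, ?, ?, ?, ?)",
                    (fn.__name__, inputs, json.dumps(output, default=str),
                     error, duration_ms),
                )
                conn.commit()

        return wrapper

    return trace


# Usage: an in-memory database stands in for a file on disk.
conn = sqlite3.connect(":memory:")
trace = make_tracer(conn)


@trace
def fake_agent(question):
    # Stand-in for a real LLM call.
    return "echo: " + question


fake_agent("hello")
rows = conn.execute("SELECT name, output, error FROM traces").fetchall()
```

Because everything is written to a local SQLite connection, no data leaves the machine, and failed calls are stored alongside successful ones so they can be inspected later.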
Get metric suggestions based on your code, then choose from 130+ built-in objective metrics and LLM judges.
evalyn run-eval --dataset ./data --metrics ./metrics.json
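For a sense of what an objective metric looks like, as opposed to an LLM judge, the sketch below scores outputs with two deterministic functions and averages them over a dataset. The function names, the `output`/`expected` item fields, and `run_metrics` are assumptions made for this example and do not reflect Evalyn's built-in metric bank or its `metrics.json` format.

```python
from collections import Counter


def exact_match(output, expected):
    """1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def token_f1(output, expected):
    """Harmonic mean of token-level precision and recall."""
    out_tokens = output.lower().split()
    exp_tokens = expected.lower().split()
    if not out_tokens or not exp_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(out_tokens == exp_tokens)
    overlap = sum((Counter(out_tokens) & Counter(exp_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(exp_tokens)
    return 2 * precision * recall / (precision + recall)


def run_metrics(dataset, metrics):
    """Average each metric over all items in the dataset."""
    if not dataset:
        raise ValueError("dataset must contain at least one item")
    totals = {name: 0.0 for name in metrics}
    for item in dataset:
        for name, fn in metrics.items():
            totals[name] += fn(item["output"], item["expected"])
    return {name: total / len(dataset) for name, total in totals.items()}


# Hypothetical two-item dataset.
dataset = [
    {"output": " Paris", "expected": "paris"},
    {"output": "the cat sat", "expected": "the cat"},
]
scores = run_metrics(dataset, {"exact_match": exact_match, "token_f1": token_f1})
```

Objective metrics like these are cheap and reproducible, which is why they are usually run on every trace, while LLM judges are reserved for qualities that cannot be computed directly.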
Human-in-the-loop feedback at both trace and span level. Build ground truth from real usage.
evalyn annotate --dataset ./data --per-metric
The GEPA algorithm automatically optimizes LLM judge prompts to align with human preferences.
evalyn calibrate --metric-id safety --optimizer gepa
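Calibration needs some way to measure how closely a judge agrees with human annotators. The sketch below is not GEPA and is not taken from Evalyn; it shows one common agreement score, Cohen's kappa, that a prompt optimizer could plausibly try to increase. The label lists are made-up examples.

```python
def cohens_kappa(judge_labels, human_labels):
    """Chance-corrected agreement between two equally long label lists."""
    if len(judge_labels) != len(human_labels) or not judge_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(judge_labels)
    # Fraction of items where judge and human gave the same label.
    observed = sum(j == h for j, h in zip(judge_labels, human_labels)) / n
    # Agreement expected if both labeled at random with their own label rates.
    labels = set(judge_labels) | set(human_labels)
    expected = sum(
        (judge_labels.count(label) / n) * (human_labels.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical pass/fail verdicts (1 = pass, 0 = fail) on four traces.
judge = [1, 1, 0, 0]
human = [1, 0, 0, 0]
kappa = cohens_kappa(judge, human)  # 0.5
```

A kappa near 1 means the judge tracks human annotations closely, while a value near 0 means it does no better than chance, which would suggest the judge prompt needs further optimization.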
Generate synthetic datasets and run agent simulations at scale to expand evaluation coverage.
evalyn simulate --dataset ./data --target app:agent
Built for the community