What is AI Evaluation (Evals)?

AI Evaluation (Evals)

The practice of systematically measuring AI system quality.

Definition

AI evaluation is the practice of measuring whether an AI system produces correct, safe, and useful outputs against defined criteria. Evals range from automated tests (exact match, structured-output checks, LLM-as-judge) to human review and end-to-end production monitoring.
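As an illustration of the automated end of that range, the sketch below implements two of the checks named above: exact match and a structured-output check. The function names, the normalization rule (trim whitespace, ignore case), and the choice of JSON as the structured format are assumptions made for this example, not part of any particular eval framework.

```python
import json


def exact_match(output: str, expected: str) -> bool:
    """Pass when output equals expected after trimming whitespace and ignoring case."""
    return output.strip().lower() == expected.strip().lower()


def valid_structured_output(output: str, required_keys: set[str]) -> bool:
    """Pass when output parses as a JSON object that contains every required key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed)


def pass_rate(results: list[bool]) -> float:
    """Fraction of checks that passed; 0.0 for an empty list."""
    return sum(results) / len(results) if results else 0.0


# Hypothetical model outputs, hard-coded here so the sketch runs without a model.
checks = [
    exact_match(" Paris ", "paris"),
    valid_structured_output('{"answer": "Paris", "confidence": 0.9}', {"answer", "confidence"}),
    valid_structured_output("The answer is Paris.", {"answer"}),
]
print(f"pass rate: {pass_rate(checks):.2f}")
```

Checks like these are cheap and deterministic, which makes them suitable for running on every change. LLM-as-judge and human review cover qualities that such rules cannot express, such as helpfulness or tone.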

Context

Evals are what separate a demo from a production AI system. Without them, you cannot tell whether a prompt change improved or degraded behavior, whether a model swap is safe, or whether the system is drifting in production. Strong evaluation pipelines are now table stakes for any enterprise AI deployment.
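One way to check whether a prompt change improved or degraded behavior is to run the same fixed set of test cases before and after the change and compare results case by case. The sketch below assumes each run is recorded as a mapping from a case ID to a pass/fail result; the function name and the case IDs are hypothetical.

```python
def compare_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict[str, list[str]]:
    """Compare two eval runs over the cases they share.

    Returns the case IDs that regressed (passed before, fail now)
    and the case IDs that were fixed (failed before, pass now).
    """
    shared = baseline.keys() & candidate.keys()
    regressions = sorted(c for c in shared if baseline[c] and not candidate[c])
    fixes = sorted(c for c in shared if not baseline[c] and candidate[c])
    return {"regressions": regressions, "fixes": fixes}


# Hypothetical results for the same three cases under two prompt versions.
old_prompt = {"q1": True, "q2": True, "q3": False}
new_prompt = {"q1": True, "q2": False, "q3": True}

report = compare_runs(old_prompt, new_prompt)
print(report)
```

In this example both runs pass two of three cases, so the aggregate pass rate alone would hide the fact that one case regressed. Reporting per-case differences makes that kind of change visible.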
