Level 2 · Intermediate · 6 min

Evals

Automated tests for agents and prompts. Run on a fixed dataset, score the output, catch regressions before users do.