Evaluation

Measure and improve the quality of the AI-Q blueprint.

To create custom evaluators or benchmarks, refer to the [NeMo Agent Toolkit Evaluation documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/improve-workflows/evaluate.html). The benchmarks below are pre-built for AI-Q.

Benchmarks — Run standardized evaluation suites