Description
When running evaluation with Google’s ADK `ResponseEvaluator`, the default `_get_score()` method simply does:

```python
return eval_result.summary_metrics[f"{self._metric_name}/mean"].item()
```

This can sometimes return `NaN`, which silently breaks every threshold comparison made against the score.
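For intuition, here is one plausible way the `NaN` arises, assuming the summary is a plain mean over per-row metric scores (an assumption about the internals, not confirmed from the ADK source):

```python
import numpy as np

# Assumption: summary_metrics aggregates per-row scores with a mean.
# If every row failed to produce a score, NaN propagates through unchanged.
per_row_scores = np.array([np.nan, np.nan])
print(per_row_scores.mean())  # nan
```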
Steps to Reproduce
Run the evaluator, for example:

```bash
pytest solver.py::test_solver_integral_of_sin_x
```

Observe that the returned score is `NaN` and threshold-based assertions never pass, even if you set the threshold to 0.0:
```python
import asyncio
from pathlib import Path

# Import path assumed by analogy with response_evaluator below; adjust if
# your ADK version exposes AgentEvaluator elsewhere.
from google.adk.evaluation.agent_evaluator import AgentEvaluator


def test_solver_integral_of_sin_x() -> None:
    dataset_path = Path(__file__).parent / "solver.test.json"
    asyncio.run(
        AgentEvaluator.evaluate(
            agent_module="teacher_agent.sub_agents.solve_agent.solver",
            eval_dataset_file_path_or_dir=str(dataset_path),
            num_runs=1,
        )
    )
```
Expected Behavior
- If the metric is undefined (`NaN`), it should be treated as `0.0`, so that a threshold of `0.0` still passes.
- No unexpected exceptions or silent failures.
Actual Behavior
- `_get_score()` returns a `NaN` float.
- Subsequent comparisons fail in non-intuitive ways: every ordered comparison involving `NaN` is false, so `NaN < 0.0` and `NaN >= 0.0` both evaluate to `False`.
- Tests or pipelines that expect a zero threshold to always pass never do.
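A minimal demonstration of why the comparisons misbehave:

```python
import math

nan_score = float("nan")
assert not (nan_score >= 0.0)  # every ordered comparison with NaN is False...
assert not (nan_score < 0.0)   # ...in both directions
assert math.isnan(nan_score)   # math.isnan() is the only reliable check
assert 0.0 >= 0.0              # a score coerced to 0.0 passes a 0.0 threshold
```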
A Quick Fix
Override the evaluator’s scoring method to coerce `NaN` → `0.0`:
```python
import math

from google.adk.evaluation.response_evaluator import ResponseEvaluator


def _safe_get_score(self, eval_result):
    val = eval_result.summary_metrics.get(f"{self._metric_name}/mean")
    # Unwrap a tensor-like value via .item(); fall back to 0.0 if missing.
    score = val.item() if hasattr(val, "item") else float(val or 0.0)
    # Treat NaN as zero so threshold checks behave predictably.
    if isinstance(score, float) and math.isnan(score):
        score = 0.0
    return score


# Monkey-patch the default scorer.
ResponseEvaluator._get_score = _safe_get_score
```
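A quick sanity check for the patched method; the stub below only mimics the attributes `_get_score` touches, and the metric name is a placeholder:

```python
from types import SimpleNamespace


class _StubEvaluator:
    _metric_name = "response_match_score"  # placeholder metric name


nan_result = SimpleNamespace(
    summary_metrics={"response_match_score/mean": float("nan")}
)
assert _safe_get_score(_StubEvaluator(), nan_result) == 0.0  # NaN coerced
```

Coercing to `0.0` rather than raising keeps the patch drop-in compatible with existing threshold assertions: an undefined metric simply scores as the minimum.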