ResponseEvaluator._get_score returns NaN and breaks threshold checks #1281

Open
@Johnnyallen07

Description

When running evaluation with Google’s ADK ResponseEvaluator, the default _get_score() method simply does:

return eval_result.summary_metrics[f"{self._metric_name}/mean"].item()

This can sometimes return NaN, which then breaks the threshold checks.


Steps to Reproduce

Run the evaluator, for example:

pytest solver.py::test_solver_integral_of_sin_x

Observe that the returned score is NaN and threshold-based assertions never pass, even if you set the threshold to 0.0. The test is defined as follows:

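import asyncio
from pathlib import Path

from google.adk.evaluation.agent_evaluator import AgentEvaluator


def test_solver_integral_of_sin_x() -> None:
    # The eval dataset lives next to this test file.
    dataset_path = Path(__file__).parent / "solver.test.json"

    # evaluate() is a coroutine here, so drive it with asyncio.run.
    asyncio.run(
        AgentEvaluator.evaluate(
            agent_module="teacher_agent.sub_agents.solve_agent.solver",
            eval_dataset_file_path_or_dir=str(dataset_path),
            num_runs=1,
        )
    )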

Expected Behavior

  • If the metric is undefined (NaN), it should be treated as 0.0, so that a threshold of 0.0 still passes.
  • No unexpected exceptions or silent failures.

Actual Behavior

  • _get_score() returns a NaN float.
  • Subsequent comparisons fail in non-intuitive ways (e.g. NaN < 0.0 and NaN >= 0.0 are both false).
  • Tests or pipelines that expect zero-threshold passing do not.
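
The comparison behavior above follows from IEEE 754 semantics, which Python floats use: every ordered comparison involving NaN evaluates to false. Here is a minimal sketch, assuming the threshold check has the shape `score >= threshold`. The helper `passes_threshold` is hypothetical and only stands in for whatever check ADK actually performs.

```python
import math


def passes_threshold(score: float, threshold: float) -> bool:
    # Hypothetical stand-in for a threshold check; not ADK's actual code.
    return score >= threshold


nan = float("nan")

# Ordered comparisons with NaN are always False, so a 0.0 threshold still fails.
print(passes_threshold(nan, 0.0))  # False
print(nan < 0.0)                   # False
print(nan == nan)                  # False: NaN is not equal to itself

# math.isnan is the reliable way to detect the condition.
print(math.isnan(nan))             # True
```

Because both `>=` and `<` are false, a check written in either direction cannot distinguish NaN from a genuine failing score without an explicit `math.isnan` test.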

A Quick Fix

Override the evaluator’s scoring method to coerce NaN to 0.0.

import math
from google.adk.evaluation.response_evaluator import ResponseEvaluator

def _safe_get_score(self, eval_result):
    val = eval_result.summary_metrics.get(f"{self._metric_name}/mean")
    # unwrap tensor or fallback to 0.0
    score = val.item() if hasattr(val, "item") else float(val or 0.0)
    # treat NaN as zero
    if isinstance(score, float) and math.isnan(score):
        score = 0.0
    return score


ResponseEvaluator._get_score = _safe_get_score
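
To sanity-check the coercion logic without running a full evaluation, the same steps can be exercised in isolation. This is a sketch only: `coerce_score` and `FakeScalar` are hypothetical names introduced here, with `FakeScalar` standing in for whatever scalar type `summary_metrics` actually holds (assumed to expose `.item()`).

```python
import math


class FakeScalar:
    """Hypothetical stand-in for a scalar object that exposes .item()."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


def coerce_score(val):
    # Same steps as _safe_get_score, minus the summary_metrics lookup.
    score = val.item() if hasattr(val, "item") else float(val or 0.0)
    if isinstance(score, float) and math.isnan(score):
        score = 0.0
    return score


print(coerce_score(FakeScalar(float("nan"))))  # 0.0
print(coerce_score(FakeScalar(0.75)))          # 0.75
print(coerce_score(None))                      # 0.0 (metric key missing)
print(coerce_score(float("nan")))              # 0.0
```

With the score coerced this way, a threshold of 0.0 passes because `0.0 >= 0.0` is true. The trade-off is that an undefined metric becomes indistinguishable from a genuine score of zero.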
