Description
When running evaluation with Google’s ADK `ResponseEvaluator`, the default `_get_score()` method simply does:

```python
return eval_result.summary_metrics[f"{self._metric_name}/mean"].item()
```

This can sometimes return `NaN`, which silently breaks every threshold comparison made against the score.
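For intuition, here is one plausible way the `NaN` arises, assuming the summary is a plain mean over per-row metric scores (an assumption about the internals, not confirmed from the ADK source):

```python
import numpy as np

# Assumption: summary_metrics aggregates per-row scores with a mean.
# If every row failed to produce a score, NaN propagates through unchanged.
per_row_scores = np.array([np.nan, np.nan])
print(per_row_scores.mean())  # nan
```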
Steps to Reproduce
Run the evaluator, for example:

```bash
pytest solver.py::test_solver_integral_of_sin_x
```

Observe that the returned score is `NaN` and threshold-based assertions never pass, even if you set the threshold to 0.0:
```python
import asyncio
from pathlib import Path

# Import path assumed by analogy with response_evaluator below; adjust if
# your ADK version exposes AgentEvaluator elsewhere.
from google.adk.evaluation.agent_evaluator import AgentEvaluator


def test_solver_integral_of_sin_x() -> None:
    dataset_path = Path(__file__).parent / "solver.test.json"
    asyncio.run(
        AgentEvaluator.evaluate(
            agent_module="teacher_agent.sub_agents.solve_agent.solver",
            eval_dataset_file_path_or_dir=str(dataset_path),
            num_runs=1,
        )
    )
```
Expected Behavior
- If the metric is undefined (`NaN`), it should be treated as `0.0`, so that a threshold of `0.0` still passes.
- No unexpected exceptions or silent failures.
Actual Behavior
- `_get_score()` returns a `NaN` float.
- Subsequent comparisons fail in non-intuitive ways: every ordered comparison involving `NaN` is false, so `NaN < 0.0` and `NaN >= 0.0` both evaluate to `False`.
- Tests or pipelines that expect a zero threshold to always pass never do.
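A minimal demonstration of why the comparisons misbehave:

```python
import math

nan_score = float("nan")
assert not (nan_score >= 0.0)  # every ordered comparison with NaN is False...
assert not (nan_score < 0.0)   # ...in both directions
assert math.isnan(nan_score)   # math.isnan() is the only reliable check
assert 0.0 >= 0.0              # a score coerced to 0.0 passes a 0.0 threshold
```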
A Quick Fix
Override the evaluator’s scoring method to coerce `NaN` → `0.0`:
```python
import math

from google.adk.evaluation.response_evaluator import ResponseEvaluator


def _safe_get_score(self, eval_result):
    val = eval_result.summary_metrics.get(f"{self._metric_name}/mean")
    # Unwrap a tensor-like value via .item(); fall back to 0.0 if missing.
    score = val.item() if hasattr(val, "item") else float(val or 0.0)
    # Treat NaN as zero so threshold checks behave predictably.
    if isinstance(score, float) and math.isnan(score):
        score = 0.0
    return score


# Monkey-patch the default scorer.
ResponseEvaluator._get_score = _safe_get_score
```
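A quick sanity check for the patched method; the stub below only mimics the attributes `_get_score` touches, and the metric name is a placeholder:

```python
from types import SimpleNamespace


class _StubEvaluator:
    _metric_name = "response_match_score"  # placeholder metric name


nan_result = SimpleNamespace(
    summary_metrics={"response_match_score/mean": float("nan")}
)
assert _safe_get_score(_StubEvaluator(), nan_result) == 0.0  # NaN coerced
```

Coercing to `0.0` rather than raising keeps the patch drop-in compatible with existing threshold assertions: an undefined metric simply scores as the minimum.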