Commit 48fe70e
Prigoistic, jjmachan, and anistark authored
fix: NameError during evaluation of llamaindex query engine (#2331)
### Issue Link / Problem Description

- Fixes [#2330](#2330)
- Evaluating a LlamaIndex query engine raised a runtime NameError: `EvaluationResult` not defined, because it was imported only under `t.TYPE_CHECKING`. Intermittent LlamaIndex execution failures also led to `IndexError` during result collection due to mismatched lengths.

### Changes Made

- Import `EvaluationResult` at runtime from `ragas.dataset_schema` in `src/ragas/integrations/llama_index.py`.
- Make response/context collection robust:
  - Handle failed executor jobs (NaN placeholders) by inserting empty response/context to maintain alignment with dataset size.
  - Prevent `IndexError` during dataset augmentation.
- Light defensive checks to ensure stable evaluation even when some query-engine calls fail.

### Testing

- Automated tests added/updated

### How to Test

- Manual testing steps:
  1. Install for local dev: `uv run pip install -e . -e ./examples`
  2. Follow the LlamaIndex integration guide to set up a `query_engine` and `EvaluationDataset`: [docs](https://docs.ragas.io/en/stable/howtos/integrations/_llamaindex/)
  3. Ensure LlamaIndex LLM is configured with `n=1` (or unset) to avoid “n values greater than 1 not support” warnings.
  4. Run an evaluation that previously failed; it should complete without the `NameError` and without `IndexError` during result collection.
  5. Optional: run lints `uv run ruff check .`

### References

- Related issues: [#2330](#2330)
- Documentation: LlamaIndex integration how-to ([link](https://docs.ragas.io/en/stable/howtos/integrations/_llamaindex/))

### Screenshots/Examples (if applicable)

- N/A

---------

Co-authored-by: jjmachan <[email protected]>
Co-authored-by: Ani <[email protected]>
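
The `NameError` described above follows from how `typing.TYPE_CHECKING` works: the guarded import is seen by static type checkers but never runs, so the name is usable in unevaluated annotations and missing everywhere else. The following is a minimal sketch of that pitfall, not code from this commit. It uses the standard-library `Fraction` as a stand-in for `EvaluationResult`, the function names are hypothetical, and string annotations stand in for the postponed evaluation that `from __future__ import annotations` provides.

```python
import typing as t

if t.TYPE_CHECKING:
    # Seen by static type checkers only; this import never runs.
    from fractions import Fraction


def annotate_only(x: "Fraction") -> "Fraction":
    # Works: string annotations are not evaluated when the function runs.
    return x


def construct_at_runtime() -> "Fraction":
    # Fails: the name is looked up when the call executes,
    # and it was never imported at runtime.
    return Fraction(1, 2)


def runtime_error_name() -> str:
    """Return the name of the exception raised by the runtime lookup."""
    try:
        construct_at_runtime()
    except NameError as exc:
        return type(exc).__name__
    return "no error"
```

Moving the import out of the `TYPE_CHECKING` block, as this commit does for `EvaluationResult`, makes the name available to code that uses it at runtime.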
1 parent 90eadca commit 48fe70e

1 file changed: +17 -7 lines changed

src/ragas/integrations/llama_index.py

Lines changed: 17 additions & 7 deletions
@@ -1,9 +1,10 @@
 from __future__ import annotations
 
 import logging
+import math
 import typing as t
 
-from ragas.dataset_schema import EvaluationDataset, SingleTurnSample
+from ragas.dataset_schema import EvaluationDataset, EvaluationResult, SingleTurnSample
 from ragas.embeddings import LlamaIndexEmbeddingsWrapper
 from ragas.evaluation import evaluate as ragas_evaluate
 from ragas.executor import Executor
@@ -18,10 +19,10 @@
         BaseEmbedding as LlamaIndexEmbeddings,
     )
     from llama_index.core.base.llms.base import BaseLLM as LlamaindexLLM
+    from llama_index.core.base.response.schema import Response as LlamaIndexResponse
     from llama_index.core.workflow import Event
 
     from ragas.cost import TokenUsageParser
-    from ragas.evaluation import EvaluationResult
 
 
 logger = logging.getLogger(__name__)
@@ -78,12 +79,21 @@ def evaluate(
         exec.submit(query_engine.aquery, q, name=f"query-{i}")
 
     # get responses and retrieved contexts
-    responses: t.List[str] = []
-    retrieved_contexts: t.List[t.List[str]] = []
+    responses: t.List[t.Optional[str]] = []
+    retrieved_contexts: t.List[t.Optional[t.List[str]]] = []
     results = exec.results()
-    for r in results:
-        responses.append(r.response)
-        retrieved_contexts.append([n.node.text for n in r.source_nodes])
+    for i, r in enumerate(results):
+        # Handle failed jobs which are recorded as NaN in the executor
+        if isinstance(r, float) and math.isnan(r):
+            responses.append(None)
+            retrieved_contexts.append(None)
+            logger.warning(f"Query engine failed for query {i}: '{queries[i]}'")
+            continue
+
+        # Cast to LlamaIndex Response type for proper type checking
+        response: LlamaIndexResponse = t.cast("LlamaIndexResponse", r)
+        responses.append(response.response if response.response is not None else "")
+        retrieved_contexts.append([n.get_text() for n in response.source_nodes])
 
     # append the extra information to the dataset
     for i, sample in enumerate(samples):
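
The NaN check in the hunk above keeps the collected lists the same length as the list of queries, so later positional indexing stays valid. The following is a minimal standalone sketch of that idea, not the ragas implementation. Plain strings stand in for LlamaIndex response objects, and `is_failed_job` and `collect_responses` are hypothetical helper names.

```python
import math
import typing as t


def is_failed_job(result: object) -> bool:
    """Return True for the float NaN placeholder that marks a failed job."""
    return isinstance(result, float) and math.isnan(result)


def collect_responses(results: t.Sequence[object]) -> t.List[t.Optional[str]]:
    """Collect one entry per result, keeping positions aligned with the input."""
    collected: t.List[t.Optional[str]] = []
    for r in results:
        if is_failed_job(r):
            # Keep a placeholder so index i still maps to query i.
            collected.append(None)
            continue
        # Assumption: successful results are plain strings in this sketch.
        collected.append(str(r))
    return collected


queries = ["q0", "q1", "q2"]
results = ["answer 0", float("nan"), "answer 2"]
responses = collect_responses(results)
```

The `isinstance(r, float)` guard runs first because `math.isnan` raises `TypeError` for arguments that cannot be converted to a float, which would include a response object.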
