
NameError: name 'EvaluationDataset' is not defined #1515

@bdytx5

Description


[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
Calling `dataset.to_pandas()` on a testset produced by `TestsetGenerator.generate_with_langchain_docs` raises `NameError: name 'EvaluationDataset' is not defined` inside ragas 0.2.0.

Traceback (most recent call last):                                                                                            
  File "/Users/brettyoung/Desktop/old_dsk/dev_24/tutorials/ragas_tutorial/data_gen.py", line 54, in <module>                  
    df = dataset.to_pandas()
  File "/Users/brettyoung/miniconda3/lib/python3.10/site-packages/ragas/dataset_schema.py", line 197, in to_pandas
    data = self._to_list()
  File "/Users/brettyoung/miniconda3/lib/python3.10/site-packages/ragas/testset/synthesizers/testset_schema.py", line 52, in _to_list
    eval_list = self.to_evaluation_dataset()._to_list()
  File "/Users/brettyoung/miniconda3/lib/python3.10/site-packages/ragas/testset/synthesizers/testset_schema.py", line 47, in to_evaluation_dataset
    return EvaluationDataset(
NameError: name 'EvaluationDataset' is not defined

Ragas version:

(base) brettyoung@MacBook-Air brain % pip show ragas 
Name: ragas
Version: 0.2.0
Summary: 
Home-page: 
Author: 
Author-email: 
License: 
Location: /Users/brettyoung/miniconda3/lib/python3.10/site-packages
Requires: appdirs, datasets, langchain, langchain-community, langchain-core, langchain-openai, nest-asyncio, numpy, openai, pydantic, pysbd, tiktoken
Required-by: 
(base) brettyoung@MacBook-Air brain % 

Python version:
Python 3.10.10

Code to Reproduce

import os
import nest_asyncio
import pandas as pd
from dotenv import load_dotenv
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI
from ragas.testset import TestsetGenerator
from ragas.dataset_schema import EvaluationDataset

# Apply nest_asyncio to avoid event loop issues
nest_asyncio.apply()

# Load OpenAI API key from environment variables or .env file
load_dotenv()  # Ensure you have a .env file with OPENAI_API_KEY
openai_api_key = os.getenv("OPENAI_API_KEY")

# Verify if the key was loaded correctly
if openai_api_key is None:
    raise ValueError("OpenAI API Key not found. Please ensure you have a .env file with 'OPENAI_API_KEY'.")

# Check if the Weave repository already exists; if not, download it using sparse checkout
repo_dir = "weave_docs"
if not os.path.exists(repo_dir):
    os.system(f"git init {repo_dir}")
    os.chdir(repo_dir)
    os.system("git remote add origin https://github.com/wandb/weave.git")
    os.system("git sparse-checkout init --cone")
    os.system("git sparse-checkout set docs/docs/guides/tracking")
    os.system("git pull origin master")
    os.chdir("..")
else:
    print(f"{repo_dir} already exists, skipping download.")

path = os.path.join(repo_dir, "docs/docs/guides/tracking")
loader = DirectoryLoader(path, glob="**/*.md")
docs = loader.load()

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Wrap the LLM with LangchainLLMWrapper using OpenAI GPT-4 model
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))

# Create the test set generator
generator = TestsetGenerator(llm=evaluator_llm)

# Generate a test set of 30 examples from the split documents
dataset = generator.generate_with_langchain_docs(splits, testset_size=30)

# Convert the generated dataset to a Pandas DataFrame
df = dataset.to_pandas()
print(df)

# Optionally, save the generated testset to a CSV file for further inspection
output_csv_path = "generated_testset.csv"
df.to_csv(output_csv_path, index=False)
print(f"Generated testset saved to {output_csv_path}")
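Until the import is fixed upstream, a possible runtime workaround is to inject the missing names into the module's namespace before calling `to_pandas()`, rather than editing installed package files. The snippet below demonstrates the technique on a synthetic module (the module name `demo_schema` and the stand-in class are illustrative, not part of ragas):

```python
import sys
import types

# Build a synthetic module whose function refers to a global name that
# was never imported -- the same failure mode as testset_schema.py.
mod = types.ModuleType("demo_schema")
exec("def build():\n    return EvaluationDataset()", mod.__dict__)
sys.modules["demo_schema"] = mod

class EvaluationDataset:
    """Stand-in for ragas.dataset_schema.EvaluationDataset."""

try:
    mod.build()
except NameError as err:
    print(err)  # name 'EvaluationDataset' is not defined

# Inject the missing name into the module's globals, then retry.
mod.EvaluationDataset = EvaluationDataset
print(type(mod.build()).__name__)  # EvaluationDataset
```

Applied to ragas, the same idea would be (an untested sketch): `import ragas.testset.synthesizers.testset_schema as ts`, then assign `ts.EvaluationDataset = EvaluationDataset` (and likewise for `SingleTurnSample` and `MultiTurnSample`) before converting the dataset.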

Error trace
See the full traceback at the top of this report.

Expected behavior
`dataset.to_pandas()` should return a Pandas DataFrame of the generated test set without raising.

Additional context
The problem can be fixed by making the following change in `ragas/testset/synthesizers/testset_schema.py`:

```python
# if t.TYPE_CHECKING:
from ragas.dataset_schema import (
    EvaluationDataset,
    MultiTurnSample,
    SingleTurnSample,
)
```

i.e., comment out the `if t.TYPE_CHECKING:` guard so that the imports are executed at runtime rather than only during static type checking.

I'm not sure why this is happening, but this fixes it.
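For context on why commenting out the guard works: names imported under `if t.TYPE_CHECKING:` are only visible to static type checkers and are never bound at runtime, so any runtime reference to them raises exactly this kind of `NameError`. A minimal, self-contained illustration (using `fractions.Fraction` as an arbitrary example import, unrelated to ragas):

```python
import typing as t

# Imports inside this block are skipped at runtime: TYPE_CHECKING is
# False when the program actually runs, True only for type checkers.
if t.TYPE_CHECKING:
    from fractions import Fraction

def make_half() -> "Fraction":
    # The string annotation is fine, but the call below needs the name
    # bound at runtime and fails with a NameError.
    return Fraction(1, 2)

try:
    make_half()
except NameError as err:
    print(err)  # name 'Fraction' is not defined
```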

Labels: bug