feat: added save and load to RagasDataset #1492

Merged · 6 commits · Oct 14, 2024
Changes from all commits
3 changes: 1 addition & 2 deletions .gitignore
@@ -168,5 +168,4 @@ cython_debug/
experiments/
**/fil-result/
src/ragas/_version.py
.vscode
/docs/references/
.vscode
1 change: 0 additions & 1 deletion .readthedocs.yml
@@ -7,5 +7,4 @@ build:
commands:
- pip install -e .[docs]
- if [ -n "$GH_TOKEN" ]; then pip install git+https://${GH_TOKEN}@github.com/squidfunk/mkdocs-material-insiders.git; fi
- python scripts/gen_ref_pages.py
- mkdocs build --site-dir $READTHEDOCS_OUTPUT/html
2 changes: 0 additions & 2 deletions Makefile
@@ -34,8 +34,6 @@ run-ci: format lint type test ## Running all CI checks

# Docs
docsite: ## Build and serve documentation
@echo "Generating reference pages..."
@python scripts/gen_ref_pages.py
@mkdocs serve --dirty
rewrite-docs: ## Use GPT4 to rewrite the documentation
@echo "Rewriting the documentation in directory $(DIR)..."
15 changes: 3 additions & 12 deletions docs/getstarted/rag_evaluation.md
@@ -14,18 +14,9 @@ dataset = load_dataset("explodinggradients/amnesty_qa","english_v3")
Converting data to ragas [evaluation dataset](../concepts/components/eval_dataset.md)

```python
from ragas import EvaluationDataset, SingleTurnSample

samples = []
for row in dataset['eval']:
sample = SingleTurnSample(
user_input=row['user_input'],
reference=row['reference'],
response=row['response'],
retrieved_contexts=row['retrieved_contexts']
)
samples.append(sample)
eval_dataset = EvaluationDataset(samples=samples)
from ragas import EvaluationDataset

eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])
```


87 changes: 87 additions & 0 deletions docs/howtos/migrations/migrate_from_v01_to_v02.md
@@ -0,0 +1,87 @@
# Migration from v0.1 to v0.2

v0.2 is the start of the transition for Ragas from an evaluation library for RAG pipelines to a more general library that you can use to evaluate any LLM application you build. This meant we had to make some fundamental changes to the library that will break your workflow. Hopefully this guide will make that transition as easy as possible.

## Outline

1. Evaluation Dataset
2. Metrics
3. Testset Generation
4. Prompt Object

## Evaluation Dataset

We have moved from using Hugging Face [`Datasets`](https://huggingface.co/docs/datasets/v3.0.1/en/package_reference/main_classes#datasets.Dataset) to our own [`EvaluationDataset`][ragas.dataset_schema.EvaluationDataset]. You can read more about it in the core concepts section for [EvaluationDataset](../../concepts/components/evaluation-dataset.md) and [EvaluationSample](../../concepts/components/eval_sample.md).

You can easily translate your existing Hugging Face dataset:

```python
from ragas import EvaluationDataset, SingleTurnSample

hf_dataset = ... # your huggingface evaluation dataset
eval_dataset = EvaluationDataset.from_hf_dataset(hf_dataset)

# save eval dataset
eval_dataset.to_csv("path/to/save/dataset.csv")

# load eval dataset
eval_dataset = EvaluationDataset.from_csv("path/to/save/dataset.csv")
```
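
JSONL save and load work the same way (the path below is just an example):

```python
# save eval dataset as JSON Lines
eval_dataset.to_jsonl("path/to/save/dataset.jsonl")

# load eval dataset back from JSON Lines
eval_dataset = EvaluationDataset.from_jsonl("path/to/save/dataset.jsonl")
```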

## Metrics

All the default metrics are still supported and many new metrics have been added. Take a look at the [documentation page](../../concepts/metrics/available_metrics/index.md) for the entire list.

However, there are a couple of changes in how you use metrics.

First, it is now preferred to initialize metrics with the evaluator LLM of your choice, as opposed to passing pre-initialized metric instances into [`evaluate()`][ragas.evaluation.evaluate]. This avoids a lot of confusion regarding which LLMs are used where.

```python
from ragas.metrics import faithfulness  # old way, not recommended but still supported till v0.3
from ragas.metrics import Faithfulness

# preferred way
faithfulness_metric = Faithfulness(llm=your_evaluator_llm)
```
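
When scoring a whole dataset, you then pass the initialized metric into `evaluate()`. A minimal sketch, reusing the `eval_dataset` and `your_evaluator_llm` placeholders from above:

```python
from ragas import evaluate
from ragas.metrics import Faithfulness

# the metric carries its own evaluator LLM, so there is no ambiguity
# about which LLM evaluate() uses for it
faithfulness_metric = Faithfulness(llm=your_evaluator_llm)

results = evaluate(dataset=eval_dataset, metrics=[faithfulness_metric])
print(results)
```
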
Second, [`metrics.ascore`][ragas.metrics.base.Metric.ascore] is being deprecated in favor of [`metrics.single_turn_ascore`][ragas.metrics.base.SingleTurnMetric.single_turn_ascore]. You can make the transition as follows:

```python
# create a single-turn sample
from ragas import SingleTurnSample

sample = SingleTurnSample(
    user_input="user query",
    response="response from your pipeline"
)

# init the metric with your evaluator LLM
from ragas.metrics import Faithfulness

faithfulness_metric = Faithfulness(llm=your_evaluator_llm)
score = await faithfulness_metric.single_turn_ascore(sample=sample)
print(score)
# 0.9
```
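
Note that `single_turn_ascore` is a coroutine, so the `await` above assumes an async context such as a notebook. In a plain script you would drive it yourself, for example with `asyncio` (a minimal sketch):

```python
import asyncio

score = asyncio.run(faithfulness_metric.single_turn_ascore(sample))
print(score)
```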

## Testset Generation

[Testset Generation](../../concepts/test_data_generation/rag.md) has been redesigned to be much more cost efficient. If you were using the end-to-end workflow, check out the [getting started](../../getstarted/rag_testset_generation.md) guide; a rough sketch of the new workflow follows the list below.

**Notable Changes**

- Removed `Docstore` in favor of a new `Knowledge Graph`
- Added `Transforms`, which convert the documents you pass in into a rich knowledge graph
- More customizable with `Synthesizer` objects; refer to the documentation for details
- The new workflow is much cheaper, and intermediate states can be saved easily
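
For reference, the new end-to-end workflow looks roughly like the sketch below. Names such as `generator_llm` and `docs` are placeholders, and the exact signatures may differ slightly; treat the [getting started](../../getstarted/rag_testset_generation.md) guide as the authoritative reference:

```python
from ragas.testset import TestsetGenerator

# the generator builds a knowledge graph from your documents via Transforms,
# then uses Synthesizer objects to produce test samples from it
generator = TestsetGenerator(llm=generator_llm)

testset = generator.generate_with_langchain_docs(docs, testset_size=10)
testset.to_pandas()
```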

This might be a bit of a rough transition, but if you do need help here, feel free to reach out in the issue linked below and we would love to help you out 🙂

## Prompt Object

All the prompts have been rewritten to use [`PydanticPrompt`][ragas.prompt.pydantic_prompt.PydanticPrompt], which is based on the [`BasePrompt`][ragas.prompt.base.BasePrompt] object. If you are using the old `Prompt` object, you will have to upgrade it to the new one; check the docs below to learn how:

- [How to Guide on how to create new prompts](../../howtos/customizations/metrics/modifying-prompts-metrics.md)
- [Github PR for the changes](https://github.com/explodinggradients/ragas/pull/1462)
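
As a rough illustration of the new style (the prompt, models, and field values here are invented for the example; see the how-to guide above for the real workflow), a prompt is now a class parameterized by Pydantic input and output models:

```python
from pydantic import BaseModel

from ragas.prompt import PydanticPrompt


class SummaryInput(BaseModel):
    text: str


class SummaryOutput(BaseModel):
    summary: str


# instruction, input_model, output_model and few-shot examples live on the class
class SummarizePrompt(PydanticPrompt[SummaryInput, SummaryOutput]):
    instruction = "Summarize the given text in one sentence."
    input_model = SummaryInput
    output_model = SummaryOutput
    examples = [
        (
            SummaryInput(text="Ragas is a library for evaluating LLM applications."),
            SummaryOutput(summary="Ragas evaluates LLM applications."),
        )
    ]
```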

!!! note "Need Further Assistance?"

    If you have any further questions, feel free to post them in this [GitHub issue](https://github.com/explodinggradients/ragas/issues/1486) or reach out to us on [cal.com](https://cal.com/shahul-ragas/30min).

2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -87,6 +87,8 @@ nav:
- howtos/applications/index.md
- Integrations:
- howtos/integrations/index.md
- Migrations:
- From v0.1 to v0.2: howtos/migrations/migrate_from_v01_to_v02.md
- 📖 References:
- Core:
- Prompt: references/prompt.md
43 changes: 0 additions & 43 deletions scripts/gen_ref_pages.py

This file was deleted.

122 changes: 83 additions & 39 deletions src/ragas/dataset_schema.py
@@ -12,7 +12,7 @@
from pandas import DataFrame as PandasDataframe


class BaseEvalSample(BaseModel):
class BaseSample(BaseModel):
"""
Base class for evaluation samples.
"""
@@ -30,7 +30,7 @@ def get_features(self) -> t.List[str]:
return list(self.to_dict().keys())


class SingleTurnSample(BaseEvalSample):
class SingleTurnSample(BaseSample):
"""
Represents evaluation samples for single-turn interactions.

@@ -61,7 +61,7 @@ class SingleTurnSample(BaseEvalSample):
rubric: t.Optional[t.Dict[str, str]] = None


class MultiTurnSample(BaseEvalSample):
class MultiTurnSample(BaseSample):
"""
Represents evaluation samples for multi-turn interactions.

@@ -127,44 +127,14 @@ def pretty_repr(self):
return "\n".join(lines)


class EvaluationDataset(BaseModel):
"""
Represents a dataset of evaluation samples.
Sample = t.TypeVar("Sample", bound=BaseSample)

Parameters
----------
samples : List[BaseEvalSample]
A list of evaluation samples.

Attributes
----------
samples : List[BaseEvalSample]
A list of evaluation samples.

Methods
-------
validate_samples(samples)
Validates that all samples are of the same type.
get_sample_type()
Returns the type of the samples in the dataset.
to_hf_dataset()
Converts the dataset to a Hugging Face Dataset.
to_pandas()
Converts the dataset to a pandas DataFrame.
features()
Returns the features of the samples.
from_list(mapping)
Creates an EvaluationDataset from a list of dictionaries.
from_dict(mapping)
Creates an EvaluationDataset from a dictionary.
"""

samples: t.List[BaseEvalSample]
class RagasDataset(BaseModel, t.Generic[Sample]):
samples: t.List[Sample]

@field_validator("samples")
def validate_samples(
cls, samples: t.List[BaseEvalSample]
) -> t.List[BaseEvalSample]:
def validate_samples(cls, samples: t.List[BaseSample]) -> t.List[BaseSample]:
"""Validates that all samples are of the same type."""
if len(samples) == 0:
return samples
@@ -202,6 +172,11 @@ def to_hf_dataset(self) -> HFDataset:

return HFDataset.from_list(self._to_list())

@classmethod
def from_hf_dataset(cls, dataset: HFDataset) -> "RagasDataset[Sample]":
"""Creates an EvaluationDataset from a Hugging Face Dataset."""
return cls.from_list(dataset.to_list())

def to_pandas(self) -> PandasDataframe:
"""Converts the dataset to a pandas DataFrame."""
try:
@@ -244,11 +219,80 @@ def from_dict(cls, mapping: t.Dict):
samples.extend(SingleTurnSample(**sample) for sample in mapping)
return cls(samples=samples)

def __iter__(self) -> t.Iterator[BaseEvalSample]: # type: ignore
@classmethod
def from_csv(cls, path: str):
"""Creates an EvaluationDataset from a CSV file."""
import csv

with open(path, "r", newline="") as csvfile:
reader = csv.DictReader(csvfile)
data = [row for row in reader]
return cls.from_list(data)

def to_csv(self, path: str):
"""Converts the dataset to a CSV file."""
import csv

data = self._to_list()
if not data:
return

fieldnames = self.features()

with open(path, "w", newline="") as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
for row in data:
writer.writerow(row)

def to_jsonl(self, path: str):
"""Converts the dataset to a JSONL file."""
with open(path, "w") as jsonlfile:
for sample in self.samples:
jsonlfile.write(json.dumps(sample.to_dict()) + "\n")

@classmethod
def from_jsonl(cls, path: str):
"""Creates an EvaluationDataset from a JSONL file."""
with open(path, "r") as jsonlfile:
data = [json.loads(line) for line in jsonlfile]
return cls.from_list(data)

def __iter__(self) -> t.Iterator[Sample]: # type: ignore
return iter(self.samples)

def __len__(self) -> int:
return len(self.samples)

def __getitem__(self, idx: int) -> BaseEvalSample:
def __getitem__(self, idx: int) -> Sample:
return self.samples[idx]


class EvaluationDataset(RagasDataset[BaseSample]):
"""
Represents a dataset of evaluation samples.

Attributes
----------
samples : List[BaseSample]
A list of evaluation samples.

Methods
-------
validate_samples(samples)
Validates that all samples are of the same type.
get_sample_type()
Returns the type of the samples in the dataset.
to_hf_dataset()
Converts the dataset to a Hugging Face Dataset.
to_pandas()
Converts the dataset to a pandas DataFrame.
features()
Returns the features of the samples.
from_list(mapping)
Creates an EvaluationDataset from a list of dictionaries.
from_dict(mapping)
Creates an EvaluationDataset from a dictionary.
"""

pass
4 changes: 3 additions & 1 deletion src/ragas/llms/prompt.py
@@ -160,7 +160,9 @@ def format(self, **kwargs: t.Any) -> PromptValue:
)
for key, value in kwargs.items():
if isinstance(value, str):
kwargs[key] = json.dumps(value, ensure_ascii=False).encode("utf8").decode()
kwargs[key] = (
json.dumps(value, ensure_ascii=False).encode("utf8").decode()
)

prompt = self.to_string()
return PromptValue(prompt_str=prompt.format(**kwargs))