[RFC] Executor: making Ragas faster and more reliable #394

Closed
jjmachan opened this issue Dec 19, 2023 · 2 comments


@jjmachan (Member)

Problem - ragas is slow and unreliable

  1. Ragas does not exploit the concurrency options provided by the `ThreadPoolExecutor` and `asyncio` modules. This is because ragas took a batching approach to evaluation, i.e., it evaluated metrics in batches.
  2. Not every service has async support - we need options to stay fully synchronous, or to use no concurrency at all.
  3. These primitives are also needed for [RFC] Testset Generation: making it faster and easy to use #380, and potentially others as well.

Core Components

  1. `BaseMetric` - a metric that evaluates a single row, with both `score()` and `ascore()`
  2. `RagasLLM`, based on langchain-core LLMs
    1. a `Prompt` object with provisions for instructions and demonstrations, which converts into the messages or prompts supported by both langchain chat-based and completion-based models
    2. an `LLMResult` object that supports both chat and text-based outputs
  3. `Executor`, which runs `BaseMetric`. It should also be able to run testset generators, so this should be a common paradigm
  4. a new `evaluate()` function that makes it easier to
    1. change the LLM and embeddings - in the new scheme, `BaseMetric` will have `llm=None` by default and will take the default LLM from the `evaluate()` function. If `metric.llm` is not `None`, the LLM provided on the metric is used instead
    2. switch between async and threading
    3. support callbacks throughout
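The async-vs-threading switch described above could work roughly as follows. This is a hypothetical minimal sketch, not the actual ragas implementation; the `Executor` interface here (submitting paired sync/async callables) is an assumption for illustration:

```python
import asyncio
import typing as t
from concurrent.futures import ThreadPoolExecutor


class Executor:
    """Minimal sketch: run submitted jobs either via asyncio or a thread pool."""

    def __init__(self, is_async: bool = True, max_workers: t.Optional[int] = None):
        self.is_async = is_async
        self.max_workers = max_workers
        # each job is (sync_fn, async_fn, args)
        self.jobs: t.List[t.Tuple[t.Callable, t.Callable, tuple]] = []

    def submit(self, sync_fn: t.Callable, async_fn: t.Callable, *args) -> None:
        self.jobs.append((sync_fn, async_fn, args))

    def results(self) -> list:
        if self.is_async:
            # schedule all coroutines concurrently on one event loop
            async def _run() -> list:
                return await asyncio.gather(
                    *(async_fn(*args) for _, async_fn, args in self.jobs)
                )

            return asyncio.run(_run())
        # fall back to threads for the non-async path
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = [pool.submit(sync_fn, *args) for sync_fn, _, args in self.jobs]
            return [f.result() for f in futures]
```

Either way the caller gets an ordered list of results, so metrics and testset generators can share the same execution paradigm.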

Base classes

Metric

class Metric:
    def score(
        self,
        row: t.Dict,  # just 1 row
        callbacks: t.Optional[Callbacks] = None,
    ) -> float:
        ...

    async def ascore(
        self,
        row: t.Dict,  # just 1 row
        callbacks: t.Optional[Callbacks] = None,
    ) -> float:
        ...
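To illustrate the interface, here is a hypothetical, LLM-free metric conforming to it (the name `AnswerContains` and its logic are invented for this example):

```python
import typing as t
from dataclasses import dataclass


@dataclass
class AnswerContains:
    """Toy metric: 1.0 if the ground truth appears verbatim in the answer."""

    name: str = "answer_contains"

    def score(self, row: t.Dict, callbacks=None) -> float:
        return 1.0 if row["ground_truth"].lower() in row["answer"].lower() else 0.0

    async def ascore(self, row: t.Dict, callbacks=None) -> float:
        # nothing to await here; a real LLM-backed metric would call the LLM
        return self.score(row, callbacks)
```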

evaluate()

def evaluate(
    dataset: Dataset,
    metrics: t.Optional[t.List[Metric]] = None,
    llm: t.Optional[BaseRagasLLM] = None,
    embeddings: t.Optional[RagasEmbeddings] = None,
    callbacks: t.Optional[Callbacks] = None,
    is_async: bool = True,
    max_workers: t.Optional[int] = None,
    raise_exceptions: bool = True,
    column_map: t.Optional[t.Dict[str, str]] = None,
) -> Result:
    ...
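Inside `evaluate()`, the default-LLM behaviour for metrics (use the metric's own LLM if set, otherwise fall back to the one passed to `evaluate()`) amounts to something like this. `resolve_llms` is a hypothetical helper, not a real ragas function:

```python
import typing as t


def resolve_llms(metrics: t.List[t.Any], default_llm: t.Any) -> None:
    """Assign the evaluate()-level default LLM to any metric that has none."""
    for metric in metrics:
        if getattr(metric, "llm", None) is None:
            metric.llm = default_llm  # metric did not bring its own LLM
        # otherwise the metric's own LLM takes precedence and is left untouched
```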

BaseRagasLLM

@dataclass
class BaseRagasLLM(ABC):
    @abstractmethod
    def generate_text(
        self,
        prompt: Prompt,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
    ) -> LLMResult:
        ...

    @abstractmethod
    async def agenerate_text(
        self,
        prompt: Prompt,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
    ) -> LLMResult:
        ...
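For services without async support (problem 2 above), one option is a base class whose `agenerate_text` defaults to wrapping the synchronous call in a thread, so subclasses only have to implement `generate_text`. This is a sketch under that assumption, with a simplified signature; it is not the RFC's final design:

```python
import asyncio
from abc import ABC, abstractmethod


class BaseLLMSketch(ABC):
    @abstractmethod
    def generate_text(self, prompt: str) -> str:
        """Synchronous generation; the only method subclasses must provide."""

    async def agenerate_text(self, prompt: str) -> str:
        # default: run the sync call in the loop's default thread pool,
        # so purely synchronous services still work under the async path
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.generate_text, prompt)
```

A natively async service would simply override `agenerate_text` with a true coroutine.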
@iterakhtaras

Hey @jjmachan! Thanks for all your work on ragas, I really appreciate it. I am trying to use it to evaluate my chatbot created with llama-index. Have any workarounds been discovered for issue #271?

These are my dependencies:
```
%pip install ragas==0.0.22
%pip install pypdf
%pip install llama-index==0.8.52
%pip install langchain==0.0.331rc3
%pip install openai==0.28.1
```

@dosubot (bot) added the "stale" label on May 19, 2024
@dosubot (bot) closed this as not planned (won't fix, can't repro, duplicate, stale) on Jun 1, 2024
@dosubot (bot) removed the "stale" label on Jun 1, 2024