add embeddings to TestsetGenerator #1562
Conversation
knowledge_graph : KnowledgeGraph, default empty
    The knowledge graph to use for the generation process.
"""

llm: BaseRagasLLM
embedding_model: BaseRagasEmbeddings
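For context, the diff above lands in the TestsetGenerator dataclass. A minimal sketch of how the class might look after this change (the attribute names and types are taken from the diff; the import paths, docstring layout, and the knowledge_graph default are assumptions based on ragas 0.2):

```python
from dataclasses import dataclass, field

from ragas.embeddings import BaseRagasEmbeddings
from ragas.llms import BaseRagasLLM
from ragas.testset.graph import KnowledgeGraph


@dataclass
class TestsetGenerator:
    """Generates a synthetic testset.

    Attributes
    ----------
    llm : BaseRagasLLM
        The LLM to use for the generation process.
    embedding_model : BaseRagasEmbeddings
        The embedding model added by this PR (used by the transforms).
    knowledge_graph : KnowledgeGraph, default empty
        The knowledge graph to use for the generation process.
    """

    llm: BaseRagasLLM
    embedding_model: BaseRagasEmbeddings
    knowledge_graph: KnowledgeGraph = field(default_factory=KnowledgeGraph)
```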
Everything looks good, but just one point here. The TestsetGenerator object does not need an embedding model, since it does not use it while synthesizing the testset (embeddings are only needed for the transforms).
I do love making the llm and embedding model required and forcing the user to provide them; that will help solve a lot of the issues.
Thanks a lot for making this contribution 🙂 ❤️, let me know what you think about the point I made.
Ok, thanks for the quick review. The reason I added embedding_model as an attribute is that, in line 61, I made it possible to pass your embedding model to the from_langchain method, which returns an instance of TestsetGenerator containing both the LLM and the embedding model (line 68). This lets the TestsetGenerator object have an embedding model attributed to it, so if the user does not pass an embedding model to generate_with_langchain_docs through the transforms_embedding parameter, the method can use the embedding model attributed to the class instead of throwing an error (see line 108; a sketch of this fallback is below).
I do understand your point though, and we can remove that if you want. Thanks!
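A minimal sketch of the fallback described above, written as a standalone helper for clarity (the name resolve_transforms_embedding and the exact error message are hypothetical; in the PR this logic lives inside generate_with_langchain_docs):

```python
def resolve_transforms_embedding(generator, transforms_embedding=None):
    """Pick the embedding model used by the default transforms.

    Prefer an explicitly passed model; otherwise fall back to the
    embedding_model attribute that this PR adds to TestsetGenerator.
    """
    if transforms_embedding is not None:
        return transforms_embedding
    if getattr(generator, "embedding_model", None) is not None:
        return generator.embedding_model
    # Without either, the default transforms cannot embed the knowledge graph.
    raise ValueError(
        "No embedding model available: pass transforms_embedding or "
        "construct TestsetGenerator with an embedding model."
    )
```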
I think it is okay for now - merging this in 🙂
…ect. - edit the documentation, because the embedding_model variable has been added to the TestsetGenerator object. - explodinggradients#1562
Addresses the "no embeddings found" and "API Connection error" issues.
Specifically issues: 1546, 1526, 1512, 1496
Users have reported that they cannot generate a testset because they get API connection errors, or because their knowledge graph does not have embeddings. This is due to the use of the default LLMs and embedding models via llm_factory and embedding_factory. The errors occur because the users do not have OpenAI credentials in their environment, since they are using different models in their workflow.
The issue to solve is to prevent the default_transforms function from using llm_factory, by forcing the user to provide both an embedding model and an LLM when instantiating TestsetGenerator, as the sketch below illustrates.
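Concretely, instead of default_transforms silently calling the OpenAI-backed factories, the user wires in their own models up front. A hedged sketch; my_chat_model and my_embeddings are placeholders for any LangChain-compatible chat model and embeddings:

```python
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator

# Both models are required at construction time, so the OpenAI-only
# factory defaults are never triggered during the transforms.
generator = TestsetGenerator(
    llm=LangchainLLMWrapper(my_chat_model),
    embedding_model=LangchainEmbeddingsWrapper(my_embeddings),
)
```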
Changes:
- Added embedding_model as an attribute to TestsetGenerator.
- Added embedding_model: LangchainEmbeddings as a parameter to TestsetGenerator.from_langchain.
- Changed TestsetGenerator.from_langchain to return cls(LangchainLLMWrapper(llm), LangchainEmbeddingsWrapper(embedding_model), knowledge_graph).
- Added llm and embedding_model parameters to TestsetGenerator.generate_with_langchain_docs.
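Putting the changes together, end-to-end usage might look like the sketch below (chat_model, embeddings, and docs are placeholders; the testset_size keyword is assumed from the existing generate API):

```python
from ragas.testset import TestsetGenerator

# from_langchain now wraps both models, per the third bullet above.
generator = TestsetGenerator.from_langchain(
    llm=chat_model,
    embedding_model=embeddings,
)

# With embedding_model stored on the generator, the default transforms no
# longer fall back to embedding_factory(), avoiding the credential errors.
testset = generator.generate_with_langchain_docs(docs, testset_size=10)
```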