Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add embeddings to TestsetGenerator #1562

Merged
merged 2 commits into from
Oct 24, 2024

Conversation

hunter-walden2113
Copy link
Contributor

@hunter-walden2113 hunter-walden2113 commented Oct 23, 2024

Addresses the "no embeddings found" and "API Connection error" issues.
Specifically issues: 1546, 1526, 1512, 1496

Users have reported that they cannot generate a Testset because they get API connection errors, or their knowledge graph does not have the embeddings. This is due to the use of the default LLMs and Embedding models via llm_factory and embedding_factory. The errors are occuring becuase the users do not have OpenAI credentials in their environment because they are using different models in their workflow.

Issue to solve is to prevent the default_transforms function from using the llm_factory by forcing the user to add both an embedding model and llm model when instantiating TestsetGenerator.

  1. Added embedding_model as an attribute to TestsetGenerator.
  2. Added embedding_model: LangchainEmbeddings as a parameter to TestsetGenerator.from_langchain
  3. Changed the return from TestsetGenerator.from_langchain to return cls(LangchainLLMWrapper(llm), LangchainEmbeddingsWrapper(embedding_model), knowledge_graph)
  4. Added both an llm and embedding_model parameter to TestsetGenerator.generate_with_langchain_docs

hunter-walden2113 and others added 2 commits October 23, 2024 09:46

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
add embeddings to TestsetGenerator
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Oct 23, 2024
knowledge_graph : KnowledgeGraph, default empty
The knowledge graph to use for the generation process.
"""

llm: BaseRagasLLM
embedding_model: BaseRagasEmbeddings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything looks good but just ones point here. The TestsetGenerator object does not need an embedding model since it does not use it why synthesizing the testset (embeddings are only need for transforms).

I love make the llm and embedding model and forcing the user there - will help solve a lot of the issue

Thanks a lot for making this contribution 🙂 ❤️, let me know what you think about the point I made

Copy link
Contributor Author

@hunter-walden2113 hunter-walden2113 Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks for the quick review. The reason I added embedding_model as an attribute is because in line 61, I made it so you can pass your embedding model to the from_langchain method which returns an instance of TestsetGenerator, containing both the LLM and embedding model (line 68). It allows the TestsetGenerator object to have an embedding model attributed to it, so if the user does not pass an embedding model to the generate_with_langchain_docs for the transforms_embedding parameter, it can use the embedding model that is attributed to the class, and the method will not throw an error (see line 108).

I do understand your point though. And we can remove that if you want. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is okay for now - merging this in 🙂

@jjmachan jjmachan merged commit fcaf4d0 into explodinggradients:main Oct 24, 2024
16 checks passed
Youngrok123 added a commit to Youngrok123/ragas that referenced this pull request Oct 31, 2024
…ect.

- edit the document because embedding_model variable has been added to the TestsetGenerator object.
- explodinggradients#1562
jjmachan pushed a commit that referenced this pull request Nov 1, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
…ject (#1606)

- edit the document because embedding_model attribute has been added to
the TestsetGenerator object.
- #1562
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:M This PR changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants