add embeddings to TestsetGenerator #1562
Conversation
knowledge_graph : KnowledgeGraph, default empty
    The knowledge graph to use for the generation process.
"""

llm: BaseRagasLLM
embedding_model: BaseRagasEmbeddings
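For context, the diff above lands in the TestsetGenerator dataclass. A minimal sketch of how the class might look after this change (the attribute names and types are taken from the diff; the import paths, docstring layout, and the knowledge_graph default are assumptions based on ragas 0.2):

```python
from dataclasses import dataclass, field

from ragas.embeddings import BaseRagasEmbeddings
from ragas.llms import BaseRagasLLM
from ragas.testset.graph import KnowledgeGraph


@dataclass
class TestsetGenerator:
    """Generates a synthetic testset.

    Attributes
    ----------
    llm : BaseRagasLLM
        The LLM to use for the generation process.
    embedding_model : BaseRagasEmbeddings
        The embedding model added by this PR (used by the transforms).
    knowledge_graph : KnowledgeGraph, default empty
        The knowledge graph to use for the generation process.
    """

    llm: BaseRagasLLM
    embedding_model: BaseRagasEmbeddings
    knowledge_graph: KnowledgeGraph = field(default_factory=KnowledgeGraph)
```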
Everything looks good, but just one point here. The TestsetGenerator object does not need an embedding model, since it does not use it while synthesizing the testset (embeddings are only needed for the transforms).
I do love making the llm and embedding model required and forcing the user to provide them; that will help solve a lot of the issues.
Thanks a lot for making this contribution 🙂 ❤️, let me know what you think about the point I made.
Ok, thanks for the quick review. The reason I added embedding_model as an attribute is that, in line 61, I made it possible to pass your embedding model to the from_langchain method, which returns an instance of TestsetGenerator containing both the LLM and the embedding model (line 68). This lets the TestsetGenerator object have an embedding model attributed to it, so if the user does not pass an embedding model to generate_with_langchain_docs through the transforms_embedding parameter, the method can use the embedding model attributed to the class instead of throwing an error (see line 108; a sketch of this fallback is below).
I do understand your point though, and we can remove that if you want. Thanks!
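A minimal sketch of the fallback described above, written as a standalone helper for clarity (the name resolve_transforms_embedding and the exact error message are hypothetical; in the PR this logic lives inside generate_with_langchain_docs):

```python
def resolve_transforms_embedding(generator, transforms_embedding=None):
    """Pick the embedding model used by the default transforms.

    Prefer an explicitly passed model; otherwise fall back to the
    embedding_model attribute that this PR adds to TestsetGenerator.
    """
    if transforms_embedding is not None:
        return transforms_embedding
    if getattr(generator, "embedding_model", None) is not None:
        return generator.embedding_model
    # Without either, the default transforms cannot embed the knowledge graph.
    raise ValueError(
        "No embedding model available: pass transforms_embedding or "
        "construct TestsetGenerator with an embedding model."
    )
```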
I think it is okay for now - merging this in 🙂
…ect. - edit the documentation, because the embedding_model variable has been added to the TestsetGenerator object. - explodinggradients#1562
Addresses the "no embeddings found" and "API Connection error" issues.
Specifically issues: 1546, 1526, 1512, 1496
Users have reported that they cannot generate a testset because they get API connection errors, or because their knowledge graph does not have embeddings. This is due to the use of the default LLMs and embedding models via llm_factory and embedding_factory. The errors occur because the users do not have OpenAI credentials in their environment, since they are using different models in their workflow.
The issue to solve is to prevent the default_transforms function from using llm_factory, by forcing the user to provide both an embedding model and an LLM when instantiating TestsetGenerator, as the sketch below illustrates.
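Concretely, instead of default_transforms silently calling the OpenAI-backed factories, the user wires in their own models up front. A hedged sketch; my_chat_model and my_embeddings are placeholders for any LangChain-compatible chat model and embeddings:

```python
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator

# Both models are required at construction time, so the OpenAI-only
# factory defaults are never triggered during the transforms.
generator = TestsetGenerator(
    llm=LangchainLLMWrapper(my_chat_model),
    embedding_model=LangchainEmbeddingsWrapper(my_embeddings),
)
```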
Changes:
- Added embedding_model as an attribute to TestsetGenerator.
- Added embedding_model: LangchainEmbeddings as a parameter to TestsetGenerator.from_langchain.
- Changed TestsetGenerator.from_langchain to return cls(LangchainLLMWrapper(llm), LangchainEmbeddingsWrapper(embedding_model), knowledge_graph).
- Added llm and embedding_model parameters to TestsetGenerator.generate_with_langchain_docs.
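Putting the changes together, end-to-end usage might look like the sketch below (chat_model, embeddings, and docs are placeholders; the testset_size keyword is assumed from the existing generate API):

```python
from ragas.testset import TestsetGenerator

# from_langchain now wraps both models, per the third bullet above.
generator = TestsetGenerator.from_langchain(
    llm=chat_model,
    embedding_model=embeddings,
)

# With embedding_model stored on the generator, the default transforms no
# longer fall back to embedding_factory(), avoiding the credential errors.
testset = generator.generate_with_langchain_docs(docs, testset_size=10)
```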