add embeddings to TestsetGenerator #1562
Merged
Addresses the "no embeddings found" and "API Connection error" issues.
Specifically, issues #1546, #1526, #1512, and #1496.
Users have reported that they cannot generate a testset because they hit API connection errors, or because their knowledge graph has no embeddings. This is caused by the use of the default LLM and embedding models via `llm_factory` and `embedding_factory`. The errors occur because these users run different models in their workflow and therefore do not have OpenAI credentials in their environment.
The fix is to prevent the `default_transforms` function from falling back to `llm_factory` by requiring the user to supply both an embedding model and an LLM when instantiating `TestsetGenerator`.
Changes:
- Added `embedding_model` as an attribute to `TestsetGenerator`.
- Added `embedding_model: LangchainEmbeddings` as a parameter to `TestsetGenerator.from_langchain`.
- Changed `TestsetGenerator.from_langchain` to `return cls(LangchainLLMWrapper(llm), LangchainEmbeddingsWrapper(embedding_model), knowledge_graph)`.
- Added `llm` and `embedding_model` parameters to `TestsetGenerator.generate_with_langchain_docs`.
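The pattern the changes above apply can be sketched as follows. This is a hypothetical, simplified stand-in (the class and wrapper names mirror the PR description, but the bodies are illustrative, not the actual ragas implementation): the generator fails fast when either model is missing instead of silently constructing OpenAI-backed defaults.

```python
# Hypothetical sketch of the dependency-injection pattern from this PR:
# TestsetGenerator requires an explicit LLM and embedding model rather
# than falling back to llm_factory()/embedding_factory(), which assume
# OpenAI credentials exist in the environment.

class LangchainLLMWrapper:
    """Simplified stand-in for the ragas LangChain LLM wrapper."""
    def __init__(self, llm):
        self.llm = llm

class LangchainEmbeddingsWrapper:
    """Simplified stand-in for the ragas LangChain embeddings wrapper."""
    def __init__(self, embeddings):
        self.embeddings = embeddings

class TestsetGenerator:
    def __init__(self, llm, embedding_model, knowledge_graph=None):
        if llm is None or embedding_model is None:
            # Fail fast with a clear message instead of reaching for
            # default factories that require OpenAI credentials.
            raise ValueError("Both llm and embedding_model must be provided")
        self.llm = llm
        self.embedding_model = embedding_model
        self.knowledge_graph = knowledge_graph

    @classmethod
    def from_langchain(cls, llm, embedding_model, knowledge_graph=None):
        # Wrap the user's LangChain objects rather than creating defaults.
        return cls(
            LangchainLLMWrapper(llm),
            LangchainEmbeddingsWrapper(embedding_model),
            knowledge_graph,
        )

# Usage: the caller must now pass both models explicitly.
generator = TestsetGenerator.from_langchain(
    llm="my-langchain-llm",            # placeholder for a LangChain LLM
    embedding_model="my-embeddings",   # placeholder for LangChain embeddings
)
```

The design choice here is plain constructor injection: by moving model selection to the call site, the generator never needs credentials for a provider the user did not choose.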