Description
I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
I am trying to run a demo that retrieves data from the RAG vector store, converts the records to LangChain documents, and applies the default transforms.
records, _ = vector_client.scroll(
    collection_name=config.rag.collection_name, limit=2
)
assert records
documents: list[Document] = []
for record in records:
    page_content, metadata = (
        record.payload["page_content"],
        record.payload["metadata"],
    )
    documents.append(
        Document(
            page_content=page_content,
            metadata=metadata,
        )
    )
generator_llm = llm_factory(
    model=config.langgraph.llm_model, base_url=config.langgraph.llm_base_url
)
# I wrote a simple embedding client and verified that it works
embedding_model = LangchainEmbeddingsWrapper(rag_client.embeddings)
generator = TestsetGenerator(llm=generator_llm, embedding_model=embedding_model)
dataset = generator.generate_with_langchain_docs(documents, testset_size=10)
The demo fails with this error:
/Users/xialei/synthetic_data_generator/src/synthetic_data_agent/main.py:52: DeprecationWarning: LangchainEmbeddingsWrapper is deprecated and will be removed in a future version. Use the modern embedding providers instead: embedding_factory('openai', model='text-embedding-3-small', client=openai_client) or from ragas.embeddings import OpenAIEmbeddings, GoogleEmbeddings, HuggingFaceEmbeddings
embedding_model = LangchainEmbeddingsWrapper(rag_client.embeddings)
Applying HeadlinesExtractor: 100%|██████████| 2/2 [00:01<00:00, 1.03it/s]
Applying HeadlineSplitter: 100%|██████████| 2/2 [00:00<00:00, 14513.16it/s]
Applying SummaryExtractor: 100%|██████████| 2/2 [00:01<00:00, 1.98it/s]
Applying CustomNodeFilter: 0it [00:00, ?it/s]
Applying EmbeddingExtractor: 100%|██████████| 2/2 [00:00<00:00, 9799.78it/s]
Applying ThemesExtractor: 0it [00:00, ?it/s]
Applying NERExtractor: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/xialei/synthetic_data_generator/src/synthetic_data_agent/main.py", line 61, in <module>
    main()
    ~~~~^^
  File "/Users/xialei/synthetic_data_generator/src/synthetic_data_agent/main.py", line 55, in main
    dataset = generator.generate_with_langchain_docs(documents, testset_size=10)
  File "/Users/xialei/synthetic_data_generator/.venv/lib/python3.13/site-packages/ragas/testset/synthesizers/generate.py", line 191, in generate_with_langchain_docs
    apply_transforms(kg, transforms)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/Users/xialei/synthetic_data_generator/.venv/lib/python3.13/site-packages/ragas/testset/transforms/engine.py", line 91, in apply_transforms
    apply_transforms(kg, transform, run_config, callbacks)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xialei/synthetic_data_generator/.venv/lib/python3.13/site-packages/ragas/testset/transforms/engine.py", line 93, in apply_transforms
    apply_transforms(kg, transforms.transformations, run_config, callbacks)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xialei/synthetic_data_generator/.venv/lib/python3.13/site-packages/ragas/testset/transforms/engine.py", line 91, in apply_transforms
    apply_transforms(kg, transform, run_config, callbacks)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xialei/synthetic_data_generator/.venv/lib/python3.13/site-packages/ragas/testset/transforms/engine.py", line 98, in apply_transforms
    coros = transforms.generate_execution_plan(kg)
  File "/Users/xialei/synthetic_data_generator/.venv/lib/python3.13/site-packages/ragas/testset/transforms/relationship_builders/cosine.py", line 93, in generate_execution_plan
    raise ValueError(f"Node {node.id} has no {self.property_name}")
ValueError: Node 80c0a752-5794-40a1-b0db-99e31d153710 has no summary_embedding
So something must be failing silently. I dug into the code and found that none of the extractors produce any output, because the internal error is muted.
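To make "muted" concrete, here is a minimal sketch of the difference between swallowing and surfacing a failure in concurrent steps. It is a generic asyncio illustration, not ragas's actual code: `extract`, `run_muted`, and `run_fail_fast` are hypothetical names, and the failing step only stands in for a rejected LLM call.

```python
import asyncio


async def extract(node_id: str) -> str:
    # Hypothetical extractor step that always fails, standing in for an
    # LLM call that the API rejects.
    raise RuntimeError(f"LLM call failed for node {node_id}")


async def run_muted(node_ids: list[str]) -> list[str]:
    # Swallows failures: exceptions are collected and then dropped, so the
    # caller gets an empty result and never sees the real error.
    results = await asyncio.gather(
        *(extract(n) for n in node_ids), return_exceptions=True
    )
    return [r for r in results if not isinstance(r, BaseException)]


async def run_fail_fast(node_ids: list[str]) -> list[str]:
    # Surfaces failures: the first exception propagates to the caller.
    return list(await asyncio.gather(*(extract(n) for n in node_ids)))


muted = asyncio.run(run_muted(["a", "b"]))
print(muted)  # empty list, with no hint that anything went wrong

surfaced = None
try:
    asyncio.run(run_fail_fast(["a", "b"]))
except RuntimeError as exc:
    surfaced = str(exc)
    print(surfaced)
```

With the muted variant, a later step fails on missing data (as with `summary_embedding` above) far from the real cause. The fail-fast variant reports the root error at the point where it happens.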
Then I ran a simple test:
generator_llm = llm_factory(
    model=config.langgraph.llm_model, base_url=config.langgraph.llm_base_url
)
result = generator_llm.generate_text(StringPromptValue(text="Hello, world!"))
print(result)
and found the cause:
openai.BadRequestError: {
  "error": {
    "message": "Unsupported value: 'temperature' does not support 0.01 with this model. Only the default (1) value is supported.",
    "type": "invalid_request_error",
    "param": "temperature",
    "code": "unsupported_value"
  }
}{"code":20000,"msg":"Unknown error []"}
This is frustrating. PLEASE do not hide errors; just let it crash.
BTW, the model name is gpt-5-mini.
Code Examples
Pasted above.
Additional context