Description
Hi ydata-profiling team,
First of all, thank you for building such a useful tool for fast exploratory data analysis and data quality checks.
I maintain an MIT-licensed project called WFGY Problem Map, which is a 16-question diagnostic checklist for RAG / LLM pipelines in production. While ydata-profiling focuses on data quality at the table level, WFGY focuses on what happens after those tables are turned into documents, chunks, and vector indices for LLMs.
Why I think this could be useful for your users:
- Many engineers now use ydata-profiling as a first step before pushing data into vector stores or LLM pipelines.
- Several of the 16 failure modes are about “silent breaks” between the nicely profiled dataset and the actual retrieval behavior of the LLM system.
- The checklist is framework-agnostic and can sit alongside existing EDA tools.
External projects that reference WFGY Problem Map include:
- ToolUniverse (Harvard MIMS Lab)
- Multimodal RAG Survey (QCRI LLM Lab)
- Rankify (University of Innsbruck)
Suggestion:
If you think it is appropriate, would you consider adding a small “Further reading / external checklist” link in your documentation for users who are sending profiled data into RAG / LLM stacks?
“RAG / LLM debugging checklist: WFGY Problem Map (16 failure modes)”
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md
Project home: https://github.com/onestardao/WFGY
Thank you for your time and for the great project.
Best,
PSBigBig