
Suggestion: add WFGY Problem Map (RAG / LLM debugging checklist) as a complementary resource #1807

@onestardao

Hi ydata-profiling team,

First of all, thank you for building such a useful tool for fast exploratory data analysis and data quality checks.

I maintain an MIT-licensed project called WFGY Problem Map, which is a 16-question diagnostic checklist for RAG / LLM pipelines in production. While ydata-profiling focuses on data quality at the table level, WFGY focuses on what happens after those tables are turned into documents, chunks, and vector indices for LLMs.

Why I think this could be useful for your users:

  • Many engineers now use ydata-profiling as a first step before pushing data into vector stores or LLM pipelines.
  • Several of the 16 failure modes are about “silent breaks” between the nicely profiled dataset and the actual retrieval behavior of the LLM system.
  • The checklist is framework-agnostic and can sit alongside existing EDA tools.

External references to WFGY Problem Map include:

  • ToolUniverse (Harvard MIMS Lab)
  • Multimodal RAG Survey (QCRI LLM Lab)
  • Rankify (University of Innsbruck)

Suggestion:

If you think it is appropriate, would you consider adding a small “Further reading / external checklist” link in your documentation for users who are sending profiled data into RAG / LLM stacks?

“RAG / LLM debugging checklist: WFGY Problem Map (16 failure modes)”
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md

Project home: https://github.com/onestardao/WFGY

Thank you for your time and for the great project.

Best,
PSBigBig
