[feat] OCR based application #158

@yichuan-w

What problem does this solve?

Right now, LEANN uses text embeddings only. There are two options for handling multimodal data:

  1. Use DeepSeek OCR or MinerU to convert everything into text, so the existing text embedding pipeline still applies
  2. Maintain image vectors and text vectors separately
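
To make the trade-off concrete, here is a rough sketch of the two options. Everything in it is hypothetical and not part of LEANN: `toy_embed` is a hash-based stand-in for a real embedding model, `fake_ocr` is a placeholder for an OCR step such as DeepSeek OCR or MinerU, and `Page`, `VectorStore`, `build_text_only`, and `build_dual` are illustrative names only.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DIM = 8  # toy dimensionality, chosen only for illustration


def toy_embed(data: bytes) -> List[float]:
    """Hypothetical stand-in for a real text or image embedding model."""
    digest = hashlib.sha256(data).digest()
    vec = [b / 255.0 for b in digest[:DIM]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fake_ocr(image: bytes) -> str:
    """Hypothetical placeholder for an OCR step; it only decodes the bytes."""
    return image.decode("utf-8", errors="ignore")


@dataclass
class Page:
    page_id: str
    text: str
    image: Optional[bytes] = None


@dataclass
class VectorStore:
    items: List[Tuple[str, List[float]]] = field(default_factory=list)

    def add(self, page_id: str, vec: List[float]) -> None:
        self.items.append((page_id, vec))

    def search(self, query: List[float], k: int = 1) -> List[str]:
        # Rank by dot product; vectors are unit length, so this is cosine similarity.
        def score(item: Tuple[str, List[float]]) -> float:
            return sum(a * b for a, b in zip(item[1], query))

        ranked = sorted(self.items, key=score, reverse=True)
        return [page_id for page_id, _ in ranked[:k]]


def build_text_only(pages: List[Page]) -> VectorStore:
    """Option 1: OCR every image into text, then embed text only."""
    store = VectorStore()
    for p in pages:
        ocr_text = fake_ocr(p.image) if p.image is not None else ""
        combined = (p.text + "\n" + ocr_text).encode("utf-8")
        store.add(p.page_id, toy_embed(combined))
    return store


def build_dual(pages: List[Page]) -> Tuple[VectorStore, VectorStore]:
    """Option 2: keep separate stores for text vectors and image vectors."""
    text_store, image_store = VectorStore(), VectorStore()
    for p in pages:
        if p.text:
            text_store.add(p.page_id, toy_embed(p.text.encode("utf-8")))
        if p.image is not None:
            image_store.add(p.page_id, toy_embed(p.image))
    return text_store, image_store


pages = [
    Page("p1", "quarterly revenue summary", b"chart: revenue by region"),
    Page("p2", "installation notes"),
]
single = build_text_only(pages)
text_store, image_store = build_dual(pages)
```

With option 1 there is a single index to build and query, but anything the OCR step misses is lost. Option 2 keeps the visual signal at the cost of two indexes and a step to merge their results at query time.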

Proposed solution

Possible starting points: the RAGanything repo and MinerU.

Example usage

Running RAG over vision-rich tasks.
