RAG-Ultra

RAG-Ultra is a high-accuracy multimodal RAG (Retrieval-Augmented Generation) chatbot designed to handle complex documents including scanned PDFs, tables, formulas, and visual content.

It combines the following components for reliable ingestion, retrieval, and reasoning:

Amazon Textract: For enterprise-grade OCR ingestion, capable of preserving layout and extracting tables/forms from scanned documents.
Advanced Retrieval: Implements Parent-Document Retrieval to maintain context and reduce hallucinations.
Reranking: Integrates a cross-encoder reranking step so that the most relevant chunks are sent to the LLM.
Multimodal capabilities: Uses Vision-Language (VL) models for reasoning over chart/diagram extracts.
LangSmith: Full observability integration for tracing and debugging.

Architecture

Ingestion Pipeline:
- PDFs are processed via boto3 and Amazon Textract.
- Text, tables, and raw image regions are extracted.
- Data is chunked and indexed. Text goes to a vector store; images are summarized by a VLM (e.g., GPT-4o, Claude 3.5 Sonnet) and embedded, or indexed purely by their summaries.
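
The extraction step above can be sketched as follows. This is a minimal illustration, not the repository's actual ingestion code: the helper names (`analyze_page`, `extract_lines`, `extract_tables`) and the `sample` response are hypothetical. The boto3 call shown is Textract's synchronous `analyze_document`, which works on a single page passed as bytes; multi-page PDFs would likely need the asynchronous API with the file in S3. The parsing helpers rely only on the documented `Blocks` structure (`LINE`, `TABLE`, `CELL`, `WORD`), so they can be tried without AWS credentials.

```python
def analyze_page(image_bytes: bytes) -> dict:
    """Send one page to Textract synchronously (requires AWS credentials)."""
    import boto3  # imported lazily so the parsing helpers work without boto3

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["TABLES", "FORMS"],
    )


def _child_ids(block: dict) -> list[str]:
    """Collect the ids of a block's CHILD relationships."""
    return [
        child_id
        for rel in block.get("Relationships", [])
        if rel["Type"] == "CHILD"
        for child_id in rel["Ids"]
    ]


def extract_lines(response: dict) -> list[str]:
    """Return the text of every LINE block in reading order."""
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]


def extract_tables(response: dict) -> list[list[list[str]]]:
    """Rebuild each TABLE block as a list of rows of cell strings."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    tables = []
    for block in response["Blocks"]:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for cell_id in _child_ids(block):
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            cell_words = [
                by_id[w]["Text"]
                for w in _child_ids(cell)
                if by_id[w]["BlockType"] == "WORD"
            ]
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(cell_words)
        if not cells:
            continue
        n_rows = max(r for r, _ in cells)
        n_cols = max(c for _, c in cells)
        # Textract row and column indices are 1-based.
        tables.append(
            [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
             for r in range(1, n_rows + 1)]
        )
    return tables


# Hand-written stand-in for a Textract response (illustrative only).
sample = {"Blocks": [
    {"Id": "l1", "BlockType": "LINE", "Text": "Quarterly report"},
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Revenue"},
    {"Id": "w2", "BlockType": "WORD", "Text": "1.2"},
    {"Id": "w3", "BlockType": "WORD", "Text": "M"},
]}
```

Calling `extract_lines(sample)` and `extract_tables(sample)` yields the plain text lines and a row-major table that can then be chunked and indexed.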
Retrieval:
- Parent Document Retriever: Fetches larger context blocks for retrieved small chunks.
- Reranking: Re-orders results based on relevance score to improve precision.
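
The two retrieval steps above can be illustrated with a dependency-free sketch. It is an assumption-laden toy, not the project's implementation: word overlap stands in for both the embedding similarity and the cross-encoder score, and all names (`build_index`, `retrieve_parents`, `rerank`, the sample `parents`) are hypothetical. The point is the shape of the flow: match on small child chunks, return their larger parent documents, then re-order the parents by a relevance score.

```python
def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def build_index(parents: dict[str, str], chunk_size: int = 8) -> list[tuple[str, str]]:
    """Split each parent into small child chunks, remembering the parent id."""
    children = []
    for parent_id, text in parents.items():
        tokens = text.split()
        for i in range(0, len(tokens), chunk_size):
            children.append((parent_id, " ".join(tokens[i:i + chunk_size])))
    return children


def retrieve_parents(query, parents, children, k=3):
    """Rank child chunks, then return their (deduplicated) parent documents."""
    top = sorted(children, key=lambda c: overlap_score(query, c[1]), reverse=True)[:k]
    parent_ids = []
    for parent_id, _ in top:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [(pid, parents[pid]) for pid in parent_ids]


def rerank(query, docs, top_n=2):
    """Re-order candidate documents by relevance and keep the best top_n."""
    return sorted(docs, key=lambda d: overlap_score(query, d[1]), reverse=True)[:top_n]


# Illustrative data only.
parents = {
    "a": "Textract extracts tables and forms from scanned documents with layout preserved",
    "b": "Streamlit renders the chat interface for the user",
}
children = build_index(parents)
candidates = retrieve_parents("tables from scanned documents", parents, children)
best = rerank("tables from scanned documents", candidates, top_n=1)
```

Here `best` holds the full parent document `"a"`, even though the match was made on one of its short chunks.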
Generation (LangGraph):
- A directed state graph manages the state; it permits cycles, so it is not strictly a DAG.
- Nodes: Retrieve -> Grade Documents -> Rerank -> Generate.
- Support for "Visual Reasoning" loops if image context is retrieved.
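
The control flow above can be sketched without any framework. The following is a plain-Python approximation of the node sequence, not LangGraph's API and not the repository's graph: each hypothetical node function mutates a shared state dict and returns the name of the next node, and the `visual` node loops back to `rerank` to show where a cycle would arise. The corpus and the word-overlap grading are placeholders for the real retriever and grader.

```python
CORPUS = [
    {"text": "Textract extracts tables from scanned PDFs.", "kind": "text"},
    {"text": "Chart summary: revenue grows each quarter.", "kind": "image"},
    {"text": "Unrelated note about office parking.", "kind": "text"},
]


def words(text: str) -> set[str]:
    return {w.strip(".,:?").lower() for w in text.split()}


def overlap(question: str, doc: dict) -> int:
    return len(words(question) & words(doc["text"]))


def retrieve(state):
    state["docs"] = list(CORPUS)  # placeholder for a vector-store query
    return "grade"


def grade(state):
    # Drop documents that share no words with the question.
    state["docs"] = [d for d in state["docs"] if overlap(state["question"], d) > 0]
    return "rerank" if state["docs"] else "generate"


def rerank(state):
    state["docs"].sort(key=lambda d: overlap(state["question"], d), reverse=True)
    has_image = any(d["kind"] == "image" for d in state["docs"])
    # Take the visual-reasoning detour once if image context was retrieved.
    return "visual" if has_image and not state["visual_done"] else "generate"


def visual(state):
    state["visual_done"] = True  # placeholder for a VLM call
    return "rerank"  # loop back: this edge is what makes the graph cyclic


def generate(state):
    n = len(state["docs"])
    state["answer"] = (
        f"Answer grounded in {n} document(s)." if n else "No relevant context found."
    )
    return None  # end of the graph


NODES = {"retrieve": retrieve, "grade": grade, "rerank": rerank,
         "visual": visual, "generate": generate}


def run(question: str, max_steps: int = 10) -> dict:
    state = {"question": question, "docs": [], "visual_done": False,
             "answer": "", "steps": []}
    node = "retrieve"
    while node is not None and len(state["steps"]) < max_steps:
        state["steps"].append(node)
        node = NODES[node](state)
    return state
```

Running `run("How does revenue change in the chart?")["steps"]` shows the detour through `visual`, while a text-only question goes straight from `rerank` to `generate`. The `max_steps` guard is a simple way to keep a cyclic graph from looping indefinitely.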

Setup

Clone the repository:

git clone https://github.com/SriviharReddy/RAG-Ultra.git
cd RAG-Ultra

Install dependencies:
```
pip install -r requirements.txt
```
Environment Variables: Copy .env.example to .env and fill in your keys:
- OPENAI_API_KEY (for embeddings/generation)
- AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (for Textract)
- LANGCHAIN_API_KEY (for LangSmith)
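
Before launching the app, it can help to confirm that the keys listed above are actually set. The check below is a small optional sketch; the `missing_keys` helper is hypothetical and not part of the repository, and it assumes the variables have already been exported or loaded from `.env` into the process environment.

```python
import os
from typing import Mapping

REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "LANGCHAIN_API_KEY",
]


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

An empty list from `missing_keys()` means all four variables are present.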
Run the App:
```
streamlit run app.py
```

Tech Stack

LangChain / LangGraph
Streamlit
Amazon Textract
ChromaDB (Vector Store)
LangSmith

License

MIT
