A production-ready, fully local retrieval-augmented generation (RAG) toolkit. The pipeline ingests your PDF knowledge base into FAISS, serves a FastAPI retrieval service, and delivers fluent answers with Flan-T5-base through both JSON endpoints and a minimal web UI.
- Local ingestion of PDFs into a FAISS inner-product index
- Query-time retrieval with adjustable `top_k` and `score_threshold`
- Answer generation via `google/flan-t5-base` with inline citations (PDF, page)
- Browser UI at `/` for rapid testing alongside the JSON endpoints
- Pluggable device selection (CPU default, optional MPS/CUDA; sketched below)
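
For the last point, `auto` device resolution typically follows the standard PyTorch availability checks; a minimal sketch under that assumption (the real logic lives in `ingest.py`/`retriever.py` and may differ):

```python
import torch

def resolve_device(requested: str = "auto") -> str:
    """Illustrative device resolution; not the repo's actual implementation."""
    if requested != "auto":
        return requested  # honor an explicit cpu/mps/cuda request
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():  # Apple Silicon GPU
        return "mps"
    return "cpu"
```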
```
PDFs ──► ingest.py ──► FAISS index + metadata.json
                                │
                                ▼
                       FastAPI app (app.py)
                         ├─ POST /search → top-k chunks
                         ├─ POST /query  → generated answer + references
                         └─ GET  /       → minimal UI
```
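
To make the request and response shapes concrete, here is a heavily simplified stand-in for `app.py`; the retrieval and generation stubs are placeholders, and only the field names mirror the API documented below:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    question: str
    top_k: int = 5
    score_threshold: float = 0.15

def retrieve(req: QueryRequest) -> list[dict]:
    # placeholder for the FAISS-backed retriever
    return [{"text": "…", "source": "example.pdf", "page": 1, "score": 0.9}][: req.top_k]

@app.post("/search")
def search(req: QueryRequest):
    return {"chunks": retrieve(req)}

@app.post("/query")
def query(req: QueryRequest):
    chunks = retrieve(req)
    # the real service prompts Flan-T5 with these chunks; "…" stands in here
    return {"answer": "…", "references": [(c["source"], c["page"]) for c in chunks]}
```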
- Python 3.10 (tested on macOS; Linux/Windows should work with equivalent deps)
- Conda (recommended) or virtualenv
- Hugging Face account (optional; no API key required for public models)
```bash
# create environment
conda create -n rag python=3.10 -y
conda activate rag
pip install -r requirements.txt

# add pdfs to ingest
mkdir -p data/pdfs
cp /path/to/*.pdf data/pdfs/

# build the index (run whenever the pdf set changes)
HF_HUB_ENABLE_HF_TRANSFER=1 python ingest.py --device cpu --batch-size 16

# launch api + web ui (http://127.0.0.1:8000/)
RAG_DEVICE=cpu TOKENIZERS_PARALLELISM=false OMP_NUM_THREADS=1 \
  uvicorn app:app --reload --host 127.0.0.1 --port 8000
```

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Returns index status and model/device metadata (see the check below). |
| `/search` | POST | Retrieves top-k chunks. Body: `{"question": ..., "top_k": 5, "score_threshold": 0.15}` |
| `/query` | POST | Retrieves and generates an answer. Same body as `/search`; the response includes `answer`, `references`, `used_tokens`, and `new_tokens`. |
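
A quick way to confirm the service is up before querying (fields beyond basic index status are whatever `app.py` reports):

```python
import requests

r = requests.get("http://127.0.0.1:8000/health", timeout=10)
r.raise_for_status()
print(r.json())  # index status plus model/device metadata
```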
Example query:
```bash
curl -s -H 'Content-Type: application/json' \
  -X POST http://127.0.0.1:8000/query \
  -d '{"question":"what problem does Sarathi-Serve solve?","top_k":3,"score_threshold":0.15}'
```

To use the web UI instead:

- Navigate to `http://127.0.0.1:8000/`
- Enter a question and press Ask (or ⌘/Ctrl + Enter)
- The Answer panel shows the generated summary; the References list shows the supporting PDFs
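
The same `/query` call from Python, for scripting (assumes the `requests` package and a running server; generation on CPU can take a while, hence the generous timeout):

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8000/query",
    json={
        "question": "what problem does Sarathi-Serve solve?",
        "top_k": 3,
        "score_threshold": 0.15,
    },
    timeout=120,  # flan-t5-base on cpu can be slow for long contexts
)
resp.raise_for_status()
body = resp.json()
print(body["answer"])
print(body["references"])
```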
| Setting | Where | Default | Notes |
|---|---|---|---|
| `--pdf-dir` | `ingest.py` flag | `data/pdfs` | Source PDFs |
| `--out-dir` | `ingest.py` flag | `data/index` | Index & metadata output |
| `--chunk-size` | `ingest.py` flag | 800 chars | Adjust chunk granularity (see sketch below) |
| `--overlap` | `ingest.py` flag | 150 chars | Overlap between chunks |
| `--batch-size` | `ingest.py` flag | 64 | Embedding batch size |
| `--device` | `ingest.py` flag | `auto` | Use `cpu`, `mps`, or `cuda` |
| `RAG_DEVICE` | env var | `cpu` | Device for retrieval + generation |
| `TOKENIZERS_PARALLELISM` | env var | `false` | Prevents tokenizer warnings |
| `OMP_NUM_THREADS` | env var | `1` | Deterministic generation |
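
`--chunk-size` and `--overlap` interact as a sliding character window; a minimal sketch of that behavior (illustrative, not the exact code in `ingest.py`):

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 150) -> list[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts `step` chars after the previous
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# With the defaults, a 2000-char page yields chunks starting at 0, 650, and 1300,
# each sharing 150 characters of context with its neighbor.
print([len(c) for c in chunk_text("x" * 2000)])  # [800, 800, 700]
```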
```
app.py              # fastapi service + web ui
ingest.py           # pdf ingestion → faiss index / metadata
retriever.py        # faiss wrapper + sentence-transformers embedder
generator.py        # flan-t5-base generation helper
tests/
  test_chunking.py  # chunking unit test
requirements.txt    # pinned dependencies
```
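
The retrieval core is an inner-product FAISS index over sentence embeddings; normalizing the vectors makes inner product equal cosine similarity, which is what makes a `score_threshold` like 0.15 meaningful. A self-contained sketch (the embedding model name here is an assumption for illustration; `retriever.py` pins its own):

```python
import faiss
from sentence_transformers import SentenceTransformer

# model choice is an assumption for this sketch, not necessarily what the repo uses
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

chunks = [
    "FAISS is a library for efficient similarity search.",
    "Flan-T5 is an instruction-tuned sequence-to-sequence model.",
]
vecs = model.encode(chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(vecs.shape[1])  # inner product over unit vectors = cosine
index.add(vecs)

query = model.encode(["what is faiss?"], normalize_embeddings=True)
scores, ids = index.search(query, k=2)
for score, i in zip(scores[0], ids[0]):
    if score >= 0.15:  # mirrors the API's score_threshold
        print(f"{score:.2f}  {chunks[i]}")
```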
Run the unit tests:

```bash
PYTHONPATH=. pytest tests/test_chunking.py
```

Troubleshooting:

- Segfault on macOS (MPS): set `RAG_DEVICE=cpu` and rerun.
- "Index missing" errors: rebuild with `python ingest.py` and restart uvicorn.
- Short or terse answers: increase `top_k`, lower `score_threshold`, or adjust the prompt in `generator.py` (see the sketch below).
- Downloads hang: ensure `HF_HUB_ENABLE_HF_TRANSFER=1` is set and only one ingest process is running.
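
For the short-answers case, it helps to know what a bare Flan-T5 call looks like; the prompt template below is illustrative (the actual template lives in `generator.py`), and `max_new_tokens` is the knob that most directly affects answer length:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# illustrative context and prompt; generator.py defines the real template
context = "Sarathi-Serve schedules chunked prefills to cut LLM serving latency."
prompt = (
    "Answer the question using only the context.\n\n"
    f"Context: {context}\n\nQuestion: what problem does Sarathi-Serve solve?"
)

inputs = tok(prompt, return_tensors="pt", truncation=True, max_length=512)
out = model.generate(**inputs, max_new_tokens=128)  # raise for longer answers
print(tok.decode(out[0], skip_special_tokens=True))
```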
MIT License — see LICENSE.