Time-aware, table-literate, verifiable RAG app. Single-command local run via Streamlit + FAISS + Transformers.
Core features (MVP):
- Document ingestion from data/samples (PDF, TXT, MD).
- Vector search (Sentence-Transformers) + context-constrained answers (FLAN-T5).
- Topic tagging (zero-shot) and optional FinBERT sentiment.
- Table extraction from PDFs and Table QA with TAPAS (cell coordinates returned).
- Simple numeric self-check widget (sum/avg over cited table columns).
This is a lightweight, on-device-friendly version. It avoids external APIs by default.
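To make the numeric self-check idea concrete, here is a minimal sketch of recomputing a sum or average over one cited table column and comparing it with the value stated in an answer. The function name `check_column_claim`, its parameters, and the row format (a list of dicts) are assumptions for illustration; the actual logic in app/core/verify.py may differ.

```python
import math
from statistics import fmean


def check_column_claim(rows, column, claimed, op="sum", rel_tol=1e-6):
    """Recompute sum/avg over one table column and compare with a claimed value.

    Hypothetical helper: `rows` is assumed to be a list of dicts keyed by
    column name, with cell values stored as strings (as extracted from a PDF).
    """
    if op not in ("sum", "avg"):
        raise ValueError("op must be 'sum' or 'avg'")

    values = []
    for row in rows:
        # Strip thousands separators and whitespace before parsing.
        cell = str(row.get(column, "")).replace(",", "").strip()
        try:
            values.append(float(cell))
        except ValueError:
            continue  # skip non-numeric cells such as "n/a" or empty strings

    if not values:
        return {"ok": False, "computed": None, "used_cells": 0}

    computed = sum(values) if op == "sum" else fmean(values)
    ok = math.isclose(computed, claimed, rel_tol=rel_tol, abs_tol=1e-9)
    return {"ok": ok, "computed": computed, "used_cells": len(values)}


if __name__ == "__main__":
    fees = [{"fee": "1,200"}, {"fee": "300"}, {"fee": "n/a"}]
    print(check_column_claim(fees, "fee", claimed=1500.0, op="sum"))
```

Reporting `used_cells` alongside the result lets the user see how many cells actually entered the calculation, which helps when extraction dropped or mangled part of a column.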
1) Clone / unzip and install
# (Option A) Local venv
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python -c "import ssl, nltk; ssl._create_default_https_context = ssl._create_unverified_context; nltk.download('punkt'); nltk.download('stopwords')" || true
First run will download small models (embeddings, FLAN-T5, TAPAS).
2) Run
streamlit run streamlit_app.py
Open the URL shown (default: http://localhost:8501).
3) Docker (optional)
docker build -t marketops-copilot .
docker run --rm -p 8501:8501 -v "$PWD/data:/app/data" marketops-copilot
4) Project structure
marketops-copilot/
├─ streamlit_app.py
├─ app/
│  ├─ core/
│  │  ├─ rag.py
│  │  ├─ table_qa.py
│  │  ├─ classify.py
│  │  └─ verify.py
│  └─ ingest/
│     └─ pdf_parse.py
├─ data/
│  └─ samples/            # (placeholders)
│     ├─ notice_example.txt
│     ├─ fees_table.pdf
│     └─ listing_rules_excerpt.md
├─ requirements.txt
├─ Dockerfile
├─ .gitignore
└─ README.md
5) Add your own docs
Drop PDF, TXT, or MD files into data/samples and click Ingest in the left sidebar.
- If you have trouble with camelot, fall back to basic PDF text extraction via pdfplumber.
- TAPAS expects tabular data. If your PDFs have embedded tables, extraction quality varies by file.
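As a rough illustration of what the Ingest step might look for, the sketch below lists files under data/samples with one of the supported extensions (PDF, TXT, MD). The name `find_documents`, the `SUPPORTED_SUFFIXES` constant, and the recursive search into subfolders are assumptions, not the app's confirmed behavior.

```python
from pathlib import Path

# Extensions taken from the feature list above; the constant name is hypothetical.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".md"}


def find_documents(root="data/samples"):
    """Return a sorted list of supported files under `root`.

    Assumption: subfolders are searched recursively and extension
    matching is case-insensitive. The real ingest code may differ.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []  # nothing to ingest if the folder is missing
    return sorted(
        p
        for p in root_path.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    for path in find_documents():
        print(path)
```

Running this before clicking Ingest is a quick way to confirm that newly added files sit in the expected folder and carry a recognized extension.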