This repo contains a local RAG demo, a Streamlit chatbot UI, utilities to build and reuse a FAISS vector DB, and LoRA fine-tuning/merge helpers.
Prerequisites:
- Python 3.11+ (tested on Windows 10)
- Ollama running locally (http://localhost:11434)
- Recommended shell: Windows PowerShell
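If you want to confirm Ollama is reachable before building an index, you can query its `/api/tags` endpoint, which lists locally available models. This snippet is illustrative and not part of the repo:

```python
# Quick connectivity check (not part of the repo): Ollama's /api/tags
# endpoint returns the models available locally.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as r:
    models = [m["name"] for m in json.load(r)["models"]]
print("Ollama models:", models)
```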
Install dependencies and pull the models:

```powershell
python -m pip install -r requirements.txt
ollama pull nomic-embed-text
ollama pull qwen2.5:0.5b-instruct
```

loader.py loads PDF(s), splits and sanitizes the text, embeds it with Ollama, and saves a FAISS index to disk.
Examples:
```powershell
# Single PDF → saved to .\faiss_index
python .\loader.py --pdf C:\Source\research\docx\report-ko.pdf --out C:\Source\research\faiss_index --emb nomic-embed-text --base http://localhost:11434

# All PDFs in a folder
python .\loader.py --dir C:\Source\research\docx --out C:\Source\research\faiss_index --emb nomic-embed-text --base http://localhost:11434
```

Notes:
- Text is sanitized to remove invalid surrogate characters to avoid JSON encoding errors in the Ollama client.
- Default chunking is 250/50 (size/overlap). Adjust in loader.py if desired.
- The output directory contains FAISS index files that can be reloaded later without recomputing embeddings.
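For orientation, here is a rough sketch of the loader flow, assuming LangChain's PDF loader, splitter, Ollama embeddings, and FAISS wrapper (loader.py's actual argument handling and defaults may differ). It includes the surrogate-stripping pass mentioned above:

```python
# Sketch of the loader flow (assumes LangChain + FAISS; illustrative only).
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

def sanitize(text: str) -> str:
    # Drop lone UTF-16 surrogates that would break JSON encoding
    # in the Ollama client.
    return text.encode("utf-8", errors="ignore").decode("utf-8")

pages = PyPDFLoader(r"C:\Source\research\docx\report-ko.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=50)
chunks = splitter.split_documents(pages)
for c in chunks:
    c.page_content = sanitize(c.page_content)

emb = OllamaEmbeddings(model="nomic-embed-text", base_url="http://localhost:11434")
FAISS.from_documents(chunks, emb).save_local(r"C:\Source\research\faiss_index")
```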
interface.py provides a dark-mode chat UI with chat history, retrieval controls, source panel, and an embedded PDF viewer with pagination.
```powershell
streamlit run interface.py
```

Behavior:
- On startup, the app auto-loads a FAISS index from FAISS_INDEX_DIR or ./faiss_index if present.
- Uses Ollama locally with the defaults below; you can override them via environment variables.
- Right panel shows the original PDF (picker + viewer). Pagination controls are centered at the bottom of the viewer.
Environment variables (optional):
- `FAISS_INDEX_DIR` → path to a saved FAISS index (default `./faiss_index`)
- `OLLAMA_HOST` → e.g., `http://localhost:11434`
- `OLLAMA_EMBED` → embedding model tag (default `nomic-embed-text`)
- `OLLAMA_LLM` → chat model tag (default `qwen2.5:0.5b-instruct`)
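A minimal sketch of how these settings can be resolved and the index auto-loaded, assuming the LangChain FAISS wrapper (variable names follow this README; interface.py's actual code may differ):

```python
# Sketch of configuration resolution and index auto-load (illustrative).
import os
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

INDEX_DIR = os.environ.get("FAISS_INDEX_DIR", "./faiss_index")
HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
EMBED = os.environ.get("OLLAMA_EMBED", "nomic-embed-text")
LLM = os.environ.get("OLLAMA_LLM", "qwen2.5:0.5b-instruct")

db = None
if os.path.isdir(INDEX_DIR):
    emb = OllamaEmbeddings(model=EMBED, base_url=HOST)
    db = FAISS.load_local(INDEX_DIR, emb, allow_dangerous_deserialization=True)
```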
main.py shows a minimal RAG flow using a prebuilt FAISS index.
Edit the index path in main.py if needed:
```python
INDEX_DIR = r"C:\Source\research\faiss_index"
```

Run:

```powershell
python .\main.py
```
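A minimal sketch of such a flow, assuming LangChain for retrieval and the `ollama` Python client for generation (main.py's actual prompt and wiring may differ):

```python
# Minimal RAG sketch over a prebuilt FAISS index (illustrative).
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
import ollama

INDEX_DIR = r"C:\Source\research\faiss_index"

emb = OllamaEmbeddings(model="nomic-embed-text")
db = FAISS.load_local(INDEX_DIR, emb, allow_dangerous_deserialization=True)

question = "What does the report conclude?"
docs = db.similarity_search(question, k=4)  # retrieve top chunks
context = "\n\n".join(d.page_content for d in docs)

resp = ollama.chat(
    model="qwen2.5:0.5b-instruct",
    messages=[{"role": "user",
               "content": f"Answer using this context:\n{context}\n\nQ: {question}"}],
)
print(resp["message"]["content"])
```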
train.py runs a quick demonstration fine-tune using TRL.

Input data format: data.jsonl, one JSON object per line with keys `prompt` and `response`:

```json
{"prompt": "...", "response": "..."}
```

Run:
```powershell
python .\train.py
```

Outputs are written to `OUT_DIR` (default `qwen2.5-3b-lora`). See the code for tunables.
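For reference, a sketch of the kind of TRL setup this implies, assuming SFTTrainer with a PEFT LoRA config and a prompt/completion column mapping; the hyperparameters here are placeholders, not train.py's actual tunables:

```python
# Sketch of a TRL LoRA fine-tune (illustrative; not train.py verbatim).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

ds = load_dataset("json", data_files="data.jsonl", split="train")
# TRL's SFT expects "prompt"/"completion" columns for prompt-completion data.
ds = ds.rename_column("response", "completion")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",
    train_dataset=ds,
    args=SFTConfig(output_dir="qwen2.5-3b-lora", max_steps=100),
    peft_config=LoraConfig(r=16, lora_alpha=32,
                           target_modules="all-linear",
                           task_type="CAUSAL_LM"),
)
trainer.train()
trainer.save_model("qwen2.5-3b-lora")
```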
merge.py creates a merged Hugging Face model folder you can use directly or export to GGUF.
Examples:
```powershell
# Merge only
python .\merge.py --base Qwen/Qwen2.5-3B-Instruct --adapter C:\Source\research\qwen2.5-3b-lora --out C:\Source\research\qwen2.5-3b-merged --cpu-only --dtype fp32

# Merge and test generation
python .\merge.py --base Qwen/Qwen2.5-0.5B-Instruct --adapter C:\Source\research\qwen2.5-3b-lora --out C:\Source\research\qwen2.5-3b-merged --cpu-only --dtype fp32 --infer
```
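Under the hood, a LoRA merge typically looks like the following PEFT/Transformers sketch (paths mirror the examples above; merge.py's actual flag handling may differ):

```python
# Sketch of a LoRA merge with PEFT (illustrative; not merge.py verbatim).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", torch_dtype=torch.float32, device_map="cpu"
)
model = PeftModel.from_pretrained(base, r"C:\Source\research\qwen2.5-3b-lora")
merged = model.merge_and_unload()  # fold LoRA weights into the base model
merged.save_pretrained(r"C:\Source\research\qwen2.5-3b-merged")

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
tok.save_pretrained(r"C:\Source\research\qwen2.5-3b-merged")
```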