This repo contains a local RAG demo, a Streamlit chatbot UI, utilities to build and reuse a FAISS vector DB, and LoRA fine-tuning/merge helpers.
Prerequisites:
- Python 3.11+ (tested on Windows 10)
- Ollama running locally (http://localhost:11434)
- Recommended shell: Windows PowerShell
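If you want to confirm Ollama is reachable before building an index, you can query its `/api/tags` endpoint, which lists locally available models. This snippet is illustrative and not part of the repo:

```python
# Quick connectivity check (not part of the repo): Ollama's /api/tags
# endpoint returns the models available locally.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as r:
    models = [m["name"] for m in json.load(r)["models"]]
print("Ollama models:", models)
```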
Install dependencies and pull the models:

```powershell
python -m pip install -r requirements.txt
ollama pull nomic-embed-text
ollama pull qwen2.5:0.5b-instruct
```

loader.py loads PDF(s), splits and sanitizes the text, embeds it with Ollama, and saves a FAISS index to disk.
Examples:
```powershell
# Single PDF → saved to .\faiss_index
python .\loader.py --pdf C:\Source\research\docx\report-ko.pdf --out C:\Source\research\faiss_index --emb nomic-embed-text --base http://localhost:11434

# All PDFs in a folder
python .\loader.py --dir C:\Source\research\docx --out C:\Source\research\faiss_index --emb nomic-embed-text --base http://localhost:11434
```

Notes:
- Text is sanitized to remove invalid surrogate characters to avoid JSON encoding errors in the Ollama client.
- Default chunking is 250/50 (size/overlap). Adjust in loader.py if desired.
- The output directory contains FAISS index files that can be reloaded later without recomputing embeddings.
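For orientation, here is a rough sketch of the loader flow, assuming LangChain's PDF loader, splitter, Ollama embeddings, and FAISS wrapper (loader.py's actual argument handling and defaults may differ). It includes the surrogate-stripping pass mentioned above:

```python
# Sketch of the loader flow (assumes LangChain + FAISS; illustrative only).
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

def sanitize(text: str) -> str:
    # Drop lone UTF-16 surrogates that would break JSON encoding
    # in the Ollama client.
    return text.encode("utf-8", errors="ignore").decode("utf-8")

pages = PyPDFLoader(r"C:\Source\research\docx\report-ko.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=50)
chunks = splitter.split_documents(pages)
for c in chunks:
    c.page_content = sanitize(c.page_content)

emb = OllamaEmbeddings(model="nomic-embed-text", base_url="http://localhost:11434")
FAISS.from_documents(chunks, emb).save_local(r"C:\Source\research\faiss_index")
```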
interface.py provides a dark-mode chat UI with chat history, retrieval controls, source panel, and an embedded PDF viewer with pagination.
```powershell
streamlit run interface.py
```

Behavior:
- On startup, the app auto-loads a FAISS index from FAISS_INDEX_DIR or ./faiss_index if present.
- Uses Ollama locally with the defaults below; you can override them via environment variables.
- Right panel shows the original PDF (picker + viewer). Pagination controls are centered at the bottom of the viewer.
Environment variables (optional):
- `FAISS_INDEX_DIR` → path to a saved FAISS index (default `./faiss_index`)
- `OLLAMA_HOST` → e.g., `http://localhost:11434`
- `OLLAMA_EMBED` → embedding model tag (default `nomic-embed-text`)
- `OLLAMA_LLM` → chat model tag (default `qwen2.5:0.5b-instruct`)
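A minimal sketch of how these settings can be resolved and the index auto-loaded, assuming the LangChain FAISS wrapper (variable names follow this README; interface.py's actual code may differ):

```python
# Sketch of configuration resolution and index auto-load (illustrative).
import os
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

INDEX_DIR = os.environ.get("FAISS_INDEX_DIR", "./faiss_index")
HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
EMBED = os.environ.get("OLLAMA_EMBED", "nomic-embed-text")
LLM = os.environ.get("OLLAMA_LLM", "qwen2.5:0.5b-instruct")

db = None
if os.path.isdir(INDEX_DIR):
    emb = OllamaEmbeddings(model=EMBED, base_url=HOST)
    db = FAISS.load_local(INDEX_DIR, emb, allow_dangerous_deserialization=True)
```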
main.py shows a minimal RAG flow using a prebuilt FAISS index.
Edit the index path in main.py if needed:
```python
INDEX_DIR = r"C:\Source\research\faiss_index"
```

Run:

```powershell
python .\main.py
```
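A minimal sketch of such a flow, assuming LangChain for retrieval and the `ollama` Python client for generation (main.py's actual prompt and wiring may differ):

```python
# Minimal RAG sketch over a prebuilt FAISS index (illustrative).
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
import ollama

INDEX_DIR = r"C:\Source\research\faiss_index"

emb = OllamaEmbeddings(model="nomic-embed-text")
db = FAISS.load_local(INDEX_DIR, emb, allow_dangerous_deserialization=True)

question = "What does the report conclude?"
docs = db.similarity_search(question, k=4)  # retrieve top chunks
context = "\n\n".join(d.page_content for d in docs)

resp = ollama.chat(
    model="qwen2.5:0.5b-instruct",
    messages=[{"role": "user",
               "content": f"Answer using this context:\n{context}\n\nQ: {question}"}],
)
print(resp["message"]["content"])
```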
train.py runs a quick demonstration fine-tune using TRL.

Input data format: data.jsonl, one JSON object per line with keys `prompt` and `response`:

```json
{"prompt": "...", "response": "..."}
```

Run:
```powershell
python .\train.py
```

Outputs are written to `OUT_DIR` (default `qwen2.5-3b-lora`). See the code for tunables.
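For reference, a sketch of the kind of TRL setup this implies, assuming SFTTrainer with a PEFT LoRA config and a prompt/completion column mapping; the hyperparameters here are placeholders, not train.py's actual tunables:

```python
# Sketch of a TRL LoRA fine-tune (illustrative; not train.py verbatim).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

ds = load_dataset("json", data_files="data.jsonl", split="train")
# TRL's SFT expects "prompt"/"completion" columns for prompt-completion data.
ds = ds.rename_column("response", "completion")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",
    train_dataset=ds,
    args=SFTConfig(output_dir="qwen2.5-3b-lora", max_steps=100),
    peft_config=LoraConfig(r=16, lora_alpha=32,
                           target_modules="all-linear",
                           task_type="CAUSAL_LM"),
)
trainer.train()
trainer.save_model("qwen2.5-3b-lora")
```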
merge.py creates a merged Hugging Face model folder you can use directly or export to GGUF.
Examples:
```powershell
# Merge only
python .\merge.py --base Qwen/Qwen2.5-3B-Instruct --adapter C:\Source\research\qwen2.5-3b-lora --out C:\Source\research\qwen2.5-3b-merged --cpu-only --dtype fp32

# Merge and test generation
python .\merge.py --base Qwen/Qwen2.5-0.5B-Instruct --adapter C:\Source\research\qwen2.5-3b-lora --out C:\Source\research\qwen2.5-3b-merged --cpu-only --dtype fp32 --infer
```
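Under the hood, a LoRA merge typically looks like the following PEFT/Transformers sketch (paths mirror the examples above; merge.py's actual flag handling may differ):

```python
# Sketch of a LoRA merge with PEFT (illustrative; not merge.py verbatim).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", torch_dtype=torch.float32, device_map="cpu"
)
model = PeftModel.from_pretrained(base, r"C:\Source\research\qwen2.5-3b-lora")
merged = model.merge_and_unload()  # fold LoRA weights into the base model
merged.save_pretrained(r"C:\Source\research\qwen2.5-3b-merged")

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
tok.save_pretrained(r"C:\Source\research\qwen2.5-3b-merged")
```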