This is a locally deployed AI Helpdesk system built on a fine-tuned language model with a Retrieval-Augmented Generation (RAG) architecture. It supports ingestion of PDF, DOCX, PPTX, and TXT documents and provides a FastAPI-powered backend with an optional React frontend.
Developed by Aditya Agrawal.
```
FineTuning/                  # Model finetuning and training
├── app.py                   # Finetuning script
└── qna.jsonl                # Dataset
RAG/                         # Main RAG and API logic
├── auth/
│   ├── auth_utils.py        # Auth logic and admin users
│   └── generate_pass.py     # Generate hashed passwords
├── data/
│   ├── uploaded_docs/       # Uploaded files
│   ├── faiss_index/         # FAISS vector store
│   ├── finetuned-model/     # Output of finetuning
│   └── local_embedding...   # Embedding script (local)
├── llm/
│   ├── model.py             # Model-calling wrapper
│   ├── merge_lora.py        # Merge LoRA with base model
│   ├── quantize_gptq.py     # GPTQ quantization
│   ├── quant_awq.py         # AWQ quantization
│   └── query.py             # Model query handler
├── routes/
│   ├── admin_router.py      # Routes for admin tasks
│   ├── chat_router.py       # User chat endpoints
│   └── auth_router.py       # Authentication routes
└── vectorstore/
    ├── documents.py         # Document parsing and loading
    └── store.py             # Vector DB saving/loading
app.py                       # Main FastAPI app
requirements.txt             # Python dependencies
```
Activate the vLLM environment and run:

```bash
source vllm_env/bin/activate
python3 -m vllm.entrypoints.openai.api_server \
  --model /home/vssc/RAG/data/merged_model \
  --dtype auto \
  --port 8080 \
  --gpu-memory-utilization 0.5 \
  --max-model-len 1024 \
  --max-num-batched-tokens 1024 \
  --block-size 8
```

Activate the portable environment and run the API:
```bash
source portable_env/bin/activate
cd RAG
python app.py
```

To fine-tune the base model:
- Place your training data in `FineTuning/qna.jsonl`.
- Run the finetuning script:

  ```bash
  cd FineTuning
  python app.py
  ```

- Merge the LoRA weights with the base model:
  ```bash
  cd RAG/llm
  python merge_lora.py
  ```

For optimized deployment on smaller devices or faster inference:

```bash
python quantize_gptq.py   # GPTQ
python quant_awq.py       # AWQ
```

To add admin users:
- Open `RAG/auth/auth_utils.py` and add user entries to the `users` dictionary.
- Generate a hashed password using:

  ```bash
  cd RAG/auth
  python generate_pass.py
  ```

- Paste the hashed password into the `users` dictionary.
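The resulting `users` dictionary might look like the sketch below. Both the dictionary structure and the hashing scheme are assumptions for illustration — the actual scheme is whatever `generate_pass.py` implements, and SHA-256 here is only a stand-in:

```python
import hashlib

def hash_password(plain: str) -> str:
    # Stand-in for RAG/auth/generate_pass.py: the real script defines the
    # actual hashing scheme; SHA-256 is used here only to show the shape
    # of the stored value.
    return hashlib.sha256(plain.encode()).hexdigest()

# Assumed structure of the `users` dictionary in RAG/auth/auth_utils.py.
users = {
    "admin": {
        "username": "admin",
        "hashed_password": hash_password("change-me"),
        "role": "admin",
    },
}
```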
Install Python dependencies with:

```bash
pip install -r requirements.txt
```

- Built to run fully offline on an intranet server.
- React frontend available separately for user chat interface and admin document upload.
- Uses vLLM for fast local inference of merged/quantized models.
- Supports concurrent request handling and semantic search via FAISS.
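The vLLM launch command above exposes an OpenAI-compatible HTTP API. A minimal client sketch, using only the standard library (the port and model path are taken from the launch command; the request body follows the standard chat-completions format, with error handling omitted):

```python
import json
import urllib.request

# Port and model path must match the vLLM launch command above.
API_URL = "http://localhost:8080/v1/chat/completions"
MODEL = "/home/vssc/RAG/data/merged_model"

def build_payload(question: str) -> dict:
    # Standard OpenAI chat-completions request body, which vLLM's
    # openai.api_server accepts.
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 256,
    }

def ask(question: str) -> str:
    # Send the question to the local vLLM server and return the answer text.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```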
Aditya Agrawal
Developer of the AI Helpdesk system.
Contact: [[email protected]]