aditya-agr/AI-HelpDesk-Server-

🧠 AI Helpdesk System

This is a locally deployed AI Helpdesk system built on a fine-tuned language model and a Retrieval-Augmented Generation (RAG) architecture. It ingests PDF, DOCX, PPTX, and TXT documents and exposes a FastAPI-powered backend, with an optional React frontend.

Developed by Aditya Agrawal.


🛠️ Project Structure

FineTuning/                # For model finetuning and training
├── app.py                 # Finetune script
├── qna.jsonl              # Dataset

RAG/                       # Main RAG and API logic
├── auth/
│   ├── auth_utils.py      # Auth logic and admin users
│   ├── generate_pass.py   # Generate hashed password
│
├── data/
│   ├── uploaded_docs/     # Uploaded files
│   ├── faiss_index/       # FAISS vector store
│   ├── finetuned-model/   # Output of finetuning
│   ├── local_embedding... # Embedding script (local)
│
├── llm/
│   ├── model.py           # Model calling wrapper
│   ├── merge_lora.py      # Merge LoRA with base model
│   ├── quantize_gptq.py   # GPTQ quantization
│   ├── quant_awq.py       # AWQ quantization
│   ├── query.py           # Model query handler
│
├── routes/
│   ├── admin_router.py    # Routes for admin tasks
│   ├── chat_router.py     # User chat endpoints
│   ├── auth_router.py     # Authentication routes
│
├── vectorstore/
│   ├── documents.py       # Document parsing and loading
│   ├── store.py           # Vector DB saving/loading

app.py                     # Main FastAPI app
requirements.txt           # Python dependencies

🚀 How to Run the Project

Step 1: Start the vLLM Server

Activate the vLLM environment and run:

source vllm_env/bin/activate

python3 -m vllm.entrypoints.openai.api_server \
  --model /home/vssc/RAG/data/merged_model \
  --dtype auto \
  --port 8080 \
  --gpu-memory-utilization 0.5 \
  --max-model-len 1024 \
  --max-num-batched-tokens 1024 \
  --block-size 8
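Once Step 1 is running, vLLM serves an OpenAI-compatible HTTP API on port 8080. A minimal stdlib client sketch (the endpoint path follows the OpenAI completions convention; the model name matches the --model flag above):

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8080/v1/completions"  # port from the command above
MODEL = "/home/vssc/RAG/data/merged_model"         # must match --model

def build_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Request body for vLLM's OpenAI-compatible completions endpoint."""
    return {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}

def complete(prompt: str) -> str:
    """POST a prompt to the running vLLM server and return the completion text."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

Calling complete("...") requires the vLLM server from Step 1 to be up; the RAG backend in this repo talks to the same endpoint through RAG/llm.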

Step 2: Start the FastAPI Server

Activate the portable environment and run the API:

source portable_env/bin/activate
cd RAG
python app.py

🧪 Finetuning the Model

To fine-tune the base model:

  1. Place your training data in FineTuning/qna.jsonl.
  2. Run the finetuning script:
cd FineTuning
python app.py
  3. Merge the LoRA weights with the base model:
cd RAG/llm
python merge_lora.py
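Step 1 above expects one JSON object per line in qna.jsonl. The exact field names are whatever FineTuning/app.py reads; the question/answer keys below are an assumption for illustration:

```python
import io
import json

# Hypothetical record layout for qna.jsonl -- one JSON object per line.
# The real field names are defined by FineTuning/app.py.
records = [
    {"question": "How do I reset my password?",
     "answer": "Open the helpdesk portal and choose 'Forgot password'."},
    {"question": "Which file types can I upload?",
     "answer": "PDF, DOCX, PPTX, and TXT documents are supported."},
]

buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")  # one record per line, no enclosing array

# Every line of a .jsonl file parses independently:
loaded = [json.loads(line) for line in buf.getvalue().splitlines()]
```

Writing line-delimited JSON (rather than one big array) lets the training script stream the dataset without loading it all at once.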

🧱 Quantization (Optional)

For optimized deployment on smaller devices or faster inference, run the quantization scripts from RAG/llm:

cd RAG/llm
python quantize_gptq.py   # GPTQ
python quant_awq.py       # AWQ
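GPTQ and AWQ both store weights as low-bit integers plus float scale factors; the scripts here presumably wrap libraries such as AutoGPTQ and AutoAWQ (an assumption). The underlying round-to-nearest idea, minus the calibration each method adds on top, can be sketched with NumPy:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: keep one float scale,
    store the weights as small integers."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale
```

GPTQ additionally compensates each rounding error using second-order weight statistics, and AWQ rescales channels based on activation magnitudes, but both end up with this same integers-plus-scales storage format.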

🔐 Authentication

To add admin users:

  1. Open RAG/auth/auth_utils.py and add user entries to the users dictionary.

  2. Generate a hashed password:

cd RAG/auth
python generate_pass.py

  3. Paste the hashed password into the users dictionary.
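generate_pass.py emits a hash suitable for pasting into the users dictionary. The repo's exact hashing scheme isn't shown here (it may use bcrypt via passlib, for example); a stdlib sketch of the same salted-hash-then-verify pattern, using PBKDF2:

```python
import binascii
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = None) -> str:
    """Return a 'salt$digest' hex string, with a fresh random salt per call."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return binascii.hexlify(salt).decode() + "$" + binascii.hexlify(digest).decode()

def verify_password(password: str, stored: str) -> bool:
    """Re-hash the candidate with the stored salt and compare digests."""
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), binascii.unhexlify(salt_hex), 100_000
    )
    return hmac.compare_digest(binascii.hexlify(digest).decode(), digest_hex)
```

Storing the salt alongside the digest is what lets the login route re-derive and compare the hash without ever keeping the plaintext password.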


📦 Dependencies

Install Python dependencies with:

pip install -r requirements.txt

📌 Notes

  • Built to run fully offline on an intranet server.
  • React frontend available separately for user chat interface and admin document upload.
  • Uses vLLM for fast local inference of merged/quantized models.
  • Supports concurrent request handling and semantic search via FAISS.
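The semantic-search step retrieves the document chunks whose embeddings lie nearest the query embedding. FAISS does this efficiently at scale; the ranking it computes (cosine similarity, equivalent to inner product over normalized vectors) can be shown with a small NumPy stand-in for the index:

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2) -> list:
    """Return indices of the k document vectors most similar to the query,
    ranked by cosine similarity -- the same ordering FAISS produces with
    an inner-product index over normalized embeddings."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # one similarity score per document
    return np.argsort(-scores)[:k].tolist()
```

In the real pipeline the rows of doc_vecs come from the local embedding model applied to parsed document chunks, and the returned chunks are stuffed into the LLM prompt as retrieved context.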

👨‍💻 Author

Aditya Agrawal
Developer of the AI Helpdesk system.
Contact: [[email protected]]
