This application implements a Retrieval-Augmented Generation (RAG) system that:
- Ingests documents into a Qdrant vector database
- Takes user questions as input
- Retrieves relevant information from a Qdrant vector database
- Uses the retrieved context to generate accurate answers with locally hosted Ollama models
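Before embedding, ingestion typically splits each document into overlapping chunks so that retrieval returns focused passages. A minimal sketch of such a splitter (the function name, chunk size, and overlap are illustrative assumptions, not taken from this repository):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks.

    chunk_size and overlap are illustrative defaults; tune them for
    your documents and your embedding model's context window.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Each chunk is then embedded (e.g. with nomic-embed-text) and
# upserted into the Qdrant collection as a separate point.
```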
- Python 3.12+
- Docker
- Ollama installed and running locally
- Qdrant running locally (the Docker command is given below)
- Clone this repository
- Install the required packages:

  ```shell
  pip install -r requirements.txt
  ```
- Set up environment variables (optional; defaults are set in the code):

  ```python
  OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
  OLLAMA_EMBED_MODEL = os.getenv("OLLAMA_EMBED_MODEL", "nomic-embed-text")
  OLLAMA_LLM_MODEL = os.getenv("OLLAMA_LLM_MODEL", "llama3")
  QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
  DOCUMENTS_DIR = "./documents"
  COLLECTION_NAME = "documents-1"
  ```
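To override the defaults, export the variables in your shell before starting the app. For example (the values below simply repeat the defaults):

```shell
export OLLAMA_BASE_URL="http://localhost:11434"
export OLLAMA_EMBED_MODEL="nomic-embed-text"
export OLLAMA_LLM_MODEL="llama3"
export QDRANT_URL="http://localhost:6333"
```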
Ensure you have the necessary models pulled in Ollama:
- For embeddings (you can choose a different embedding model as well):

  ```shell
  ollama pull nomic-embed-text
  ```

- For text generation (you can choose a different LLM):

  ```shell
  ollama pull llama3
  ```
- Start Qdrant (if running locally):

  ```shell
  docker run -p 6333:6333 qdrant/qdrant
  ```

- Ensure Ollama is running:

  ```shell
  ollama serve
  ```

- Initialize the system with sample documents:

  ```shell
  python main.py
  ```
- Enter a question and check the answers 🚀
Check the vector store to verify that all documents were ingested.
Open: http://localhost:6333/dashboard#/collections
Once the application is running, you will be prompted to input questions. The system will:
- Convert your question to a vector embedding using Ollama
- Search the Qdrant database for similar content
- Retrieve the most relevant document chunks
- Send your question along with the retrieved context to Ollama
- Return the generated answer
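The last two steps amount to stuffing the retrieved chunks into a prompt template before calling the LLM. A minimal sketch of that assembly (the template wording and function name are illustrative, not the exact prompt used by main.py):

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the user question with retrieved context chunks
    into a single prompt for the LLM."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# The resulting string is sent to the Ollama generation model
# (llama3 by default) and the response is returned to the user.
```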
Type 'exit' to quit the application.
You can extend this application by:
- Adding document loaders for various file types (PDFs, Word documents, websites)
- Implementing a web interface
- Adding document deletion or update functionality
- Incorporating multiple vector stores or different embedding models
- Adding user authentication for multi-user environments
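As a starting point for the first extension, a loader for plain-text files could look like this (a hypothetical helper, not part of the repository; PDF or Word loaders would follow the same shape with a parsing library):

```python
from pathlib import Path

def load_text_documents(directory: str) -> dict[str, str]:
    """Read every .txt file under `directory` and return a
    mapping of file name -> file contents, ready to be
    chunked, embedded, and upserted into Qdrant."""
    docs = {}
    for path in Path(directory).rglob("*.txt"):
        docs[path.name] = path.read_text(encoding="utf-8")
    return docs
```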

