Welcome to the Retrieval-Augmented Generation (RAG) Chatbot repository. This project enables users to upload documents (TXT and PDF), process their content by chunking large files, and interact with that content through a conversational AI interface. The application uses OpenAI’s GPT-4 for generating responses and FAISS for efficient retrieval.
- File Upload: Supports both .txt and .pdf files.
- Multiple Document Handling: Upload and process multiple documents.
- PDF Parsing: Extracts text from PDFs using PyMuPDF with a fallback to PyPDF2.
- Efficient Chunking: Splits large documents into smaller, manageable pieces.
- Conversational Chat Interface: Ask questions about the document content in a chat-style UI.
- Vector Search: Uses FAISS to index and retrieve document chunks.
- Docker & Devcontainer: Pre-configured Docker and VS Code Devcontainer setup for a consistent development environment.
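To make the chunking feature concrete, here is a minimal sketch of one common approach: fixed-size character windows with overlap. The function name `chunk_text` and the default sizes are assumptions for illustration; the actual strategy in chatbot.py may differ.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of indexing some text twice.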
RAG-Chatbot/
├── chatbot.py # Main Streamlit chatbot application
├── Dockerfile # Docker container configuration
├── docker-compose.yml # Docker Compose configuration
├── .devcontainer/ # VS Code Devcontainer configuration files
├── requirements.txt # Python dependencies
├── .env.example # Sample environment file (with placeholder values)
└── README.md # This documentation file
Before running the application, make sure you have the following:
- Docker (Docker Desktop must be running)
- VS Code with the Dev Containers extension (formerly Remote - Containers)
- Git
- Python 3.9+
- An OpenAI API Key
Open a terminal and run:
git clone https://github.com/esadkaoui/INFO5940.git
cd INFO5940
- Create a .env File
In the root directory, create a file named .env:
touch .env
- Add Your Environment Variables
Edit the .env file to include:
OPENAI_API_KEY=your-api-key-here
OPENAI_BASE_URL=https://api.openai.com
TZ=America/New_York
- Ensure Secrets Are Not Committed
Keep your real .env file out of version control by listing it in your .gitignore. Use the provided .env.example (with placeholder values) as a template.
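As a quick sanity check on the environment file described above, the following sketch parses KEY=VALUE lines and reports required keys that are missing or still set to the placeholder. The helpers `parse_env` and `missing_keys` are hypothetical and not part of this repository; the application itself may load these variables differently (for example through Docker Compose or a dotenv library).

```python
REQUIRED_KEYS = ("OPENAI_API_KEY",)   # assumption: only the API key is mandatory
PLACEHOLDER = "your-api-key-here"     # placeholder value shown in the sample above


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still the placeholder."""
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", PLACEHOLDER)]


if __name__ == "__main__":
    sample = "OPENAI_API_KEY=your-api-key-here\nTZ=America/New_York\n"
    print(missing_keys(parse_env(sample)))
```

Running a check like this before starting the app can surface a forgotten key earlier than a 401 response from the API would.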
- Open VS Code and navigate to the project folder.
- Open the Command Palette (Ctrl+Shift+P or Cmd+Shift+P) and select:
Dev Containers: Reopen in Container (shown as Remote-Containers: Reopen in Container in older versions of the extension)
- VS Code will build and open the project inside the container.
Once inside the container, open a terminal and run:
streamlit run chatbot.py
Then, open your browser and go to:
http://localhost:8501
- Ensure Docker Desktop is running.
- In your terminal, run:
docker-compose up --build
- Open your browser at the provided URL (typically http://localhost:8501).
- Uploading Documents:
- Click "Upload documents" and select one or more .txt or .pdf files.
- The application will process and index the files.
- Chat Interface:
- Enter your question in the chat input.
- The chatbot uses the uploaded document content to generate contextually relevant responses.
- The final aggregated answer is displayed as a single paragraph.
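The retrieve-then-answer flow described above can be sketched without any external dependencies. This is a simplified stand-in, not the application's code: word-count vectors and brute-force cosine similarity replace the real embeddings and the FAISS index, and the function names and prompt wording are assumptions for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Brute-force search: rank every chunk by similarity to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "FAISS indexes document chunks for vector search.",
        "Streamlit serves the chat interface on port 8501.",
    ]
    question = "Which port does the chat interface use?"
    print(build_prompt(question, top_k(question, docs, k=1)))
```

In the real pipeline the prompt would be sent to the chat model; the sketch stops at prompt assembly so that it runs offline.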
- PyMuPDF Not Installed Warning:
If you see a warning about PyMuPDF (fitz) not being installed, install it by running:
pip install pymupdf
- OpenAI API Authentication Error (401):
Ensure your .env file contains the correct API key and that all proxy environment variables are unset:
unset HTTP_PROXY
unset HTTPS_PROXY
unset http_proxy
unset https_proxy
Also, verify that openai.proxy = None is set in your code.
- Container Issues:
If the container fails to start:
docker-compose down
docker-compose up --build
Check the container logs for further details.
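When diagnosing the PyMuPDF warning, it can help to see which PDF backend is importable in the current environment. The sketch below shows one way a PyMuPDF-first, PyPDF2-fallback extraction could be structured. The function names are hypothetical, the actual logic in chatbot.py may differ, and the PyPDF2 branch assumes a release that provides `PdfReader`.

```python
import importlib.util
from typing import Optional


def pdf_backend() -> Optional[str]:
    """Return "pymupdf", "pypdf2", or None, depending on what can be imported."""
    if importlib.util.find_spec("fitz") is not None:    # PyMuPDF installs the fitz module
        return "pymupdf"
    if importlib.util.find_spec("PyPDF2") is not None:
        return "pypdf2"
    return None


def extract_pdf_text(path: str) -> str:
    """Extract text with PyMuPDF when available, otherwise fall back to PyPDF2."""
    backend = pdf_backend()
    if backend == "pymupdf":
        import fitz  # PyMuPDF
        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if backend == "pypdf2":
        from PyPDF2 import PdfReader
        reader = PdfReader(path)
        # extract_text() may return nothing for image-only pages
        return "\n".join((page.extract_text() or "") for page in reader.pages)
    raise RuntimeError("No PDF backend found; try: pip install pymupdf")


if __name__ == "__main__":
    print("PDF backend:", pdf_backend())
```

If this prints `None` inside the container, the dependency is probably missing from the image rather than from your host machine.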