A powerful Retrieval-Augmented Generation (RAG) chatbot that answers questions based on your documents using OpenAI embeddings and comprehensive logging capabilities.
- AI-Powered Q&A: Get intelligent answers from your documents
- OpenAI Integration: Uses state-of-the-art OpenAI embeddings and language models
- Comprehensive Logging: Detailed logging with multiple log files for monitoring
- Colored Terminal Output: Beautiful, color-coded responses and status messages
- Similarity Scoring: Intelligent document retrieval with relevance scores
- Source Attribution: Always shows which documents were used for answers
- Error Handling: Robust error handling with detailed error messages
- Multiple Document Formats: Supports PDF, Markdown, and other formats
- Python 3.8 or higher
- pip (Python package installer)
- OpenAI API key (for embeddings and language model)
```shell
# If using git:
git clone <your-repository-url>
cd RAG-Chatbot
# Or download and extract the ZIP file to your desired location

# Create a virtual environment
python -m venv rag_env

# Activate the virtual environment
# On Windows:
rag_env\Scripts\activate
# On macOS/Linux:
source rag_env/bin/activate

# Install all required packages
pip install -r requirements.txt
```

Note: This will install all dependencies, including:

- `langchain` and related packages for RAG functionality
- `openai` for API integration
- `chromadb` for the vector database
- `loguru` for advanced logging
- `colorama` for colored terminal output
- `sentence-transformers` for local embeddings
- And many other supporting packages
Create a `.env` file in the project root:

```shell
# On Windows:
echo OPENAI_API_KEY=your_api_key_here > .env
# On macOS/Linux:
echo "OPENAI_API_KEY=your_api_key_here" > .env
```

Or manually create the `.env` file with:

```
OPENAI_API_KEY=your_openai_api_key_here
```

Test the logging functionality:

```shell
python test_logging.py
```

You should see colored log output indicating successful setup.
First, you need to create a vector database from your documents:
```shell
# Create database from sample data
python openai_create_database.py
```

This will:
- Load documents from `./sample_data/`
- Split them into chunks
- Create embeddings using OpenAI
- Store them in `./chroma_db_openai/`
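The chunking step above can be sketched in plain Python. The real project uses LangChain's text splitters, so the function below and its `chunk_size`/`overlap` defaults are illustrative assumptions, not the project's actual code:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks.

    Each chunk starts (chunk_size - overlap) characters after the previous
    one, so neighboring chunks share `overlap` characters of context.
    """
    chunks = []
    step = chunk_size - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary visible in both chunks, which usually improves retrieval quality at the cost of a slightly larger database.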
Expected Output:

```
2025-06-29 18:XX:XX | INFO | === Starting OpenAI RAG Chatbot Database Creation ===
2025-06-29 18:XX:XX | INFO | Loading documents from directory: ./sample_data/aws_lambda
2025-06-29 18:XX:XX | INFO | Starting document chunking process
2025-06-29 18:XX:XX | INFO | Starting Chroma database creation
2025-06-29 18:XX:XX | INFO | Saved X chunks to ./chroma_db_openai.
2025-06-29 18:XX:XX | INFO | === Database Creation Completed Successfully ===
```
Now you can ask questions about your documents:
```shell
# Ask a question
python openai_query_data.py "What is AWS Lambda?"
```

Example queries for AWS Lambda:

```shell
python openai_query_data.py "How do I create my first Lambda function?"
python openai_query_data.py "What are the benefits of serverless computing?"
python openai_query_data.py "How does Lambda handle concurrent executions?"
python openai_query_data.py "What security best practices should I follow?"
```

Example queries for Alice in Wonderland:

```shell
python openai_query_data.py "What is Alice's adventure about?"
python openai_query_data.py "Who is the White Rabbit?"
python openai_query_data.py "What happens when Alice drinks the potion?"
```

The application provides rich, colored output:
```
Search Results:
1. Score: 0.856 - AWS Lambda is a serverless compute service that runs your code...
2. Score: 0.743 - Lambda functions are event-driven and automatically scale...
3. Score: 0.689 - You can use Lambda to run code without provisioning servers...

================================================================================
🤖 AI RESPONSE
================================================================================
AWS Lambda is a serverless compute service that allows you to run code without
provisioning or managing servers. It automatically scales your applications and
you only pay for the compute time you consume.
--------------------------------------------------------------------------------
📄 Sources:
1. sample_data/aws_lambda/lambda-dg.pdf
2. sample_data/aws_lambda/lambda-dg.pdf
================================================================================
```
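The `Score` values in the search results are typically cosine similarities between the query embedding and each chunk embedding (the exact metric depends on the Chroma collection's configuration). A minimal sketch of that computation, not the project's actual code:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical
    direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In practice the vectors are the 1,536-dimensional (or larger) embeddings returned by the OpenAI embeddings API, and the vector database computes these scores for you.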
The application uses a sophisticated logging system with multiple log files:
- `logs/app.log`: General application logs
- `logs/requests.log`: Request tracking
- `logs/responses.log`: Response tracking
- `logs/payloads.log`: Detailed payload information
- `logs/errors.log`: Error tracking
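The project implements this one-file-per-concern layout with Loguru. As a library-agnostic sketch, a comparable setup using only the standard library might look like this (`make_logger` is a hypothetical helper, not project code):

```python
import logging
import os

def make_logger(name: str, path: str) -> logging.Logger:
    """Create a named logger that writes to its own dedicated file."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    # Mirror the "timestamp | LEVEL | message" format shown in the examples
    handler.setFormatter(logging.Formatter("%(asctime)s | %(levelname)s | %(message)s"))
    logger.addHandler(handler)
    return logger

# e.g. make_logger("requests", "logs/requests.log"), one per concern
```

Loguru achieves the same with repeated `logger.add(path, filter=...)` calls on a single global logger, which is what makes its setup so compact.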
- Database Path: `./chroma_db_openai/`
- Sample Data: `./sample_data/`
To use your own documents:
- Add your documents to the `sample_data/` directory
- Update the `DATA_PATH` in `openai_create_database.py`:

  ```python
  DATA_PATH = "./sample_data/your_documents"
  ```

- Run the database creation script again:

  ```shell
  python openai_create_database.py
  ```
Error: No API key found. Please set OPENAI_API_KEY environment variable.
Solution:
- Add your OpenAI API key to the `.env` file
- Make sure the `.env` file is in the project root directory
Error: Database directory not found
Solution:
- Run the database creation script first: `python openai_create_database.py`
- Check that the `chroma_db_openai/` directory exists
Warning: Best match has low similarity score: 0.45
Solutions:
- Try rephrasing your question
- Add more relevant documents to the database
- Lower the similarity threshold in the code (currently 0.7)
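The threshold check behind this warning can be pictured as follows. `SIMILARITY_THRESHOLD` and the `(document, score)` result shape are assumptions for illustration, not the project's actual identifiers:

```python
SIMILARITY_THRESHOLD = 0.7  # assumed name for the cutoff mentioned above

def check_relevance(results: list[tuple[str, float]]) -> bool:
    """Return False (and warn) when the best retrieved chunk scores
    below the configured similarity threshold."""
    best_score = max(score for _doc, score in results)
    if best_score < SIMILARITY_THRESHOLD:
        print(f"Warning: Best match has low similarity score: {best_score:.2f}")
        return False
    return True
```

Lowering the threshold lets weaker matches through (more answers, more noise); raising it makes the chatbot refuse to answer when retrieval is uncertain.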
ModuleNotFoundError: No module named 'langchain'
Solution:
- Install dependencies: `pip install -r requirements.txt`
- Make sure your virtual environment is activated
PermissionError: [Errno 13] Permission denied
Solution:
- Run as administrator (Windows)
- Check file permissions
- Make sure you have write access to the project directory
Enable debug logging by modifying `logger_config.py`:

```python
# Change log level to DEBUG
logger.add(sys.stdout, level="DEBUG")
```

- API Keys: Never commit your `.env` file to version control
- Log Files: Review log files for sensitive information
- Network: Ensure secure connections when using the OpenAI API
- Documents: Be careful with sensitive documents in your database
- Increase chunk size for better context
- Adjust overlap for better document coverage
- Use batch processing for large files
- Monitor memory usage during database creation
- Reduce the `k` value in the similarity search (currently 3)
- Optimize prompt length
- Cache frequently asked questions
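Caching frequently asked questions can be as simple as memoizing the query function. This sketch uses `functools.lru_cache`; `run_rag_pipeline` is a stand-in for the real retrieval-plus-LLM call, not the project's actual function:

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the expensive path actually runs

def run_rag_pipeline(question: str) -> str:
    """Stand-in for embedding lookup + LLM completion (an assumption)."""
    CALLS["count"] += 1
    return f"Answer to: {question}"

@lru_cache(maxsize=128)
def answer_question(question: str) -> str:
    # Repeated identical questions are served from the cache,
    # skipping the embedding and LLM calls entirely.
    return run_rag_pipeline(question)
```

Note that `lru_cache` only matches questions verbatim and does not persist across runs; a production cache would normalize the query text and store results on disk or in the vector database.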
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for the RAG framework
- OpenAI for the embedding and language models
- Chroma for the vector database
- Loguru for advanced logging capabilities
If you encounter any issues:
- Check the troubleshooting section above
- Review the log files in the `logs/` directory
- Create an issue in the repository
- Contact the development team
Happy Querying!