An interactive PDF reader powered by LangChain and GPT that enables users to upload PDF documents and chat with an AI assistant to extract insights, answer questions, and navigate content intelligently.
- PDF Upload: Upload any PDF document for interactive analysis
- AI-Powered Q&A: Ask questions about your PDF content and get intelligent answers
- Semantic Search: Uses vector embeddings to find relevant content accurately
- Conversational Memory: Maintains chat history for context-aware follow-up questions
- Answer-First PDF Display: Highlights the exact page containing the answer with 📍 indicator, followed by context pages
- Image-Based PDF Rendering: Reliable cross-platform PDF viewing that works on all deployment environments
- Smart Page Context: Automatically displays surrounding pages (±2 pages) for better understanding
- Conversational Interface: Natural chat experience powered by GPT-3.5/GPT-4
- Simple UI: Clean, intuitive interface built with Streamlit
- Modular Architecture: Well-organized codebase with separation of concerns for easy maintenance and extension
- Performance Optimized: Cached PDF-to-image conversion for faster repeated access
- LangChain - Framework for LLM application development
- Streamlit - Web application framework
- OpenAI GPT - Large language model for answer generation
- Chroma - Vector database for embeddings
- HuggingFace Transformers - Embedding models
- pdf2image - PDF to image conversion for reliable rendering
- Poppler - PDF rendering engine
- Python 3.12+ recommended (compatible with 3.10–3.13)
- OpenAI API Key
- HuggingFace API Token
- Poppler (system dependency for PDF rendering)
- Clone the repository

  ```shell
  git clone https://github.com/sheygs/smart-pdf-reader.git
  cd smart-pdf-reader
  ```

- Create a virtual environment

  ```shell
  python3.12 -m venv venv
  source venv/bin/activate
  ```
- Install system dependencies

  For PDF rendering support, install Poppler:

  - macOS:

    ```shell
    brew install poppler
    ```

  - Linux (Ubuntu/Debian):

    ```shell
    sudo apt-get update && sudo apt-get install -y poppler-utils
    ```

  - Windows: Download from poppler releases and add to PATH
- Install Python dependencies

  ```shell
  pip install -r requirements.txt
  ```
- Set up environment variables

  Rename the `.env.example` file to `.env` in the root directory and populate the required keys:

  ```
  OPENAI_API_KEY=your_openai_api_key_here
  HUGGINGFACEHUB_API_TOKEN=your_huggingface_api_token_here
  ```
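Since `src/config.py` is described below as handling environment validation, a minimal sketch of such a startup check is shown here (the `missing_keys` helper is illustrative, not the project's actual code; only the key names come from the template above):

```python
import os

# Required keys; names match the .env template above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HUGGINGFACEHUB_API_TOKEN")

def missing_keys(env) -> list:
    """Return the names of required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]

# Fail fast at startup with a clear message instead of a mid-run API error.
missing = missing_keys(os.environ)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```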
- Start the application

  ```shell
  streamlit run src/app.py
  ```
- Upload your PDF
  - Click on the file uploader in the sidebar
  - Select a PDF document from your local machine
- Process the PDF
  - Click the "Process" button to analyze the document
  - Wait for the processing to complete
- Ask questions
  - Type your question in the chat input
  - The AI will analyze the PDF and provide relevant answers
  - The answer page will be displayed first with a 📍 indicator
  - Context pages (±2 pages) will be shown below for additional context
```
smart-pdf-reader/
│
├── src/
│   ├── app.py                    # Main application entry point
│   ├── config.py                 # Configuration management
│   │
│   ├── core/                     # Core business logic
│   │   ├── conversation.py       # Conversation service (RAG chain)
│   │   ├── document_processor.py # PDF document processing
│   │   ├── embeddings.py         # Embedding service
│   │   └── vector_store.py       # Vector database operations
│   │
│   ├── ui/                       # User interface components
│   │   ├── components.py         # Chat and PDF components
│   │   ├── html_templates.py     # HTML/CSS templates
│   │   ├── layout.py             # Application layout
│   │   └── session.py            # Session state management
│   │
│   └── utils/                    # Utility functions
│       ├── file_handlers.py      # File operations
│       └── pdf_renderer.py       # PDF rendering utilities
│
├── requirements.txt              # Python dependencies
├── .env.dev                      # Environment variables template
├── README.md                     # Project documentation
└── .gitignore
```
- PDF Processing: Uploaded PDFs are parsed and split into manageable chunks using PyPDF
- Embedding Creation: Text chunks are converted to vector embeddings using HuggingFace models (default: `thenlper/gte-small`)
- Vector Storage: Embeddings are stored in the Chroma vector database for efficient similarity search
- Conversational RAG: Uses LangChain's retrieval chain with chat history awareness
- Query Processing: User questions are contextualized with chat history and matched against stored vectors
- Answer Generation: Relevant chunks are passed to the GPT model along with the contextualized question
- Answer-First Display:
  - The page containing the answer is displayed first with a 📍 indicator
  - Surrounding pages (±2 pages) are shown below for context
  - PDF pages are converted to high-quality images (150 DPI) for reliable cross-platform rendering
  - Caching ensures fast repeated access to the same pages
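The retrieve-then-answer flow above can be sketched without any framework. The helpers below are illustrative stand-ins for LangChain's text splitter and Chroma's similarity search; chunk sizes and function names are assumptions, and the real project delegates all of this to those libraries:

```python
import math

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list:
    """Split text into overlapping chunks, as a text splitter would."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def cosine(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k: int = 3) -> list:
    """Indices of the k stored chunks most similar to the query embedding."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The chunks returned by `top_k` are what gets passed to the GPT model as context, alongside the history-contextualized question.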
The project follows a modular architecture pattern with clear separation of concerns:
- Core Module (`src/core/`): Business logic for document processing, embeddings, vector store, and conversation management
- UI Module (`src/ui/`): Streamlit interface components, layouts, and session management
- Utils Module (`src/utils/`): Reusable utilities for file handling and image-based PDF rendering
- Config Module (`src/config.py`): Centralized configuration and environment validation
- Image-Based PDF Rendering: Uses `pdf2image` instead of iframe embedding for reliable cross-platform display
- Answer-First UX: Displays the answer page prominently before showing context pages
- Cached Rendering: PDF-to-image conversion is cached using `@st.cache_data` for better performance
- Configurable Context: The context window (pages before/after the answer) is configurable via `src/config.py`
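As an illustration of the answer-first ordering, a self-contained sketch is below. The function name and 0-indexed pages are assumptions; the project's actual layout logic lives in `src/ui/`:

```python
def pages_to_show(answer_page: int, total_pages: int,
                  before: int = 2, after: int = 2) -> list:
    """Answer page first (the 📍 page), then in-range context pages in order.

    Pages are 0-indexed; the context window is clamped to the document bounds.
    """
    start = max(answer_page - before, 0)
    end = min(answer_page + after, total_pages - 1)
    context = [p for p in range(start, end + 1) if p != answer_page]
    return [answer_page] + context
```

For example, an answer on page 5 of a 20-page PDF yields `[5, 3, 4, 6, 7]`: the answer page rendered first, then its ±2 context pages.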
You can customize the application behavior by modifying `src/config.py`:

```python
@dataclass
class PDFConfig:
    context_page_before: int = 2  # Pages to show before the answer page
    context_page_after: int = 2   # Pages to show after the answer page
    default_page: int = 0         # Default page to display
    dpi: int = 150                # Image resolution for PDF rendering
```

- File Format Support: Currently supports only PDF files. Support for other document formats (Word, TXT, etc.) is planned for future releases
- Internet Connection Required: Active internet connection needed for API calls to OpenAI and HuggingFace
- API Costs: OpenAI API usage incurs costs based on usage. Monitor your API usage to avoid unexpected charges
- PDF Size: Very large PDFs (100+ pages) may take longer to process and could impact performance
- Language Support: Best performance with English text. Other languages may work but have not been extensively tested
- Memory Usage: Processing large documents requires sufficient system memory. Close other applications if you experience slowdowns
Contributions are welcome! The project follows a modular architecture to make it easy to contribute:
- Core Features: Add new functionality in the `src/core/` module
- UI Improvements: Enhance the interface in the `src/ui/` module
- Utilities: Add helper functions in the `src/utils/` module
- OpenAI for GPT models
- LangChain team for the excellent framework
- Streamlit for the intuitive web framework
- HuggingFace for open-source embedding models
MIT - see the LICENSE file for details.