LLM-based Custom PDF Chatbot is a chatbot application that uses a Large Language Model (LLM) to process and interact with custom PDF files. The chatbot extracts information, answers questions, and provides assistance based on the content of PDF documents.
- Indexing:
  - Pipeline: a pipeline for ingesting data from a source and indexing it. This usually happens offline.
  - Common sequence (a minimal sketch follows this list):
    - Load: First, we need to load our data. We'll use DocumentLoaders for this.
    - Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it into a model, since large chunks are harder to search over and won't fit in a model's finite context window.
    - Store: We need somewhere to store and index our splits so that they can later be searched over. This is often done using a VectorStore and an Embeddings model.
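For illustration, here is a minimal indexing sketch using LangChain with a FAISS vector store and a local sentence-transformers embedding model. The file name `document.pdf`, the chunk sizes, and the package choices are assumptions for the example, not necessarily what this project uses.

```python
# Minimal indexing sketch (assumes: pip install langchain langchain-community
# faiss-cpu pypdf sentence-transformers). Not necessarily DocChatAI's exact stack.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Load: read the PDF into Document objects (one per page).
docs = PyPDFLoader("document.pdf").load()  # "document.pdf" is a placeholder

# Split: break pages into overlapping chunks small enough to search and embed.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(docs)

# Store: embed each chunk and index it in a vector store for similarity search.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(splits, embeddings)
```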
- Retrieval and Generation:
  - RAG chain: the actual RAG chain takes the user query at runtime, retrieves the relevant data from the index, and passes it to the model.
  - Common sequence (see the sketch after this list):
    - Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.
    - Generate: A ChatModel / LLM produces an answer using a prompt that includes the question and the retrieved data.
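A minimal retrieval-and-generation sketch, continuing from the `vectorstore` built above. The OpenAI chat model, the `k=4` retrieval depth, and the prompt wording are assumptions; any ChatModel supported by LangChain could be swapped in.

```python
# Minimal RAG sketch, continuing from `vectorstore` in the indexing example.
# ChatOpenAI is an assumption (requires OPENAI_API_KEY and langchain-openai).
from langchain_openai import ChatOpenAI

# Retrieve: expose the vector store as a Retriever over the indexed splits.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(model="gpt-3.5-turbo")

def answer(question: str) -> str:
    # Retrieve: fetch the splits most similar to the user's question.
    docs = retriever.invoke(question)
    context = "\n\n".join(d.page_content for d in docs)
    # Generate: the model answers from the question plus the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt).content

print(answer("What is this document about?"))
```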
This project was developed as an entry for the Streamlit Hackathon in September 2023.
- Clone the repository:

  ```bash
  git clone https://github.com/deepak7376/DocChatAI
  ```

- Navigate to the project directory:

  ```bash
  cd DocChatAI
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the application:

  ```bash
  python app.py
  ```

- Open your web browser and navigate to http://localhost:8000 to interact with the chatbot.
- 🌐 Website: Portfolio
- 💬 Discord: Join the Community
- 💼 LinkedIn: Deepak Yadav
This project is licensed under the MIT License.