Your Intelligent Medical Document Assistant powered by AI and RAG
MedDoc Flow is a sophisticated Streamlit application that enables healthcare professionals and researchers to interact intelligently with medical documents using advanced AI technology and Retrieval-Augmented Generation (RAG). Upload your medical PDFs and get instant, contextual answers through our conversational interface.
- Multi-Document Processing: Upload and process multiple medical PDF and plain-text (.txt) documents simultaneously
- AI-Powered Chat Interface: Ask questions about your documents using natural language
- Semantic Search: Advanced document retrieval using FAISS vector embeddings
- Interactive UI: Clean, medical-themed interface with real-time chat experience
- RAG-Powered Responses: Uses Retrieval-Augmented Generation to ground AI answers in your uploaded documents
- Fast Processing: Efficient text chunking and vector indexing for quick responses; embedding model cached across runs
- Responsive Design: Works seamlessly across different screen sizes
- Document Statistics: See file and chunk counts after processing
- Clear Chat History: Reset the conversation at any time from the sidebar
- Export Chat: Download the full conversation as a `.txt` file
- Python 3.8 or higher
- pip package manager
- Clone the repository

  ```bash
  git clone https://github.com/Karanpratap7/MedDoc-Flow.git
  cd MedDoc-Flow
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Set up your API key

  Preferred: set the environment variable before running the app:

  ```bash
  export EURI_API_KEY="your_api_key_here"
  ```

  Alternatively, update `app/config.py` directly:

  ```python
  EURI_API_KEY = "your_api_key_here"
  ```
- Run the application

  ```bash
  streamlit run main.py
  ```

- Open your browser

  - Navigate to `http://localhost:8501`
  - Start uploading your medical documents!
- Use the sidebar to upload one or more documents
- Supported formats: PDF and plain-text (.txt) files
- Click "Process Documents" to analyse and index your files
- The system will extract text and create searchable embeddings
- Document statistics (file count, chunk count) are shown after processing
- Use the chat interface to ask questions about your documents
- Examples:
- "What are the main symptoms described in patient 1?"
- "Summarise the treatment recommendations"
- "What medications were prescribed?"
- Receive contextual answers based on your uploaded documents
- The AI will clearly indicate if information is not available in your documents
- Clear Chat History: removes all messages from the current session
- Export Chat: downloads the full conversation as a `.txt` file
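The export step can be approximated by flattening the session's message list into one text blob. `format_chat` below is a hypothetical helper for illustration; in the app, the resulting string would be handed to Streamlit's `st.download_button` for the `.txt` download:

```python
from typing import Dict, List

def format_chat(messages: List[Dict[str, str]]) -> str:
    """Render chat messages as plain text suitable for a .txt download."""
    lines = [f"{m['role'].capitalize()}: {m['content']}" for m in messages]
    # Blank line between turns keeps the exported transcript readable.
    return "\n\n".join(lines)

history = [
    {"role": "user", "content": "What medications were prescribed?"},
    {"role": "assistant", "content": "The document lists metformin 500 mg."},
]
print(format_chat(history))
```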
```
MedDoc-Flow/
├── main.py                       # Main Streamlit application
├── app/
│   ├── __init__.py               # Package initialization
│   ├── ui.py                     # User interface components
│   ├── pdf_utils.py              # PDF and plain-text extraction utilities
│   ├── vectorstore_utils.py      # Vector database operations
│   ├── chat_utils.py             # AI chat model interactions
│   └── config.py                 # Configuration settings
├── tests/
│   ├── test_pdf_utils.py         # Tests for PDF/text extraction
│   ├── test_chat_utils.py        # Tests for chat model utilities
│   └── test_vectorstore_utils.py # Tests for vector store utilities
├── requirements.txt              # Python dependencies
└── README.md                     # This file
```
Yes: MedDoc Flow is built on RAG (Retrieval-Augmented Generation).
Instead of relying solely on the LLM's training data, the application retrieves relevant passages directly from your uploaded documents and provides them as context to the language model, resulting in accurate, document-grounded answers.
```
Document Upload (PDF / TXT)
        │
        ▼
Text Extraction (PyPDF / built-in decoder)
        │
        ▼
Text Chunking (RecursiveCharacterTextSplitter, chunk_size=1000, overlap=200)
        │
        ▼
Embedding Generation (HuggingFace sentence-transformers/all-mpnet-base-v2) [cached]
        │
        ▼
Vector Indexing (FAISS)
        │
        ▼  (at query time)
User Question ───► Semantic Similarity Search (top-3 chunks retrieved)
        │
        ▼
Context Injection into LLM Prompt
        │
        ▼
Euri AI LLM (gpt-4.1-nano) Generates Answer
        │
        ▼
Response Displayed in Chat UI
```
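The chunking step in the pipeline can be sketched in plain Python. This is an illustrative approximation of fixed-size windows with overlap; LangChain's `RecursiveCharacterTextSplitter` additionally tries to break on paragraph and sentence boundaries, so the app's actual chunks will differ:

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size chunks whose edges overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter: each chunk
    repeats the last `overlap` characters of the previous one, so that
    sentences straddling a boundary stay retrievable.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

# 2500 characters with chunk_size=1000, overlap=200 yield three chunks.
chunks = chunk_text("a" * 2500, chunk_size=1000, overlap=200)
```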
| Component | Technology | Role |
|---|---|---|
| Document Loader | PyPDF / built-in | Extracts raw text from uploaded PDFs or `.txt` files |
| Text Splitter | LangChain `RecursiveCharacterTextSplitter` | Splits text into overlapping chunks for retrieval |
| Embedding Model | `sentence-transformers/all-mpnet-base-v2` | Converts text chunks and queries into dense vectors (cached) |
| Vector Store | FAISS (`faiss-cpu`) | Stores embeddings and performs fast similarity search |
| Retriever | FAISS `similarity_search` (k=3) | Retrieves the 3 most relevant chunks for each query |
| LLM | Euri AI `gpt-4.1-nano` | Generates answers grounded in the retrieved context |
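Query-time retrieval can be illustrated with a minimal cosine-similarity search over toy vectors. The real app delegates this to FAISS over sentence-transformers embeddings; this sketch only shows the ranking idea:

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: List[float], index: List[Tuple[str, List[float]]], k: int = 3) -> List[str]:
    """Return the k chunks most similar to the query vector,
    mirroring FAISS similarity_search(k=3) at toy scale."""
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Hand-made 3-dimensional "embeddings" standing in for 768-dim model output.
index = [
    ("dosage guidance",   [0.9, 0.1, 0.0]),
    ("symptom summary",   [0.1, 0.9, 0.0]),
    ("billing notes",     [0.0, 0.1, 0.9]),
    ("drug interactions", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], index, k=3))
# → ['dosage guidance', 'drug interactions', 'symptom summary']
```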
The application requires a Euri AI API key. The recommended approach is to set the `EURI_API_KEY` environment variable:

```bash
export EURI_API_KEY="your_euri_ai_api_key"
```

Alternatively, update `app/config.py`:

```python
EURI_API_KEY = "your_euri_ai_api_key"
```

- Chunk Size: Modify `chunk_size` in `main.py` for different text processing granularity
- Model Settings: Update embedding models in `vectorstore_utils.py`
- UI Theme: Customize colors and styling in the CSS section of `main.py`
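The env-var-first lookup described above can be written as a small helper. `get_api_key` is a hypothetical name for illustration; the actual lookup lives in `app/config.py` and may be structured differently:

```python
import os

# Fallback used only when the environment variable is unset;
# hard-coding a real key in source is discouraged.
DEFAULT_KEY = "your_api_key_here"

def get_api_key() -> str:
    """Prefer the EURI_API_KEY environment variable over the hard-coded default."""
    return os.environ.get("EURI_API_KEY", DEFAULT_KEY)
```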
Run the unit test suite with:

```bash
pip install pytest
pytest -v tests/
```

Tests cover:

- PDF and plain-text extraction (`test_pdf_utils.py`)
- Chat model validation and error handling (`test_chat_utils.py`)
- Vector store retrieval and backward-compatible alias (`test_vectorstore_utils.py`)
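The plain-text extraction path exercised by `test_pdf_utils.py` can be sketched as a decode-with-fallback helper. This is an assumption about the project's behavior, not its actual code; the fallback encoding (Latin-1 here) is a common choice because it accepts any byte sequence:

```python
def decode_txt(raw: bytes) -> str:
    """Decode uploaded .txt bytes, preferring UTF-8 and falling back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises.
        return raw.decode("latin-1")

print(decode_txt("blood pressure: 120/80".encode("utf-8")))
```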
- Frontend: Streamlit
- AI/ML:
  - RAG Pipeline: LangChain orchestrates retrieval and generation
  - FAISS for vector similarity search (retrieval)
  - HuggingFace embeddings (sentence-transformers) for vectorisation
  - Euri AI for LLM chat completions (generation)
- Document Processing: PyPDF for PDF text extraction; built-in decoder for `.txt` files
- Backend: Python 3.8+
Key dependencies include:
- `streamlit` - Web application framework
- `pypdf` - PDF text extraction
- `langchain` - AI application framework
- `langchain_community` - Community integrations
- `faiss-cpu` - Vector similarity search
- `sentence_transformers` - Text embeddings
- `euriai` - AI model access
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature-name`
- Make your changes
- Commit your changes: `git commit -m "Add your feature description"`
- Push to your branch: `git push origin feature/your-feature-name`
- Open a Pull Request
- Follow PEP 8 style guidelines
- Add docstrings to new functions
- Test your changes with sample documents
- Update documentation as needed
Important: This application is designed for educational and research purposes. Always consult with qualified healthcare professionals for medical decisions. Do not use this tool as a substitute for professional medical advice, diagnosis, or treatment.
If you encounter any issues:
- Check the Issues: Browse existing GitHub Issues
- Create an Issue: Report bugs or request features
- Documentation: Review this README and inline code comments
Future enhancements planned:
- Support for DOCX documents
- Multi-language support
- Advanced document analytics
- User authentication and document management
- Export conversation history
- Integration with more AI models
Karan Pratap
- GitHub: @Karanpratap7
Paarth Yadav
- GitHub: @PaarthYadav
- LangChain community for excellent documentation
- Streamlit team for the amazing framework
- HuggingFace for providing pre-trained models
- The open-source community for various libraries used
MedDoc Flow - Bridging AI and Medical Documentation