🧠 PDF Report Summarizer

This is a Streamlit-based application that lets users upload PDF reports and generates concise, structured summaries using the Llama 3 model served through Ollama.

🔧 Features

📄 Upload and OCR-based text extraction from PDFs
🧩 Chunked summarization and final summary synthesis
🎯 Selectable report types: MPR, Board Report, Financial, etc.
📊 Page count estimation and time-to-completion display
🎛️ Custom or automatic limit on the final summary's page count
🟢 Progress bar with real-time chunk status
⛔ Stop processing at any point
🖨️ Copy/print support and styled HTML preview/download
🧠 Final summary is retained until a new file is uploaded or the app is reset manually
🎛️ Final summaries are generated from user-selected prompt templates
🧩 Toggle chunk-level results and constrain summary length with a page selector
⛔ Automatically prevents system sleep during long runs
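
The chunk-then-synthesize flow in the feature list can be sketched roughly as below. This is an illustration only, not the code in summarizer_pipeline.py: the function names, the `summarize` callable, the `should_stop` hook, and the 4000-character chunk size are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def split_into_chunks(text: str, max_chars: int = 4000) -> List[str]:
    """Group paragraphs into chunks of about max_chars characters.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(paragraph) + 2 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks


def estimate_seconds_remaining(elapsed: float, done: int, total: int) -> float:
    """Extrapolate remaining time from the average time per finished chunk."""
    if done == 0:
        return 0.0
    return (elapsed / done) * (total - done)


def summarize_document(
    text: str,
    summarize: Callable[[str], str],
    should_stop: Callable[[], bool] = lambda: False,
    max_chars: int = 4000,
) -> Tuple[List[str], str]:
    """Summarize each chunk, then summarize the joined chunk summaries."""
    partial: List[str] = []
    for chunk in split_into_chunks(text, max_chars):
        # Hypothetical stop hook, checked before each chunk (e.g. a Stop button).
        if should_stop():
            break
        partial.append(summarize(chunk))
    final = summarize("\n\n".join(partial)) if partial else ""
    return partial, final
```

Here `summarize` stands in for whatever sends a prompt to the model. Passing it as a callable keeps the chunking logic testable without a running model.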

🧠 Model Used

Model: llama3:8b-instruct-q4_K_M
Manager: Ollama
Pull the model once before first use: ollama pull llama3:8b-instruct-q4_K_M
The app launches Ollama via subprocess if it is not already running.
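
One way the auto-launch could work is sketched below. It assumes Ollama listens on its default port 11434 and that the `ollama` executable is on PATH. The helper names are hypothetical, and the sketch uses the standard library's urllib rather than requests so that it has no third-party imports. The actual logic in summarizer_pipeline.py may differ.

```python
import json
import socket
import subprocess
import time
import urllib.request

OLLAMA_HOST = "127.0.0.1"
OLLAMA_PORT = 11434  # Ollama's default port (assumed unchanged)
MODEL = "llama3:8b-instruct-q4_K_M"


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_ollama_running(wait_seconds: float = 15.0) -> bool:
    """Start `ollama serve` if the port is closed, then wait for it to come up."""
    if is_port_open(OLLAMA_HOST, OLLAMA_PORT):
        return True
    subprocess.Popen(
        ["ollama", "serve"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if is_port_open(OLLAMA_HOST, OLLAMA_PORT):
            return True
        time.sleep(0.5)
    return False


def build_generate_request(prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str) -> str:
    """Send one prompt to Ollama and return the generated text."""
    body = json.dumps(build_generate_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"http://{OLLAMA_HOST}:{OLLAMA_PORT}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["response"]
```

Calling `ensure_ollama_running()` once at startup and `generate(prompt)` per chunk would be enough for a simple pipeline.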

📦 Requirements

Python 3.8+
Tesseract-OCR (Installed and added to PATH)
Poppler for PDF-to-image conversion (on Windows)

⚠️ Set these two values manually in summarizer_pipeline.py:

pytesseract.pytesseract.tesseract_cmd (path to the Tesseract executable)

poppler_bin (path to Poppler's bin directory)
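
As a rough sketch of what that configuration might look like, the snippet below prefers binaries found on PATH and falls back to hard-coded locations. Both fallback paths are placeholders rather than the project's real values, and the helper names are hypothetical. `poppler_path` is the pdf2image parameter that a value like `poppler_bin` would typically be passed to.

```python
import shutil
from typing import List, Optional

# Placeholder install locations (assumptions); replace with the paths on your machine.
TESSERACT_FALLBACK = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
POPPLER_BIN_FALLBACK = r"C:\poppler-24.08.0\Library\bin"


def resolve_tesseract_cmd(fallback: str = TESSERACT_FALLBACK) -> str:
    """Prefer a tesseract binary found on PATH; otherwise use the fallback path."""
    return shutil.which("tesseract") or fallback


def resolve_poppler_bin(fallback: str = POPPLER_BIN_FALLBACK) -> Optional[str]:
    """Return None when pdftoppm is on PATH, so pdf2image locates it itself."""
    return None if shutil.which("pdftoppm") else fallback


def ocr_pdf(pdf_path: str, dpi: int = 300) -> List[str]:
    """OCR every page of a PDF and return one text string per page."""
    # Imported here so the helpers above work without the OCR packages installed.
    import pytesseract
    from pdf2image import convert_from_path

    pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
    pages = convert_from_path(pdf_path, dpi=dpi, poppler_path=resolve_poppler_bin())
    return [pytesseract.image_to_string(page) for page in pages]
```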

✅ Dependencies (requirements.txt)

UI & Web App

streamlit
markdown

PDF Processing

PyMuPDF==1.23.21 # used as fitz to extract text from PDFs
pdf2image==1.16.3 # converts PDF pages to images for OCR

OCR

pytesseract==0.3.10 # OCR engine (must install Tesseract binary separately)

HTTP Requests

requests

Install them all with: pip install -r requirements.txt

🔧 External Dependencies

These tools must be installed separately and added to your system PATH.

📌 1. Tesseract OCR: used for extracting text from scanned PDF images.
📌 2. Poppler for Windows: used by pdf2image to convert PDF pages into images.
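
A quick pre-flight check can confirm that the external tools are visible on PATH before launching the app. This is a small sketch with hypothetical helper names. It looks for `pdftoppm` because that is the Poppler binary pdf2image calls, and it includes `ollama` on the assumption that Ollama is also started from PATH.

```python
import shutil
from typing import Dict, Iterable, List, Optional

REQUIRED_TOOLS = ("tesseract", "pdftoppm", "ollama")


def find_tools(names: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names of tools that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]
```

If `missing_tools()` returns a non-empty list, install the listed tools or add their directories to PATH before running the app.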

🏁 How to Run

Clone the Repository

git clone https://github.com/rychrr/pdf_summarizer_offline.git
cd pdf_summarizer_offline

Install Python Dependencies
```
pip install -r requirements.txt
```
Launch Streamlit App
```
streamlit run summarizers.py
```

📁 Project Structure

pdf-summarizer/
├── summarizers.py            # Main Streamlit frontend app
├── summarizer_pipeline.py    # Backend summarization pipeline
├── prompts/                  # Prompt templates
│   ├── mpr_prompt.txt
│   ├── final_mpr_prompt.txt
│   ├── board_prompt.txt
│   └── final_board_prompt.txt
├── cache/                    # OCR cache directory
├── img/
│   └── yourimg.png           # Optional logo for styled summary
├── requirements.txt
└── README.md
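
The prompts/ layout suggests one chunk-level template and one final-summary template per report type. A hedged sketch of loading them is shown below. The `PROMPT_FILES` mapping, the report-type labels, and `load_prompts` are assumptions for illustration, and only the four template files listed in the tree are covered.

```python
from pathlib import Path
from typing import Dict, Tuple

# Report type -> (chunk-level template, final-summary template).
# The keys are illustrative labels; the file names come from the tree above.
PROMPT_FILES: Dict[str, Tuple[str, str]] = {
    "MPR": ("mpr_prompt.txt", "final_mpr_prompt.txt"),
    "Board Report": ("board_prompt.txt", "final_board_prompt.txt"),
}


def load_prompts(report_type: str, prompts_dir: str = "prompts") -> Tuple[str, str]:
    """Return the (chunk prompt, final prompt) text for a report type."""
    try:
        chunk_name, final_name = PROMPT_FILES[report_type]
    except KeyError:
        raise ValueError(f"No prompt templates for report type: {report_type!r}") from None
    base = Path(prompts_dir)
    chunk_prompt = (base / chunk_name).read_text(encoding="utf-8")
    final_prompt = (base / final_name).read_text(encoding="utf-8")
    return chunk_prompt, final_prompt
```

Keeping the mapping in one dictionary means adding a report type only needs two new template files and one new entry.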

👨‍💻 Author - Ejike Ozonkwo
