This project implements a robust three-stage OCR pipeline that evolves from traditional text extraction to an intelligent hybrid system using deep learning. It’s designed for structured document processing and is ideal for scanned PDFs, images, and forms.
All models run within a Python virtual environment for isolation and reproducibility.
**Model 1:** A simple command-line tool using pytesseract for text extraction.
- Extracts plain text from images or PDFs
- Lightweight and easy to run
**Model 2:** A user-friendly web interface built with Flask that uses Tesseract OCR.
- Upload PDF or image files
- Web-based interaction
- Real-time OCR results on browser
**Model 3:** A high-performance OCR system that combines Microsoft's TrOCR (transformers) with Tesseract for layout and accuracy.
- Transformer-based deep learning OCR
- Hybrid inference with fallback/combination strategy
- Best suited for scanned documents, handwritten text, or low-quality images
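The fallback/combination strategy can be sketched independently of the models: prefer the transformer's output, but fall back to Tesseract when that output looks unreliable. This is an illustrative sketch; the heuristic (a minimum count of alphanumeric characters) and the function names are assumptions, not the project's actual rule:

```python
# Sketch of a hybrid fallback strategy: prefer the transformer
# (TrOCR) output, but fall back to Tesseract when it looks empty or
# degenerate. Both OCR engines are passed in as callables, so the
# strategy itself stays model-agnostic.
from typing import Callable


def hybrid_ocr(
    image,
    trocr: Callable[[object], str],
    tesseract: Callable[[object], str],
    min_chars: int = 3,
) -> str:
    primary = trocr(image)
    # Count "real" characters; whitespace or punctuation alone
    # suggests the transformer failed on this input.
    if sum(c.isalnum() for c in primary) >= min_chars:
        return primary
    return tesseract(image)


# Usage with stub engines: the empty TrOCR result triggers the fallback.
text = hybrid_ocr("img", trocr=lambda _: "  ", tesseract=lambda _: "Invoice 42")
# → "Invoice 42"
```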
```bash
python -m venv venv

# Activate the environment
source venv/bin/activate   # On macOS/Linux
venv\Scripts\activate      # On Windows
```

Use this common requirements.txt (add other model-specific ones if needed):
```text
flask
pytesseract
pdf2image
pillow
transformers
torch
```

Install via:

```bash
pip install -r requirements.txt
```

Model 1 requirements:
- pytesseract
- Pillow
- Tesseract-OCR installed on system
Run:

```bash
python model1_tesseract_ocr.py
```

Extracted text is printed in the terminal or saved to a `.txt` file.
Model 2 requirements:
- flask
- pytesseract
- pdf2image
- Pillow
- Werkzeug (comes with Flask)
Run:

```bash
python app.py
```

Open http://localhost:5000 to upload documents and view OCR results.
```text
templates/
└── index.html
static/
└── style.css
uploads/
└── (temporary user uploads)
```
Model 3 requirements:
- transformers
- torch
- pytesseract
- pdf2image
- Pillow
Run:

```bash
python model3_hybrid_ocr.py
```

- Uses the TrOCR model from Hugging Face (`microsoft/trocr-base-stage1`)
- Performs multi-pass OCR for better accuracy
- Can process multiple page documents
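The multi-pass idea can be sketched as running OCR over several preprocessed variants of each page and keeping the richest result. The preprocessing choices here (grayscale, autocontrast, binarization) and the scoring rule are assumptions for illustration, not the project's exact passes:

```python
# Multi-pass OCR sketch: OCR several preprocessed variants of an
# image and keep the result with the most alphanumeric content.
# The OCR engine is injected as a callable, so this works with
# either Tesseract or TrOCR.
from typing import Callable, List

from PIL import Image, ImageOps


def passes(img: Image.Image) -> List[Image.Image]:
    gray = img.convert("L")
    return [
        img,                                           # original
        gray,                                          # grayscale
        ImageOps.autocontrast(gray),                   # contrast-stretched
        gray.point(lambda p: 255 if p > 128 else 0),   # binarized
    ]


def multipass_ocr(img: Image.Image, ocr: Callable[[Image.Image], str]) -> str:
    results = [ocr(variant) for variant in passes(img)]
    return max(results, key=lambda t: sum(c.isalnum() for c in t))
```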
**Windows:** Download the installer from:
https://github.com/tesseract-ocr/tesseract

Add the Tesseract path to your environment variables, e.g.:
`C:\Program Files\Tesseract-OCR\tesseract.exe`

**Linux:**

```bash
sudo apt update
sudo apt install tesseract-ocr
```

**macOS:**

```bash
brew install tesseract
```

```text
OCR-Project/
├── venv/
├── requirements.txt
├── model1_tesseract_ocr.py
├── model3_hybrid_ocr.py
├── app.py
├── templates/
├── static/
├── uploads/
└── README.md
```
- Model 1: Outputs plain text
- Model 2: Web preview + downloadable text
- Model 3: Enhanced text output, may include layout-aware content
- Always activate the virtual environment before running any model.
- Model 3 requires internet on first run (to download TrOCR).
- GPU usage (if available) can speed up TrOCR model inference.
- Make sure Tesseract is correctly installed and its path is set.
Pull requests are welcome! Suggestions for layout-aware models, table extraction, or handwriting recognition modules are highly appreciated.