PDF OCR Converter

A Python script that batch processes PDF files using OCRmyPDF to make them searchable through Optical Character Recognition (OCR).

Description

This script automates the process of converting non-searchable PDF documents into searchable ones using OCR technology. It processes all PDF files in a specified input directory and saves the OCR-processed versions to an output directory.

Prerequisites

Python 3.x
OCRmyPDF (must be installed and accessible from command line)
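
Before a batch run, it can help to confirm that the `ocrmypdf` command is actually reachable. The following is a minimal sketch, not part of the original script; the helper name `ocrmypdf_available` is hypothetical and it relies only on the standard-library `shutil.which`.

```python
import shutil


def ocrmypdf_available(executable: str = "ocrmypdf") -> bool:
    """Return True if the given executable can be found on PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ocrmypdf_available():
        print("ocrmypdf found on PATH")
    else:
        print("ocrmypdf not found; install it before running the script")
```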

Installation

Install OCRmyPDF:

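# For Ubuntu/Debian (usually requires root privileges)
sudo apt-get install ocrmypdf

# For macOS (Homebrew)
brew install ocrmypdf

# For Windows (Tesseract OCR and Ghostscript must also be installed separately)
pip install ocrmypdf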

Clone this repository or download the script.

Configuration

Modify the following variables in the script to match your environment:

input_folder = "C:\\root\\archive\\ocr_pend"  # Folder containing original PDFs
output_folder = "C:\\root\\archive"           # Folder for processed PDFs
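
As an optional alternative, the same two folders can be written as raw strings so the backslashes need no doubling. This is only a sketch of an equivalent spelling; the `input_path` variable and the use of `PureWindowsPath` are illustrative assumptions and do not appear in the script.

```python
from pathlib import PureWindowsPath

# Same folders as above, written as raw strings (no escaped backslashes).
input_folder = r"C:\root\archive\ocr_pend"
output_folder = r"C:\root\archive"

# PureWindowsPath only manipulates the path text; it never touches the disk,
# so this runs on any operating system.
input_path = PureWindowsPath(input_folder)
print(input_path.name)    # ocr_pend
print(input_path.parent)  # C:\root\archive
```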

Usage

Place your PDF files in the input folder
Run the script:
```
python pdf_to_ocr.py
```

The script will:

Process all PDF files in the input folder
Apply OCR to make them searchable
Save the processed files to the output folder
Print progress messages for each file
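
The steps above could be sketched roughly as follows. This is an assumption about how such a script might be structured, not the actual contents of `pdf_to_ocr.py`: the function names `list_pdfs`, `build_command`, and `ocr_folder` are hypothetical, and the only OCRmyPDF behavior relied on is its basic command-line form, `ocrmypdf INPUT OUTPUT`.

```python
import os
import subprocess


def list_pdfs(folder):
    """Return the sorted names of PDF files directly inside folder."""
    return sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(".pdf")
        and os.path.isfile(os.path.join(folder, name))
    )


def build_command(input_pdf, output_pdf):
    """Build the basic OCRmyPDF command line: ocrmypdf INPUT OUTPUT."""
    return ["ocrmypdf", input_pdf, output_pdf]


def ocr_folder(input_folder, output_folder, run=subprocess.run):
    """OCR every PDF in input_folder into output_folder, keeping file names."""
    for name in list_pdfs(input_folder):
        source = os.path.join(input_folder, name)
        target = os.path.join(output_folder, name)
        print(f"Processing {name} ...")
        # check=True raises CalledProcessError if the command exits non-zero.
        run(build_command(source, target), check=True)
        print(f"Saved {target}")
```

The `run` parameter defaults to `subprocess.run` and exists only so the loop can be exercised without OCRmyPDF installed. Calling `ocr_folder(input_folder, output_folder)` with the configured folders would perform the real conversion, provided the output folder already exists.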

Features

Batch processing of multiple PDF files
Automatic creation of output directory if it doesn't exist
Error handling and progress reporting
Maintains original file names
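
Two of these features, creating the output directory on demand and keeping the original file name, can be illustrated with a short sketch. The helper `prepare_output_path` is hypothetical and only shows one plausible way to do this with the standard library.

```python
import os


def prepare_output_path(input_pdf, output_folder):
    """Ensure output_folder exists and return the path for the processed copy."""
    # exist_ok=True makes this a no-op when the folder is already there.
    os.makedirs(output_folder, exist_ok=True)
    # Reuse the input file's name so the processed PDF keeps it.
    return os.path.join(output_folder, os.path.basename(input_pdf))
```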

Error Handling

The script includes basic error handling that will:

Print success messages for each successfully processed file
Print error messages if processing fails for any file
Continue processing remaining files even if one fails
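
The continue-on-failure behavior described above might look like the sketch below. It is an illustration under assumptions rather than the script's actual code: `process_all` and its `process_one` callback are hypothetical names, and the returned lists are an addition for convenience.

```python
def process_all(names, process_one):
    """Run process_one on each name, reporting results without stopping early."""
    succeeded, failed = [], []
    for name in names:
        try:
            process_one(name)
        except Exception as exc:
            # Report the failure and move on to the next file.
            print(f"Error processing {name}: {exc}")
            failed.append(name)
        else:
            print(f"Successfully processed {name}")
            succeeded.append(name)
    return succeeded, failed


if __name__ == "__main__":
    def fake_ocr(name):
        # Stand-in for the real OCR step; fails for one file on purpose.
        if name == "broken.pdf":
            raise RuntimeError("simulated OCR failure")

    process_all(["a.pdf", "broken.pdf", "c.pdf"], fake_ocr)
```

Catching `Exception` per file keeps one bad PDF from aborting the whole batch, at the cost of hiding the traceback unless it is logged separately.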

Contributing

Feel free to submit issues and enhancement requests!

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Uses OCRmyPDF for PDF processing

⌨️ with 💻 by Raj Reddy
// Reach out if you find bugs in the matrix

// "Hello, World!" is just the beginning 🚀
