ben05allen/OCR_PDF_Scanner


Script to read scanned PDFs

Uses Tesseract to extract text from PDFs that weren't machine written, such as scanned documents.

Requirements (Ubuntu 2x.04)

The script needs uv, poppler-utils (which pdf2image relies on) and tesseract-ocr installed:

  • curl -LsSf https://astral.sh/uv/install.sh | sh
  • sudo apt install poppler-utils tesseract-ocr
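Before the first run, it may help to confirm that the prerequisites are actually on the PATH. The following is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, and the binary names assume that poppler-utils provides `pdftoppm` and tesseract-ocr provides `tesseract`, as they do on Ubuntu.

```python
import shutil

# Executables expected after installing the requirements above.
# Assumption: poppler-utils ships `pdftoppm`, tesseract-ocr ships `tesseract`.
REQUIRED_TOOLS = ("uv", "pdftoppm", "tesseract")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

An empty result means all three executables were found. Anything listed still needs to be installed with the commands above.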

Usage

uv run scanner <folder with pdf docs>
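For readers curious about what a run over a folder might involve, here is a rough sketch of the usual pdf2image plus Tesseract pipeline. It is not the repository's actual code. The use of `pytesseract` as the Tesseract binding is an assumption, as are the function names `find_pdfs`, `ocr_pdf` and `scan_folder` and the choice to write a `.txt` file next to each PDF.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def ocr_pdf(pdf_path):
    """Render each page to an image, OCR it, and return the joined text."""
    # Imported lazily so find_pdfs works without the OCR dependencies.
    # Assumption: pytesseract is the Tesseract binding in use.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(str(pdf_path))  # one PIL image per page
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def scan_folder(folder):
    """OCR every PDF in `folder`, writing a .txt file beside each one.

    The output layout is hypothetical, chosen only for illustration.
    """
    for pdf in find_pdfs(folder):
        pdf.with_suffix(".txt").write_text(ocr_pdf(pdf), encoding="utf-8")
```

`convert_from_path` calls poppler under the hood and `image_to_string` calls the `tesseract` binary, which is why both system packages are required.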

About

Python script to scan PDFs that are not machine written
