ScholarOCR

ScholarOCR is an open-source Python tool that extracts text from images and PDFs. It uses Tesseract OCR to convert study materials into editable text, helping students organize their notes.

Features

Automatic text extraction from images (.png, .jpg, .jpeg) and PDFs.
Automatic directory management for input and output files.
Optimized for Linux (Ubuntu) environments.
Simple codebase, easy to extend or modify.
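
As a rough illustration of the directory management and file filtering described above, here is a minimal sketch using only the standard library. The function names (`prepare_dirs`, `collect_inputs`, `output_name`) and the one-`.txt`-per-input naming scheme are assumptions made for this example; they are not taken from `main.py`, which may be organized differently.

```python
from pathlib import Path

# Extensions taken from the feature list above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}


def prepare_dirs(input_dir, output_dir):
    """Create the input and output folders if they do not exist yet (hypothetical helper)."""
    input_path, output_path = Path(input_dir), Path(output_dir)
    input_path.mkdir(parents=True, exist_ok=True)
    output_path.mkdir(parents=True, exist_ok=True)
    return input_path, output_path


def collect_inputs(input_path):
    """Return the supported files in input_path, sorted by path.

    The extension check is case-insensitive, so "scan.PNG" is accepted too.
    """
    return sorted(
        p for p in Path(input_path).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def output_name(source, output_path):
    """Map a source file such as input/notes.png to output/notes.txt (assumed naming scheme)."""
    return Path(output_path) / (Path(source).stem + ".txt")
```

A caller would typically run `prepare_dirs("input", "output")` once, then loop over `collect_inputs(...)` and write each extracted text to the path returned by `output_name(...)`.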

Installation

1. Clone the repository:

git clone https://github.com/evedmills/scholar-ocr.git
cd scholar-ocr

2. Install the Python dependencies:

pip install -r REQUIREMENTS.TXT

Prerequisites

Since this tool relies on Tesseract OCR, you must install the engine on your system before running the Python script.

For Ubuntu / Debian:

sudo apt update
sudo apt install tesseract-ocr tesseract-ocr-por libtesseract-dev
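
The `tesseract-ocr-por` package adds the Portuguese language data. To confirm that the engine and its language packs are visible before running the script, a check along the following lines may help. This is a sketch, not part of ScholarOCR: the helper names (`find_tesseract`, `parse_langs`, `installed_langs`) are hypothetical, and the parsing assumes that `tesseract --list-langs` prints a header line starting with "List of" followed by one language code per line.

```python
import shutil
import subprocess


def find_tesseract(binary="tesseract"):
    """Return the full path to the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary)


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` output.

    Assumes the header line starts with "List of"; that line is skipped.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [line for line in lines if not line.startswith("List of")]


def installed_langs(binary="tesseract"):
    """Ask Tesseract which language packs are installed (empty list if it is missing)."""
    path = find_tesseract(binary)
    if path is None:
        return []
    result = subprocess.run([path, "--list-langs"], capture_output=True, text=True)
    # Some Tesseract releases print this list to stderr, so read both streams.
    return parse_langs(result.stdout + result.stderr)
```

If `find_tesseract()` returns `None`, the engine is not installed or not on `PATH`. If `"por"` is missing from `installed_langs()`, the Portuguese language pack was not picked up.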
