A script that analyzes thousands of complex PDFs containing text, tables, and graphs and writes their contents to an xls file within seconds.

akifislam/Complex-PDF-MCQ-Scraper

Smart Data Entry Killer

A Python script that parses complex text, images, and tables from bulk PDFs and writes them to Excel

About The Project

The main goal of this project is to enter data into Excel from complex PDFs. A PDF is considered complex if it has multiple pages containing tables of various shapes and dimensions, chemical images, drawings, diagrams, and similar content.
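To make the goal concrete, here is a minimal sketch of pulling tables out of one PDF with pdfplumber (listed under Built With) and tidying the cells before they go into a spreadsheet. It is only an illustration, not the code in this repository: the function names `clean_table` and `extract_tables` are hypothetical, and images, drawings, and diagrams are not handled.

```python
from typing import List, Optional

# pdfplumber returns each table as a list of rows; a cell is a string or None.
Row = List[Optional[str]]


def clean_table(table: List[Row]) -> List[List[str]]:
    """Turn None into "", collapse whitespace and line breaks, drop empty rows."""
    cleaned = []
    for row in table:
        cells = [" ".join((cell or "").split()) for cell in row]
        if any(cells):  # keep the row only if at least one cell has text
            cleaned.append(cells)
    return cleaned


def extract_tables(pdf_path: str) -> List[List[List[str]]]:
    """Return every non-empty table in the PDF, page by page, after cleaning."""
    import pdfplumber  # imported here so clean_table can be used without it

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                cleaned = clean_table(table)
                if cleaned:
                    tables.append(cleaned)
    return tables
```

Calling `extract_tables("sample.pdf")` (a placeholder file name) would give a list of tables, each a list of rows of plain strings, which is a convenient shape for writing to a sheet.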

Built With

  • beautifulsoup4==4.11.1
  • cryptography==37.0.4
  • html5lib==1.1
  • lxml==4.9.1
  • numpy==1.23.1
  • pandas==1.4.3
  • pdfminer.six==20220524
  • pdfplumber==0.7.4
  • Pillow==9.2.0
  • pipreqs==0.4.11
  • PyMuPDF==1.20.1
  • urllib3==1.26.11
  • Wand==0.6.9
  • xlrd==2.0.1

Prerequisites

You need Python 3.7 or later and pip 20.0 or later for this project. I used Python 3.9.13 and pip 22.2.1.

Installation

Follow these steps to install the dependencies and run the script.

  1. Clone the repo
     git clone https://github.com/akifislam/SmartDataEntryKiller.git
  2. Install dependencies
     pip install -r requirements.txt
  3. Run the script
     python3 BurstProcessor.py
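For readers wondering what a bulk run might look like, the sketch below walks a folder of PDFs, applies an extraction function to each file, and collects everything into one output file. This is an assumption-laden illustration and not the contents of BurstProcessor.py: `find_pdfs`, `write_rows`, and `process_folder` are hypothetical names, and the output is CSV (which Excel can open) only because this README does not say which Excel writer the script uses.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, List


def find_pdfs(folder: str) -> List[Path]:
    """Return all .pdf files under folder (recursive), sorted for a stable order."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def write_rows(rows: Iterable[List[str]], out_path: str) -> int:
    """Write rows to a CSV file and return how many rows were written."""
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def process_folder(
    folder: str,
    out_path: str,
    extract: Callable[[Path], List[List[str]]],
) -> int:
    """Run extract on every PDF and write one CSV row per extracted row."""

    def all_rows():
        for pdf in find_pdfs(folder):
            try:
                for row in extract(pdf):
                    # Prefix each row with the source file name for traceability.
                    yield [pdf.name] + list(row)
            except Exception as exc:  # one bad PDF should not stop the batch
                print(f"skipped {pdf.name}: {exc}")

    return write_rows(all_rows(), out_path)
```

Passing the extraction step in as a function keeps the folder walk and the file writing independent of whichever PDF library does the parsing.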

Contact

Akif Islam - [email protected]

Project Link: Smart Data Entry Killer

Special Thanks

  • Mohammad Ruhul Ameen Bhai for encouraging me to complete this seemingly impossible task
  • Stack Overflow for saving my life and earning me recognition among outsiders as a Python developer (though I know nothing about it)
