will-strader/scherrer-bid-extractor

Bid-Extractor

A Python tool to parse bid-information PDFs (e.g. “Delavan PL”) and populate a standard bid spreadsheet template.

Features

  • Text extraction from PDFs with pdfplumber, plus optional OCR (pytesseract) for scanned files
  • Field parsing with regular expressions and NLP (spaCy)
  • Template filling with pandas and output to Excel or CSV
  • Command-line interface for batch processing
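As a rough illustration of the regex side of field parsing, here is a minimal sketch of an extractor that maps raw text to a field dict. The field names (`project_name`, `bid_date`, `owner`), their label patterns, and the `extract_fields` function are hypothetical. The labels in real bid PDFs such as “Delavan PL”, and the patterns in `extractor.py`, may differ. The NLP (spaCy) path is not shown.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical labels and patterns; real bid documents may word these differently.
# [ \t]* is used instead of \s* so a match never spills onto the next line.
FIELD_PATTERNS: Dict[str, Pattern[str]] = {
    "project_name": re.compile(
        r"^[ \t]*Project(?:[ \t]+Name)?[ \t]*:[ \t]*(.+?)[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    ),
    "bid_date": re.compile(
        r"^[ \t]*Bid[ \t]+Date[ \t]*:[ \t]*(\d{1,2}/\d{1,2}/\d{2,4})",
        re.IGNORECASE | re.MULTILINE,
    ),
    "owner": re.compile(
        r"^[ \t]*Owner[ \t]*:[ \t]*(.+?)[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    ),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Map raw PDF text to a field dict; fields that are not found map to None."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        # Only the first occurrence is kept; later repeats (e.g. page headers) are ignored.
        fields[name] = match.group(1) if match else None
    return fields
```

Anchoring each pattern to the start of a line with `re.MULTILINE` keeps a label such as `Owner:` from matching mid-sentence. Fields the regexes miss come back as `None`, which would be a natural place to fall back on an NLP pass.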

Repo Structure

    bid-extractor/
    ├── src/              # Core modules
    │   ├── parser.py     # PDF → raw text
    │   ├── extractor.py  # raw text → field dict
    │   ├── templater.py  # dict → Excel/CSV
    │   └── cli.py        # entry-point script
    ├── tests/            # Unit tests (pytest)
    ├── data/             # Example PDFs & templates
    ├── .gitignore
    └── README.md
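Below is a minimal sketch of the `parser.py` step (PDF → raw text). It assumes pdfplumber reads the text layer and pytesseract serves as the optional OCR fallback for pages with little or no extractable text. The function names, the `ocr` flag, and the 20-character threshold for deciding that a page is scanned are assumptions for illustration, not the repository's actual API.

```python
from typing import List


def needs_ocr(text: str, min_chars: int = 20) -> bool:
    """Heuristic (assumed threshold): a page with almost no text is probably a scan."""
    return len(text.strip()) < min_chars


def pdf_to_text(path: str, ocr: bool = False, resolution: int = 300) -> str:
    """Extract text page by page, optionally OCR-ing pages that look scanned."""
    # Imported lazily so this module still loads where pdfplumber is absent.
    import pdfplumber

    pages: List[str] = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # extract_text() returns None for pages without a text layer.
            text = page.extract_text() or ""
            if ocr and needs_ocr(text):
                import pytesseract  # wraps the external Tesseract binary

                # Render the page to a PIL image and OCR that instead.
                image = page.to_image(resolution=resolution).original
                text = pytesseract.image_to_string(image)
            pages.append(text)
    return "\n".join(pages)
```

For example, `pdf_to_text("data/DelavanPL.pdf", ocr=True)` would return one string with pages joined by newlines. OCR also requires the Tesseract engine itself to be installed on the system, because pytesseract only wraps it.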

Getting Started

  1. Clone & activate

    git clone git@github.com:<your-org>/bid-extractor.git
    cd bid-extractor
    python3 -m venv venv && source venv/bin/activate
    pip install -r requirements.txt
    
  2. Install dependencies

    pip install pdfplumber pandas openpyxl pytesseract spacy
    python -m spacy download en_core_web_sm

  3. Run the CLI

    python -m src.cli --input data/DelavanPL.pdf \
        --template data/Bid\ information.xlsx \
        --output filled.xlsx
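The three flags in step 3 map naturally onto `argparse`. Below is a minimal sketch of how `cli.py` might define them. It assumes that batch processing means passing a folder of PDFs to `--input`; that directory behaviour and the `build_parser`, `collect_pdfs`, and `main` names are hypothetical. The sketch only reports what it would process instead of running the parser, extractor, and templater.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Define the --input/--template/--output flags used in the README."""
    parser = argparse.ArgumentParser(
        description="Fill a bid spreadsheet template from bid-information PDFs."
    )
    parser.add_argument("--input", required=True, type=Path,
                        help="a bid PDF, or (hypothetically) a folder of PDFs")
    parser.add_argument("--template", required=True, type=Path,
                        help="spreadsheet template to fill")
    parser.add_argument("--output", required=True, type=Path,
                        help="output file (.xlsx or .csv)")
    return parser


def collect_pdfs(input_path: Path) -> List[Path]:
    """Expand a directory into its PDFs (sorted); any other path passes through."""
    if input_path.is_dir():
        return sorted(p for p in input_path.iterdir() if p.suffix.lower() == ".pdf")
    return [input_path]


def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments and list the work; returns a process exit code."""
    args = build_parser().parse_args(argv)
    for pdf in collect_pdfs(args.input):
        # Placeholder: the real tool would run parser -> extractor -> templater here.
        print(f"would fill {args.template} from {pdf} into {args.output}")
    return 0
```

Using `type=Path` means paths with spaces, such as the escaped `Bid\ information.xlsx` above, arrive as ordinary `Path` objects once the shell has removed the backslash.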
