Bid-Extractor

A Python tool to parse bid-information PDFs (e.g. “Delavan PL”) and populate a standard bid spreadsheet template.

Features

PDF text extraction using pdfplumber (with optional OCR for scanned files)
Regex- and NLP-based field parsing
Pandas-driven template filling and Excel/CSV output
Command-line interface for batch processing
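
The extraction feature with its OCR fallback could look roughly like the sketch below. It is a minimal illustration, not the project's actual parser.py: the function names, the `min_chars` threshold, and the 300 dpi render resolution are all assumptions. It relies on pdfplumber's `open`, `Page.extract_text`, and `Page.to_image`, and on `pytesseract.image_to_string`, which needs the Tesseract binary installed separately.

```python
def needs_ocr(text, min_chars=20):
    """Heuristic (assumed): treat a page as scanned if it yields almost no text."""
    return len((text or "").strip()) < min_chars


def extract_page_texts(pdf_path, use_ocr=False):
    """Return a list with one text string per page of the PDF."""
    import pdfplumber  # imported lazily so needs_ocr() has no dependencies

    texts = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""
            if use_ocr and needs_ocr(text):
                import pytesseract  # requires the Tesseract binary on PATH

                # Render the page to a PIL image and run OCR on it.
                image = page.to_image(resolution=300).original
                text = pytesseract.image_to_string(image)
            texts.append(text)
    return texts
```

Calling `extract_page_texts("data/DelavanPL.pdf", use_ocr=True)` would then return the per-page text, falling back to OCR only for pages where the embedded text layer is nearly empty.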

Repo Structure

bid-extractor/
├── src/                # Core modules
│   ├── parser.py       # PDF → raw text
│   ├── extractor.py    # raw text → field dict
│   ├── templater.py    # dict → Excel/CSV
│   └── cli.py          # entry-point script
├── tests/              # Unit tests (pytest)
├── data/               # Example PDFs & templates
├── .gitignore
└── README.md
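
The data flow between the modules (raw text → field dict → CSV) can be sketched as follows. This is only an illustration under stated assumptions: the field names, the regex patterns, and the sample text are made up, and the standard-library csv module stands in for pandas so the sketch has no dependencies.

```python
import csv
import io
import re

# Hypothetical field patterns; the real labels depend on the bid documents.
FIELD_PATTERNS = {
    "project_name": re.compile(r"Project\s*Name\s*:\s*(.+)", re.IGNORECASE),
    "bid_date": re.compile(r"Bid\s*Date\s*:\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE),
}


def extract_fields(raw_text):
    """raw text -> field dict; fields that do not match are set to None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(raw_text)
        fields[name] = match.group(1).strip() if match else None
    return fields


def fields_to_csv(fields):
    """field dict -> CSV text with one header row and one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields))
    writer.writeheader()
    writer.writerow(fields)
    return buffer.getvalue()


# Made-up sample text standing in for the output of the PDF parsing step.
sample = "Project Name: Delavan PL\nBid Date: 3/14/2025\n"
fields = extract_fields(sample)
csv_text = fields_to_csv(fields)
```

Keeping each stage a plain function that takes and returns simple values (a string, a dict) makes the stages easy to unit-test in isolation with pytest.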

Getting Started

Clone & activate

git clone git@github.com:<your-org>/bid-extractor.git
cd bid-extractor
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

Install dependencies

pip install pdfplumber pandas openpyxl pytesseract spacy
python -m spacy download en_core_web_sm

Run the CLI

python -m src.cli --input data/DelavanPL.pdf \
  --template data/Bid\ information.xlsx \
  --output filled.xlsx
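
For orientation, an entry point accepting the three flags used in that command might be structured like the sketch below. It is an assumption about how cli.py could be written with argparse, not its actual contents; in particular, marking all three flags as required and the help strings are guesses.

```python
import argparse


def build_parser():
    """Build a parser for the --input, --template and --output flags."""
    parser = argparse.ArgumentParser(
        prog="bid-extractor",
        description="Fill a bid spreadsheet template from a bid-information PDF.",
    )
    parser.add_argument("--input", required=True, help="Path to the bid PDF")
    parser.add_argument("--template", required=True, help="Path to the spreadsheet template")
    parser.add_argument("--output", required=True, help="Path for the filled spreadsheet")
    return parser


def main(argv=None):
    """Parse arguments; argv=None makes argparse fall back to sys.argv."""
    args = build_parser().parse_args(argv)
    # A real entry point would run the parse, extract and template steps here.
    return args


# Example: the same arguments as the shell command, passed as a list.
args = main([
    "--input", "data/DelavanPL.pdf",
    "--template", "data/Bid information.xlsx",
    "--output", "filled.xlsx",
])
```

Note that the shell escape in `Bid\ information.xlsx` is consumed by the shell, so the program receives the path with a plain space.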

will-strader/scherrer-bid-extractor

Folders and files

Latest commit

History

Repository files navigation

Bid-Extractor

Features

Repo Structure

Getting Started

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages