The Cost Estimate Generator ingests historical pay-item pricing data, computes summary statistics, and updates estimate workbooks and audit CSV files in place. The project ships with synthetic sample data that demonstrate the expected file layout and allow the pipeline to be exercised end-to-end without external services.
- Python: 3.11 recommended for full reproducibility; 3.12+ is supported by the project via conditional dependencies.
- pip: Python package installer (included with Python)
The following packages are installed via requirements.txt or pyproject.toml:
- `numpy==1.26.4` - Numerical computing
- `pandas` - Data analysis and manipulation
  - Python < 3.12: `pandas==1.5.3`
  - Python >= 3.12: `pandas>=2.2.2,<3.0.0`
- `openpyxl==3.1.2` - Excel file reading/writing
- `python-dotenv==1.0.0` - Environment variable management
- `xlrd==2.0.1` - Legacy Excel file reading
- `openai>=1.0.0,<2.0.0` - AI assistance (optional, can be disabled)
- `reportlab>=4.0.0,<5.0.0` - PDF generation
- `pypdf>=3.1.0,<5.0.0` - PDF parsing/manipulation
- `jsonschema>=4.19.0,<5.0.0` - Validation of memo summary payloads
- OpenAI API: To enable AI-assisted item mapping, you need:
  - An OpenAI API key (set via `OPENAI_API_KEY`, or stored in `API_KEY/API_KEY.txt`; alternatively point to a specific file with `OPENAI_API_KEY_FILE`)
  - Can be disabled with the `DISABLE_OPENAI=1` environment variable or the `--disable-ai` flag
  - `API_KEY/` is intentionally excluded from version control; create it locally per machine and keep secrets there.
- Reads historical price data from Excel workbooks (sheet-per-item) and from directories of CSV files, aggregating every matching source for a pay item.
- Computes `DATA_POINTS_USED`, `MEAN_UNIT_PRICE`, `STD_DEV`, `COEF_VAR`, and a confidence score per pay item using the formula `confidence = (1 - exp(-n/30)) * (1 / (1 + cv))` (see the sketch after this list).
- Updates `Estimate_Draft.xlsx` by prepending a `SORT_CODE` column (10, 20, 30, ...) and inserting a `CONFIDENCE` column immediately after `DATA_POINTS_USED` within the `Estimate` sheet.
- Updates `Estimate_Audit.csv` by inserting `STD_DEV` and `COEF_VAR` columns after `DATA_POINTS_USED` and populating them for every row.
- Provides an on-demand XML export (`outputs/xml/<project_id>.xml` by default, override via `XML_OUTPUT_DIR`) mirroring the `Estimate_Draft.xlsx` rows so the pricing can be ingested by downstream INDOT tools via the GUI's "Create XML File" card or the `generate_xml_from_estimate_workbook` helper.
- Produces a debug mapping report at `outputs/payitem_mapping_debug.csv` listing any DM 23-21 remappings (`source_item`, `mapped_item`, `mapping_rule`, `adder_applied`, `evidence`).
- Supports `--dry-run` mode and optional AI assistance that can be disabled via CLI flags or the `DISABLE_OPENAI=1` environment variable.
- Automates retrieval of INDOT Active Design Memos, producing structured summaries under `references/memos/processed/` and Markdown digests in `references/memos/digests/` to highlight pay-item updates for review.
- Validates processed memo JSON against `references/memos/schema/processed.schema.json` so downstream tooling receives consistent metadata.
- Supports optional failure alerts when the memo ingest CI workflow fails (`notification.enabled_on_failure`) and standardises retry/backoff behaviour for HTTP, SMTP, and IMAP integrations.
- Allows design memo rollup mappings to be extended at runtime via `references/memos/mappings/design_memo_mappings.csv` without modifying the bundled defaults.
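For orientation, the confidence formula above can be reproduced in a few lines of Python. The function below is an illustrative sketch, not the library's actual API.

```python
import math

def confidence_score(n: int, mean_unit_price: float, std_dev: float) -> float:
    """Sketch of the documented formula: confidence = (1 - exp(-n/30)) * (1 / (1 + cv))."""
    if n <= 0 or mean_unit_price <= 0:
        return 0.0
    cv = std_dev / mean_unit_price  # COEF_VAR: coefficient of variation
    return (1 - math.exp(-n / 30)) * (1 / (1 + cv))

# Example: 12 data points, mean $100.00, std dev $25.00 -> roughly 0.26
print(round(confidence_score(12, 100.0, 25.0), 3))
```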
Items that lack usable bid history now receive conservative prices from non-geometry fallbacks:
- Unit Price Summary (CY2024) – if a weighted average exists with at least three supporting contracts, the pipeline adjusts the summary price for recency (STATE 12M vs. 24/36M) and region (DIST vs. STATE) before clamping it to the published low/high range.
- Design memo rollups – when summary support is thin or missing, the replacement code can inherit data from its obsolete counterparts. Static mappings live in `src/costest/design_memos.py` (e.g., DM 25-10 pooling `401-10258`/`401-10259` into `401-11526`) and can be supplemented with additional rows in `references/memos/mappings/design_memo_mappings.csv` (columns: `memo_id`, `effective_date`, `replacement_code`, `obsolete_code`). Static entries win on conflicts to preserve legacy behaviour (a merge sketch follows below).
Each fallback sets SOURCE, DATA_POINTS_USED, and detailed NOTES so the
Excel and CSV outputs clearly explain how the estimate was derived. The
existing geometry-based alternate seek continues to operate unchanged and only
activates when both category pricing and the new fallbacks provide no data.
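To illustrate how the supplemental CSV can extend the static rollups while letting static entries win on conflicts, here is a minimal sketch. The column names come from the description above; the function, constant, and precedence handling are hypothetical rather than the actual `src/costest/design_memos.py` implementation.

```python
import csv
from pathlib import Path

# Static defaults keyed by replacement code; the DM 25-10 entry mirrors the
# example in the text and is included purely for illustration.
STATIC_ROLLUPS = {"401-11526": ["401-10258", "401-10259"]}

def load_rollups(csv_path: Path) -> dict[str, list[str]]:
    """Merge supplemental rows (memo_id, effective_date, replacement_code,
    obsolete_code) into the static mappings; static entries take precedence."""
    rollups = {code: list(olds) for code, olds in STATIC_ROLLUPS.items()}
    if csv_path.exists():
        with csv_path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                replacement = row["replacement_code"].strip()
                obsolete = row["obsolete_code"].strip()
                if replacement in STATIC_ROLLUPS:
                    continue  # static entries win on conflicts
                rollups.setdefault(replacement, []).append(obsolete)
    return rollups

rollups = load_rollups(Path("references/memos/mappings/design_memo_mappings.csv"))
```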
Place the project-level spreadsheets in data_sample/ (or pass explicit paths via
--quantities-xlsx and --region-map):
- `*_project_quantities.xlsx` lists the pay items included in the job.
- `references/region_map.xlsx` defines the DISTRICT → REGION mapping used to normalise BidTabs history. The GUI controls the Estimated Total Contract Cost (ETCC) and the project district/region directly, so no separate `project_attributes.xlsx` is used anymore.
- `data-sample/BidTabsData/` holds historical bid tab exports (legacy `.xls` files) that supply the price history used when computing statistics. Populate it via `scripts/fetch_bidtabsdata.py` (see below).
Historical bid tab data is downloaded from the shared `derek-betz/BidTabsData`
repository. The fetch script installs the release asset into
`data-sample/BidTabsData/` and writes a `.bidtabsdata_version` marker.
To download BidTabsData:
```
BIDTABSDATA_VERSION=v2025-12-26 python scripts/fetch_bidtabsdata.py
```

You can also provide the version via CLI:

```
python scripts/fetch_bidtabsdata.py --version v2025-12-26
```

Optional environment overrides:

- `BIDTABSDATA_REPO` to override the upstream repo (`derek-betz/BidTabsData`)
- `BIDTABSDATA_OUT_DIR` to change the destination directory
- `BIDTABSDATA_SHA256` to verify the downloaded archive
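For context, the `BIDTABSDATA_SHA256` check boils down to hashing the downloaded archive and comparing digests. A minimal sketch (not the fetch script's actual code) looks like this:

```python
import hashlib
import os
from pathlib import Path

def verify_archive(archive: Path) -> None:
    """Compare the archive's SHA-256 digest against BIDTABSDATA_SHA256, if set."""
    expected = os.environ.get("BIDTABSDATA_SHA256")
    if not expected:
        return  # verification is optional
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    if digest.lower() != expected.lower():
        raise SystemExit(f"SHA-256 mismatch for {archive}: got {digest}")
```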
Note: The data-sample/BidTabsData/ directory is excluded from version control.
Always use the fetch script to download the data.
When present, the CLI automatically loads these files and attaches the metadata to the mapping report.
Use a project-local virtual environment so everyone uses the same interpreter and dependencies.
Windows bootstrap (installs Python, dev deps, and runs tests):
```
powershell -ExecutionPolicy Bypass -File scripts/bootstrap.ps1
```

Windows (PowerShell):

```powershell
# From the repo root
py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1

# Developer install (preferred): uses pyproject.toml with conditional deps
python -m pip install -U pip setuptools wheel
pip install -e .[dev]

# Or: strict pins from requirements.txt (also supports conditionals)
# pip install -r requirements.txt
```

macOS/Linux (bash or zsh):
```bash
python3.11 -m venv .venv || python3 -m venv .venv
source .venv/bin/activate

# Developer install (preferred)
python -m pip install -U pip setuptools wheel
pip install -e .[dev]

# Or strict pins
# pip install -r requirements.txt
```

VS Code:
- The workspace pins the interpreter to `.venv/` via `.vscode/settings.json`. If VS Code prompts, choose the interpreter at `.venv\Scripts\python.exe`.
- Ensure "Python: Terminal › Activate Environment" is enabled so new terminals show `(.venv)`.
Verify:

```
python -c "import sys; print(sys.version); print(sys.executable)"
python scripts/run_tests.py -q --collect-only
```

Notes:
- Python 3.11 gives fully reproducible installs with the pinned dependencies. Python 3.12+ is supported via conditional dependencies (`pandas>=2.2.2`). If you must use 3.12+, prefer `pip install -e .[dev]` or use the updated `requirements.txt`, which includes environment markers.
- A helper script is available on Windows: `scripts/setup_venv.ps1` (see script header for options).
```
# After creating and activating the venv (see above):
pip install -e .[dev]
```

Use `pip install -e .` if you only need the runtime dependencies, or `pip install -r requirements.txt` to install the pinned production set without development tooling.
Run `costest --help` to see the full command-line interface.
For a lightweight desktop launcher run:
```
costest-gui
```

An application window opens where you can drag and drop the
*_project_quantities.xlsx workbook. The estimator pipeline executes using
that workbook and writes the results to the standard output locations.
On Windows the launcher automatically installs tkinterdnd2 into the active
virtual environment (unless DISABLE_TKINTERDND2_BOOTSTRAP=1 is set) so the
drag-and-drop cards work without extra steps. Other platforms can install it
manually with pip install tkinterdnd2; when unavailable the cards fall back
to a "Browse" button for manual file selection.
- Modern dark theme with improved contrast and readable typography.
- Sidebar collects project inputs and advanced settings:
  - Aggregation method (Weighted Average, Trimmed Mean P10-P90, Robust Median)
  - Memo minimum confidence slider with improved styling
  - Quantity elasticity toggle
- Cleaner header: “How It Works” moved to the upper-right.
- Primary actions stacked: “Run Estimate” above “Clear Last Result”.
- Wider default window on startup (+15%).
- Project Workbook panel is wider by default and enforces a sensible minimum width for better readability.
- Run Log can be popped out into its own scrollable window via the “Pop out” button for easier reading.
The launcher prompts for the Estimated Total Contract Cost (ETCC, a currency field
with a leading $) and the Project District (a drop-down listing the six INDOT
districts). These entries replace the previous requirement to populate
`project_attributes.xlsx`; the region mapping is now provided by
`references/region_map.xlsx`.
Run the web application to share the estimator across the company intranet:
```
costest-web --host 0.0.0.0 --port 8080
```

Open `http://<server-host>:8080` in a browser. The web interface mirrors the
desktop GUI layout, supports drag-and-drop workbook uploads, streams the run
log, and exposes output downloads per run under outputs/web_runs/<run_id>/.
For production use, place the app behind an internal reverse proxy and
authentication layer.
Generate fresh sample output files from the text templates:
```
python scripts/prepare_sample_outputs.py
```

The script copies the CSV audit sample and materialises Excel workbooks from
`data_sample/Estimate_Draft_template.csv` and
`data_sample/payitems_workbook.json` into the `outputs/` directory. The sample
project spreadsheet is `data_sample/2300946_project_quantities.xlsx`, and
`references/region_map.xlsx` provides the district-to-region mapping for both
GUI and CLI runs. The GUI captures ETCC and district interactively.
With those files in place, run the pipeline against the samples:
```
costest \
  --payitems-workbook outputs/PayItems_Audit.xlsx \
  --estimate-audit-csv outputs/Estimate_Audit.csv \
  --estimate-xlsx outputs/Estimate_Draft.xlsx
```

Override any input via the matching CLI flags (for example,
`--project-quantities data_sample/2300946_project_quantities.xlsx`).
When run without explicit paths the CLI looks for `outputs/PayItems_Audit.xlsx`
and falls back to the bundled sample workbook or to a `data_in/` directory if
present. Supply `--mapping-debug-csv` to write the mapping report to a custom
location.
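The lookup order can be pictured with a small sketch; the helper below is hypothetical and only mirrors the documented fallback order, not the CLI's internal code.

```python
from pathlib import Path

def resolve_payitems_workbook() -> Path:
    """Mirror the documented defaults: outputs/ first, then the bundled sample,
    then a data_in/ directory if one is present."""
    candidates = [
        Path("outputs/PayItems_Audit.xlsx"),
        Path("data_sample/PayItems_Audit.xlsx"),  # assumed location of the bundled sample
        Path("data_in/PayItems_Audit.xlsx"),
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    raise FileNotFoundError("No pay-items workbook found; pass --payitems-workbook explicitly.")
```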
`DISABLE_OPENAI=1` is respected automatically; set it (or use the
`--disable-ai` flag) when running offline. If AI assistance is desired, provide
an API key via the `OPENAI_API_KEY` environment variable or by storing it in
`API_KEY/API_KEY.txt` and omitting the disable flag.
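The key-resolution rules above (environment variable first, then `API_KEY/API_KEY.txt`, with `OPENAI_API_KEY_FILE` pointing at an alternative file and `DISABLE_OPENAI=1` switching AI off) can be sketched as follows; the function is illustrative rather than the library's actual implementation.

```python
import os
from pathlib import Path

def resolve_openai_key() -> str | None:
    """Return an API key, or None when AI assistance is disabled or unconfigured."""
    if os.environ.get("DISABLE_OPENAI") == "1":
        return None
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key.strip()
    key_file = os.environ.get("OPENAI_API_KEY_FILE", "API_KEY/API_KEY.txt")
    path = Path(key_file)
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    return None
```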
A convenience wrapper is available:
```
python scripts/run_pipeline.py --help
```

INDOT Design Memo 23-21 introduces new HMA pay item numbers that supersede the legacy PG binder-based codes. The estimator supports this transition by:
- Loading the memo crosswalk from `data_reference/hma_crosswalk_dm23_21.csv` (excludes SMA entries marked as deleted). Each row records the legacy pay item, its new MSCR counterpart, the mix course (Surface/Intermediate/Base), ESAL category, and binder class.
- Remapping historical BidTabs records and project quantities to the new item numbers whenever `--apply-dm23-21` (or `APPLY_DM23_21=1`) is provided. SMA items flagged as deleted are skipped automatically and listed in the logs.
- Annotating estimate rows with `MappedFromOldItem`, mix metadata, and a transitional adder flag. The mapping debug report (`payitem_mapping_debug.csv`) captures `source_item`, `mapped_item`, `mapping_rule`, `adder_applied`, and an `evidence` column fixed to "DM 23-21" for traceability.
- Applying transitional adders of $3.00/ton (Surface), $2.50/ton (Intermediate), or $2.00/ton (Base) whenever DM 23-21 logic is enabled but the new item lacks sufficient history. Adders are automatically removed once the minimum sample target is satisfied (see the sketch after this list).
To enable the new behaviour in CLI runs, pass --apply-dm23-21 (or export the
environment variable APPLY_DM23_21=1). The graphical launcher respects the
same environment variable.
Run the automated test suite with:
```
python scripts/run_tests.py -q
```

Continuous integration runs the same command on every push via GitHub Actions.
```
CostEstimateGenerator/
+-- src/costest/              # Library code
+-- data_sample/              # Synthetic sample inputs
+-- data-sample/BidTabsData/  # Downloaded BidTabsData releases (ignored)
+-- outputs/                  # Target directory for generated outputs
+-- scripts/run_pipeline.py   # CLI wrapper
+-- tests/                    # Pytest-based unit and integration tests
+-- requirements.txt          # Reproducible dependency pins
+-- pyproject.toml            # Packaging metadata
```
The project is designed to be idempotent: running the pipeline multiple times with the same inputs produces consistent outputs.
When estimating unit prices, the pipeline uses the following tiers in order:
- Historical category mix (BidTabs):
  - District and statewide windows (12/24/36 months) aggregated using the configured method.
- Design memo rollup:
  - Uses officially replaced/rolled-up item codes to form a pooled set and applies the same adjustments.
- Unit Price Summary (UPS):
  - Falls back to the statewide weighted average for the specific item when sufficient UPS contracts exist.
- NO_DATA:
  - If none of the above tiers apply, the item remains with a $0.00 placeholder and a review note.
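To make the ordering concrete, here is a minimal sketch of the tier walk. The function and the first SOURCE label are placeholders; only `DESIGN_MEMO_ROLLUP`, `UNIT_PRICE_SUMMARY`, and `NO_DATA` correspond to labels documented in the notes below.

```python
def price_item(item, bidtabs_pricer, memo_rollup_pricer, ups_pricer) -> tuple[float, str]:
    """Walk the documented tiers in order and report which one supplied the price."""
    for source, pricer in (
        ("BIDTABS_CATEGORY", bidtabs_pricer),       # district/statewide windows
        ("DESIGN_MEMO_ROLLUP", memo_rollup_pricer),  # pooled replacement/obsolete codes
        ("UNIT_PRICE_SUMMARY", ups_pricer),          # statewide weighted average
    ):
        price = pricer(item)
        if price is not None:
            return price, source
    return 0.00, "NO_DATA"  # placeholder price plus a review note
```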
Notes and metrics:
- Fallback tiers annotate SOURCE (e.g., `DESIGN_MEMO_ROLLUP`, `UNIT_PRICE_SUMMARY`) and add details in NOTES.
- Confidence is computed in the exports to help triage low-data items.
- Quantity window and sigma trimming thresholds are configurable via the `MEMO_ROLLUP_QUANTITY_LOWER`, `MEMO_ROLLUP_QUANTITY_UPPER`, and `MEMO_ROLLUP_SIGMA_THRESHOLD` environment variables (defaults remain 0.5/1.5 and ±2σ respectively).
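Reading those thresholds could look like the following sketch, with defaults taken from the values quoted above (treating the ±2σ trim as a plain float):

```python
import os

# Defaults documented above: quantity window 0.5x-1.5x and a +/-2 sigma trim.
QTY_LOWER = float(os.environ.get("MEMO_ROLLUP_QUANTITY_LOWER", "0.5"))
QTY_UPPER = float(os.environ.get("MEMO_ROLLUP_QUANTITY_UPPER", "1.5"))
SIGMA_THRESHOLD = float(os.environ.get("MEMO_ROLLUP_SIGMA_THRESHOLD", "2.0"))
```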
Configuration toggles:
- Alternate-seek toggle (geometry-based backfill):
  - Environment: `DISABLE_ALT_SEEK=1` to disable.
  - GUI: “Enable alternate seek backfill” checkbox.
Dashboard / summary:
- The run summary now includes a count of items priced via alternates.
The CI workflow automatically fetches the required version of BidTabsData before running tests. The version is controlled via the BIDTABSDATA_VERSION environment variable in .github/workflows/ci.yml.
When a new version of BidTabsData is released:
- Update the `BIDTABSDATA_VERSION` value in `.github/workflows/ci.yml`
- Commit and push the change
- CI will automatically fetch and cache the new version
The cache is keyed by version, so different versions are cached independently.
You can also set BIDTABSDATA_VERSION as a repository variable in GitHub:
- Go to Settings → Secrets and variables → Actions → Variables
- Add a new repository variable: `BIDTABSDATA_VERSION` with value `v2025-12-26`
- Update the workflow to use `${{ vars.BIDTABSDATA_VERSION || env.BIDTABSDATA_VERSION }}`
This allows you to update the version without modifying the workflow file.