Summary
Both dss_install() and eco_install() currently build their DuckDB databases from scratch (DSSTox from the CompTox API, ECOTOX from the EPA FTP ASCII dump). This works, but it is slow and requires API keys (DSSTox) or large downloads (ECOTOX).
GitHub Release assets support files up to 2 GB. Both databases should fit comfortably:
- DSSTox: ~15M rows; built size not yet measured, but expected to be moderate
- ECOTOX: ~424 MB
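Because the DSSTox size is still an estimate, a release job could refuse to attach a file that is over the limit, and the same check also reports the actual built size. The Python below is an illustrative sketch only; `check_asset_size` is a hypothetical name, and it assumes the limit is 2 GiB (2 × 1024³ bytes), which is how GitHub's documentation phrases it.

```python
import os

# Assumed per-file limit for GitHub Release assets: files must be under 2 GiB.
GITHUB_ASSET_LIMIT_BYTES = 2 * 1024**3


def check_asset_size(path: str, limit: int = GITHUB_ASSET_LIMIT_BYTES) -> int:
    """Return the size of `path` in bytes; raise ValueError if it is not under `limit`."""
    size = os.path.getsize(path)
    if size >= limit:
        raise ValueError(f"{path} is {size} bytes; release assets must be under {limit} bytes")
    return size
```

Running it on a freshly built dsstox.duckdb would answer the size question below directly.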
Proposal
Investigate hosting pre-built .duckdb files as GitHub Release assets so dss_install() / eco_install() can offer a fast-path download instead of building from source every time.
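The fast path amounts to fetching one file from a predictable URL, since GitHub serves release assets at `https://github.com/<owner>/<repo>/releases/download/<tag>/<asset>`. The sketch below is Python for illustration; the owner, repo, tag, and both function names are hypothetical placeholders. It streams into a temporary `.part` file and renames it into place only on success, so an interrupted download never leaves a truncated `.duckdb` that a later call would mistake for a finished install.

```python
import os
import shutil
import tempfile
import urllib.request


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    """Build the public download URL for a GitHub Release asset."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"


def download_asset(url: str, dest: str) -> str:
    """Stream `url` to `dest`, writing to a temp file first and renaming on success."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Temp file lives in the destination directory so os.replace is a same-filesystem rename.
    fd, tmp = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, out)
        os.replace(tmp, dest)
    except BaseException:
        # Clean up the partial file, then re-raise the original error.
        os.remove(tmp)
        raise
    return dest
```

`urllib.request.urlopen` also accepts `file://` URLs, which makes the rename-on-success logic easy to exercise locally without network access.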
Questions to resolve
- What's the actual built size of dsstox.duckdb?
- Can we version the databases by EPA release date (e.g., ecotox-2026-03.duckdb)?
- Should *_install() default to download (fast) with a build = TRUE flag for building from source, or the other way around?
- Any licensing concerns with redistributing EPA-derived databases as release assets?
- Should a CI workflow automate building the databases and attaching them as assets on each release?
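If assets are versioned by EPA release date, the install functions need to generate and parse names such as ecotox-2026-03.duckdb and pick the newest one available. A minimal Python sketch, assuming the hypothetical pattern `<db>-<YYYY>-<MM>.duckdb` from the example above; all function names are illustrative.

```python
import re
from datetime import date

# Hypothetical naming scheme: "<db>-<YYYY>-<MM>.duckdb", month restricted to 01-12.
ASSET_RE = re.compile(r"^(?P<db>[a-z]+)-(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])\.duckdb$")


def asset_name(db: str, release: date) -> str:
    """Format an asset name from a database name and an EPA release date."""
    return f"{db}-{release:%Y-%m}.duckdb"


def parse_asset_name(name: str):
    """Return (db, release date) for a matching name, or None if it does not match."""
    m = ASSET_RE.match(name)
    if m is None:
        return None
    return m["db"], date(int(m["year"]), int(m["month"]), 1)


def latest_asset(names, db: str):
    """Pick the newest asset name for `db` from a list of release asset names."""
    best = None
    for name in names:
        parsed = parse_asset_name(name)
        if parsed is not None and parsed[0] == db:
            if best is None or parsed[1] > best[0]:
                best = (parsed[1], name)
    return best[1] if best else None
```

Comparing parsed dates rather than raw strings keeps the choice correct even if unrelated assets share a release.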
References
DSSTOX_MERGE_PLAN.md — Section 3 (Database Build Pipeline), Option C
ECOTOX_MERGE_PLAN.md — Section 3 (Database Build Pipeline), Option B