
Investigate GitHub Release assets for pre-built DSSTox and ECOTOX databases #135

@seanthimons


Summary

Both dss_install() and eco_install() currently build their DuckDB databases from scratch: DSSTox from the CompTox API, and ECOTOX from the EPA's ASCII dump on FTP. This works, but it is slow and requires either an API key (DSSTox) or a large download (ECOTOX).

GitHub Release assets can be up to 2 GiB per file. ECOTOX fits comfortably, and DSSTox is expected to as well:

  • DSSTox: ~15M rows; built size not yet measured
  • ECOTOX: ~424 MB
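Since the DSSTox size is still unknown, a build pipeline could check each file against the limit before trying to upload it. The sketch below is a minimal Python check. It assumes GitHub's documented cap of "under 2 GiB" per release asset. The function names and the idea of failing CI on an oversized file are hypothetical and not part of the package.

```python
import os

# GitHub documents that each release asset must be under 2 GiB.
RELEASE_ASSET_LIMIT_BYTES = 2 * 1024**3


def fits_release_limit(size_bytes: int, limit: int = RELEASE_ASSET_LIMIT_BYTES) -> bool:
    """Return True if a file of size_bytes is strictly under the per-asset limit."""
    return size_bytes < limit


def check_database_file(path: str) -> int:
    """Hypothetical CI guard: raise if a built .duckdb file is too large to attach."""
    size = os.path.getsize(path)
    if not fits_release_limit(size):
        raise ValueError(
            f"{path} is {size / 1024**2:.0f} MiB; release assets must be under 2 GiB"
        )
    return size
```

For example, `fits_release_limit(424 * 1024**2)` (roughly the ECOTOX size) returns `True`.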

Proposal

Investigate hosting pre-built .duckdb files as GitHub Release assets so dss_install() / eco_install() can offer a fast-path download instead of building from source every time.
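One way to structure the fast path is to try the release asset first and fall back to the existing from-source build. The Python sketch below shows only that control flow. It assumes a local destination file and makes download the default, with `build=True` forcing a source build, which is one of the two options raised in the questions below. `download_asset` and `build_from_source` are hypothetical stand-ins for whatever dss_install() / eco_install() would actually call.

```python
from pathlib import Path
from typing import Callable


def install_database(
    dest: Path,
    download_asset: Callable[[Path], None],
    build_from_source: Callable[[Path], None],
    build: bool = False,
    overwrite: bool = False,
) -> str:
    """Return which path was taken: 'cached', 'download', or 'build'.

    Hypothetical fast-path install: reuse an existing file, otherwise try the
    pre-built release asset, otherwise build from source.
    """
    if dest.exists() and not overwrite:
        return "cached"
    if not build:
        try:
            download_asset(dest)
            return "download"
        except OSError:
            # Network and file errors (urllib's URLError subclasses OSError):
            # drop any partial download and fall through to the slow path.
            dest.unlink(missing_ok=True)
    build_from_source(dest)
    return "build"
```

Falling back to the build keeps the install working even if an asset is missing for a given release, at the cost of the original slow path.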

Questions to resolve

  • What's the actual built size of dsstox.duckdb?
  • Can we version the databases by EPA release date (e.g., ecotox-2026-03.duckdb)?
  • Should *_install() default to downloading (fast), with a build = TRUE flag to build from source, or the other way around?
  • Any licensing concerns with redistributing EPA-derived databases as release assets?
  • Can a CI workflow automate building the databases and attaching them as assets on each release?
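If the assets are versioned by EPA release date, the installer needs a way to build those names and pick the newest one for each database. The sketch below assumes the `<database>-<YYYY>-<MM>.duckdb` pattern from the example above and is written in Python purely for illustration. The function names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical naming scheme: <database>-<YYYY>-<MM>.duckdb, keyed to the EPA release date.
ASSET_PATTERN = re.compile(r"^(?P<db>[a-z]+)-(?P<year>\d{4})-(?P<month>\d{2})\.duckdb$")


def asset_name(db: str, year: int, month: int) -> str:
    """Build a versioned filename, e.g. ('ecotox', 2026, 3) -> 'ecotox-2026-03.duckdb'."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    return f"{db}-{year:04d}-{month:02d}.duckdb"


def parse_asset_name(name: str) -> Optional[Tuple[str, int, int]]:
    """Return (db, year, month) for a matching filename, or None if it does not match."""
    m = ASSET_PATTERN.match(name)
    if m is None:
        return None
    return m["db"], int(m["year"]), int(m["month"])


def latest_asset(names: Iterable[str], db: str) -> Optional[str]:
    """Pick the most recent release of `db` from a list of asset filenames."""
    candidates = []
    for name in names:
        parsed = parse_asset_name(name)
        if parsed is not None and parsed[0] == db:
            candidates.append(((parsed[1], parsed[2]), name))
    return max(candidates)[1] if candidates else None
```

For example, `latest_asset(["ecotox-2025-09.duckdb", "ecotox-2026-03.duckdb", "dsstox-2026-01.duckdb"], "ecotox")` returns `"ecotox-2026-03.duckdb"`. Zero-padding the month also keeps alphabetical order the same as chronological order, so the assets sort correctly in a plain file listing.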

References

  • DSSTOX_MERGE_PLAN.md — Section 3 (Database Build Pipeline), Option C
  • ECOTOX_MERGE_PLAN.md — Section 3 (Database Build Pipeline), Option B

Metadata

  • Assignees: No one assigned
  • Labels: enhancement (New feature or request)
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
