[DevOps]: Scope automated metadata ingestion at other sites (prioritize Chrysalis first) #154

@tomvothecoder

Objective

Extend SimBoard metadata ingestion beyond Perlmutter/NERSC, starting with Chrysalis, using a shared scheduler-agnostic Python entrypoint plus thin site-specific wrappers.

Current branch: feature/154-ingestion-sites

The branch currently adds:

  • backend/app/scripts/ingestion/hpc_archive_ingestor.py: shared entrypoint that delegates to the existing NERSC archive ingestor for now.
  • backend/app/scripts/ingestion/sites/chrysalis.sh: thin Chrysalis Jenkins wrapper.
  • README updates documenting the shared ingestor and Chrysalis defaults.
  • A test that verifies the generic entrypoint delegates through the existing ingestor path.
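
As a rough illustration of the delegation pattern described above, the shared entrypoint could parse only the site-level arguments and hand everything else to the existing ingestor. This is a minimal sketch, not the code on the branch: `existing_archive_ingestor_main`, the `--site` flag, and the `delegate` parameter are hypothetical names introduced here for illustration.

```python
import argparse
from typing import Callable, Optional, Sequence


def existing_archive_ingestor_main(argv: Sequence[str]) -> int:
    """Hypothetical stand-in for the existing NERSC archive ingestor's entrypoint."""
    print(f"existing ingestor invoked with: {list(argv)}")
    return 0


def main(
    argv: Optional[Sequence[str]] = None,
    delegate: Callable[[Sequence[str]], int] = existing_archive_ingestor_main,
) -> int:
    """Shared entrypoint: consume site-level flags, forward the rest unchanged."""
    parser = argparse.ArgumentParser(
        description="Scheduler-agnostic archive ingestion entrypoint (sketch).",
        allow_abbrev=False,
    )
    # Assumed flag: the site wrapper identifies itself so logs show where a run came from.
    parser.add_argument("--site", required=True, help="Site name, e.g. chrysalis")
    args, passthrough = parser.parse_known_args(argv)

    print(f"site={args.site}")
    # All ingestion logic stays in the delegate; this function only routes arguments.
    return delegate(passthrough)
```

Making the delegate injectable is one way to let a test confirm that the generic entrypoint goes through the existing ingestor path without touching a real archive.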

Task

Finish turning this branch into a deployable ingestion path for Chrysalis first, then use the same pattern for other sites once access is available.

  • Review the current branch implementation and keep all ingestion logic in Python; the shell wrappers should only set site defaults and call the shared entrypoint.
  • Validate the Chrysalis archive path and Jenkins runtime assumptions.
  • Confirm how SIMBOARD_API_BASE_URL and SIMBOARD_API_TOKEN should be stored and injected in the Chrysalis Jenkins job.
  • Run the Chrysalis wrapper in dry-run mode and verify archive access, network egress to SimBoard, and candidate counts.
  • Enable non-dry-run ingestion only after the dry-run output is validated.
  • Apply for or confirm accounts/access for Frontier, Aurora, and Compy.
  • After Chrysalis works, add equivalent thin wrappers for the remaining sites as access allows.
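
The credential and dry-run steps above could look roughly like the following. This is a sketch under stated assumptions: only the `SIMBOARD_API_BASE_URL` and `SIMBOARD_API_TOKEN` variable names come from this issue. `ApiConfig`, `load_api_config`, `find_candidates`, `run_ingestion`, and the rule that each immediate subdirectory of the archive is a candidate are hypothetical, chosen only to make the example runnable.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Mapping


@dataclass(frozen=True)
class ApiConfig:
    base_url: str
    token: str


def load_api_config(env: Mapping[str, str] = os.environ) -> ApiConfig:
    """Read API settings from the environment and fail fast if any are missing."""
    required = ("SIMBOARD_API_BASE_URL", "SIMBOARD_API_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Report variable names only; never echo the token value into job logs.
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return ApiConfig(
        base_url=env["SIMBOARD_API_BASE_URL"].rstrip("/"),
        token=env["SIMBOARD_API_TOKEN"],
    )


def find_candidates(archive_root: Path) -> List[Path]:
    """Assumption for illustration: each immediate subdirectory is one candidate."""
    return sorted(p for p in archive_root.iterdir() if p.is_dir())


def run_ingestion(archive_root: Path, dry_run: bool, upload: Callable[[Path], None]) -> int:
    """Count candidates; upload them only when dry_run is False."""
    candidates = find_candidates(archive_root)
    print(f"Found {len(candidates)} candidate(s) under {archive_root}")
    if dry_run:
        print("Dry run: nothing uploaded")
        return len(candidates)
    for candidate in candidates:
        upload(candidate)
    return len(candidates)
```

Failing on missing variables before any archive scan would surface a misconfigured Jenkins job early, and returning the candidate count in both modes makes it straightforward to compare a dry run against the later real run.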

Site Priorities

  • Chrysalis: first priority once the machine is available again; Jenkins-based workflow.
  • Frontier: apply for account; likely next priority.
  • Aurora: apply for account; likely next priority.
  • Compy: apply for account.
  • Anvil: removed from scope.

Labels

help wanted, type: devops
