Index GitHub repositories from OSO artifacts_v1 and collect direct dependencies (npm, Python/PyPI, Rust crates), git submodules, and Foundry libs into Parquet snapshots and an events stream.
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Configure credentials (use .env locally if you prefer)
export OSO_API_KEY="<your_api_key>"
# optional for higher clone rate limits
export GITHUB_TOKEN="<your_gh_pat>"
# Small smoke-test run (the default limit applies when --only is omitted); checks the setup end to end without assuming any specific owner
./.venv/bin/python scripts/sbom_fetcher.py --output-dir data/sbom --incremental
# Focused owner run (no default limit when --only is provided); exercises the actual OSO-scoped command
./.venv/bin/python scripts/sbom_fetcher.py --output-dir data/sbom --only opensource-observer/ --limit 0
# Leave the virtual environment when done (deactivate is a shell function defined by activate, not a script to source):
deactivate

- Add repository secrets: OSO_API_KEY and (optionally) GITHUB_TOKEN.
- Run the "SBOM fetcher" workflow manually or wait for the daily schedule.
Outputs are written under data/sbom/ and committed back to the repo.
Notes:
- Provide OSO_API_KEY and optionally GITHUB_TOKEN as repository secrets. .env files are for local development only; Actions will not read them.
- npm (package.json): dependencies, devDependencies, peerDependencies
- Python/PyPI (pyproject.toml, requirements*.txt): PEP 621 and Poetry
- Rust/Cargo (Cargo.toml): dependencies, dev-dependencies, build-dependencies
- Git submodules (.gitmodules)
- Foundry (foundry.toml, lib/*)
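As a rough illustration of what collecting direct dependencies from one of these manifests can look like, here is a minimal sketch for the npm case. It is not the fetcher's actual parser: the function name `parse_package_json` is hypothetical, and the literal values written into `package_manager` and `dependency_scope` are assumptions. Only the three package.json sections come from the list above.

```python
import json

# package.json sections treated as direct-dependency scopes (taken from the list above).
NPM_SCOPES = ("dependencies", "devDependencies", "peerDependencies")


def parse_package_json(text, manifest_path="package.json"):
    """Return one record per direct dependency declared in a package.json string.

    Hypothetical helper for illustration; field values are assumptions.
    """
    manifest = json.loads(text)
    records = []
    for scope in NPM_SCOPES:
        # A missing or null section contributes no records.
        for name, requirement in (manifest.get(scope) or {}).items():
            records.append(
                {
                    "package_manager": "npm",  # assumed label
                    "dependency_name": name,
                    "dependency_version_requirement": requirement,
                    "dependency_scope": scope,  # assumed to mirror the section name
                    "manifest_path": manifest_path,
                    "direct": True,
                }
            )
    return records


sample = '{"dependencies": {"react": "^18.2.0"}, "devDependencies": {"vitest": "~1.6.0"}}'
for record in parse_package_json(sample):
    print(
        record["dependency_scope"],
        record["dependency_name"],
        record["dependency_version_requirement"],
    )
```

The version requirement is kept exactly as written in the manifest; resolving it to a concrete installed version would need a lockfile, which the list above does not mention.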
- data/sbom/snapshots/{artifact_namespace__artifact_name}/{YYYY-MM-DD}/*.parquet
- data/sbom/events/{YYYY-MM-DD}/*.parquet
- data/sbom/state.json
- data/sbom/run_summaries/*.parquet
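For consumers that need to locate a given repository's output, the layout above can be turned into paths with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, the namespace and repository name are assumed to be joined by a double underscore (as the placeholder suggests), and the repository name in the example is purely illustrative.

```python
from datetime import date
from pathlib import Path


def snapshot_dir(output_dir, artifact_namespace, artifact_name, day):
    """Directory holding one repository's snapshot Parquet files for one day."""
    # Assumption: {artifact_namespace__artifact_name} joins the two parts with "__".
    repo_key = f"{artifact_namespace}__{artifact_name}"
    return Path(output_dir) / "snapshots" / repo_key / day.isoformat()


def events_dir(output_dir, day):
    """Directory holding the events Parquet files for one day."""
    return Path(output_dir) / "events" / day.isoformat()


def latest_snapshot_dir(output_dir, artifact_namespace, artifact_name):
    """Most recent dated snapshot directory for a repository, or None if absent."""
    base = Path(output_dir) / "snapshots" / f"{artifact_namespace}__{artifact_name}"
    if not base.is_dir():
        return None
    # YYYY-MM-DD names sort chronologically when sorted as plain strings.
    days = sorted(p for p in base.iterdir() if p.is_dir())
    return days[-1] if days else None


# Illustrative repository name; not taken from the README.
print(snapshot_dir("data/sbom", "opensource-observer", "example-repo", date(2025, 1, 31)).as_posix())
print(events_dir("data/sbom", date(2025, 1, 31)).as_posix())
```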
Snapshots (subset of columns):
- artifact_id, artifact_source, artifact_namespace, artifact_name, artifact_url, repo_head_sha
- package_manager, dependency_name, dependency_version_requirement, dependency_scope, manifest_path, direct
- time_collected
Events columns:
- artifact_namespace, artifact_name, package_manager, dependency_name, change_type (added/removed/updated), previous_version, current_version, event_time_collected
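The three change_type values suggest that events come from comparing a repository's previous and current dependency sets. The sketch below shows one way such a comparison could work. It is an assumption about the approach, not the fetcher's actual code, and `diff_dependencies` is a hypothetical name. Each input maps a dependency name to its version requirement.

```python
def diff_dependencies(previous, current):
    """Compare two {dependency_name: version_requirement} mappings.

    Returns a list of event dicts using the change_type values documented
    above: "added", "removed", "updated". Unchanged entries yield no event.
    """
    events = []
    for name in sorted(previous.keys() | current.keys()):
        before = previous.get(name)
        after = current.get(name)
        if name not in previous:
            change_type = "added"
        elif name not in current:
            change_type = "removed"
        elif before != after:
            change_type = "updated"
        else:
            continue  # same requirement in both snapshots
        events.append(
            {
                "dependency_name": name,
                "change_type": change_type,
                "previous_version": before,  # None for "added"
                "current_version": after,  # None for "removed"
            }
        )
    return events


old = {"react": "^17.0.0", "lodash": "^4.17.21", "left-pad": "1.3.0"}
new = {"react": "^18.2.0", "lodash": "^4.17.21", "zod": "^3.23.0"}
for event in diff_dependencies(old, new):
    print(event["change_type"], event["dependency_name"])
```

A real comparison would likely key on more than the name (for example package_manager and manifest_path), since the same dependency can appear in several manifests of one repository.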
Flags:
- --only <owner>: filter by owner/namespace (trailing slash optional)
- --limit <N>: cap the number of repos (use 0 for all; recommended with --only)
- --incremental: skip repos whose HEAD SHA is unchanged
- --max-workers <N>: concurrent clones/parsers (default via SBOM_MAX_WORKERS)
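To make the interaction of --only, --limit, and --incremental concrete, here is a minimal sketch of how the three could combine when choosing repositories to process. It is illustrative only: `select_repos` and the shape of `last_seen_sha` are hypothetical, the README does not describe the format of state.json, and applying the limit after the incremental skip is an assumption.

```python
def select_repos(repos, only=None, limit=0, incremental=False, last_seen_sha=None):
    """Pick the repositories to process.

    repos: iterable of dicts with artifact_namespace, artifact_name, repo_head_sha.
    last_seen_sha: hypothetical {"owner/name": sha} mapping from a previous run.
    """
    last_seen_sha = last_seen_sha or {}
    # "owner/" and "owner" are treated the same (trailing slash optional).
    owner = only.rstrip("/") if only else None
    selected = []
    for repo in repos:
        if owner and repo["artifact_namespace"] != owner:
            continue
        key = f'{repo["artifact_namespace"]}/{repo["artifact_name"]}'
        # --incremental: skip repos whose HEAD SHA matches the recorded one.
        if incremental and last_seen_sha.get(key) == repo["repo_head_sha"]:
            continue
        selected.append(repo)
        # --limit 0 means no cap; any positive value stops once reached.
        if limit > 0 and len(selected) >= limit:
            break
    return selected


repos = [
    {"artifact_namespace": "opensource-observer", "artifact_name": "repo-a", "repo_head_sha": "aaa"},
    {"artifact_namespace": "opensource-observer", "artifact_name": "repo-b", "repo_head_sha": "bbb"},
    {"artifact_namespace": "someone-else", "artifact_name": "repo-c", "repo_head_sha": "ccc"},
]
seen = {"opensource-observer/repo-a": "aaa"}
picked = select_repos(repos, only="opensource-observer/", incremental=True, last_seen_sha=seen)
print([r["artifact_name"] for r in picked])
```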
Environment variables:
- SBOM_MAX_WORKERS (default 8)
- SBOM_GIT_CLONE_RETRIES (default 2)
- SBOM_GIT_CLONE_TIMEOUT (seconds, default 120)
- SBOM_GIT_CMD_TIMEOUT (seconds, default 60)
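The variables and defaults above can be read with a small helper like the following. This is a sketch, not the script's actual configuration code: `load_settings` is a hypothetical name, and treating an empty value as "use the default" is an assumption.

```python
import os

# Variable names and defaults as listed above (timeouts are in seconds).
DEFAULTS = {
    "SBOM_MAX_WORKERS": 8,
    "SBOM_GIT_CLONE_RETRIES": 2,
    "SBOM_GIT_CLONE_TIMEOUT": 120,
    "SBOM_GIT_CMD_TIMEOUT": 60,
}


def load_settings(environ=None):
    """Return integer settings, falling back to the documented defaults.

    Raises ValueError if a variable is set to a non-integer value.
    """
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(name, "").strip()
        # Assumption: an unset or empty variable means "use the default".
        settings[name] = int(raw) if raw else default
    return settings


print(load_settings({"SBOM_MAX_WORKERS": "16"}))
```

Passing a plain dict instead of os.environ keeps the helper easy to test without touching the real environment.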