Modifications to bootstrapping to reduce memory usage #29

Open
ECharria wants to merge 1 commit into main from bootstrapping-memory-save

@ECharria
Owner

Summary
Reduces the memory usage of bootstrapping.py by using smaller integer dtypes for the count and indicator matrices. Outputs are bit-identical to the previous implementation: counts and 0/1 indicators are exact integers, so only their in-memory representation changes. Motivated by an OOM crash on a 100,517-spectrum run, where the original code allocated several (N, N) float64 matrices (~75 GiB each) to hold integer counts.

Changes (in specreboot/bootstrapping/bootstrapping.py)

  1. total_pair_counts, total_edge_support: float64 → uint16 (max value = B)
  2. pair_counts, edge_support (per-iteration): int64 → uint8 (binary indicators)
  3. mutual_topk result: float64 → uint8 (function only writes 0 or 1)
  4. Same changes mirrored in _reconstruct_history
  5. total_pair_similarities intentionally left as float64 to preserve precision of the similarity sum
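
For illustration, the dtype pattern in changes 1 and 2 can be sketched as below. This is a minimal sketch, not the code in bootstrapping.py: the helper `accumulate_indicators` and the random 0/1 indicators are hypothetical stand-ins, and B is assumed to be the number of bootstrap iterations.

```python
import numpy as np


def accumulate_indicators(indicators, n):
    """Sum per-iteration 0/1 indicator matrices into a uint16 total.

    Hypothetical helper for illustration only; it is not a function
    from specreboot.
    """
    b = len(indicators)
    # Each cell can reach at most b, so b must fit in uint16.
    if b > np.iinfo(np.uint16).max:
        raise ValueError("too many iterations for a uint16 accumulator")
    total = np.zeros((n, n), dtype=np.uint16)
    for indicator in indicators:
        # Per-iteration matrices only hold 0 or 1, so uint8 is enough.
        total += indicator.astype(np.uint8, copy=False)
    return total


# Small demo with made-up data (B and N are far below real run sizes).
rng = np.random.default_rng(0)
B, N = 20, 50
indicators = [rng.integers(0, 2, size=(N, N), dtype=np.uint8) for _ in range(B)]

total_edge_support = accumulate_indicators(indicators, N)

# Reference accumulation in float64, as a stand-in for the old representation.
reference = np.sum([ind.astype(np.float64) for ind in indicators], axis=0)
```

Because every value is a small non-negative integer, the uint16 total and the float64 reference hold the same numbers; only the bytes per element differ.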

Impact
At N = 100,517, each (N, N) matrix drops from 75.3 GiB (float64) to 18.8 GiB (uint16) or 9.4 GiB (uint8). The dataset that previously OOM'd at ~189 GiB RSS now runs to completion at ~111 GiB.
Assumes B ≤ 65,535 (the uint16 maximum), which is comfortable for realistic SpecReBoot runs; a larger B could overflow the uint16 total matrices.
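
The per-matrix sizes quoted above follow from N × N × bytes per element. A short sketch of that arithmetic (the helper name `matrix_gib` is hypothetical):

```python
import numpy as np


def matrix_gib(n, dtype):
    """Size in GiB of a dense (n, n) matrix of the given dtype."""
    return n * n * np.dtype(dtype).itemsize / 2**30


N = 100_517
sizes = {np.dtype(dt).name: matrix_gib(N, dt)
         for dt in (np.float64, np.uint16, np.uint8)}

for name, gib in sizes.items():
    print(f"{name:>8}: {gib:.1f} GiB")

# Largest count a uint16 accumulator can hold.
uint16_max = np.iinfo(np.uint16).max
```

This covers only the dense (N, N) allocations; the RSS figures above also include everything else the process holds.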

Verification

  1. Equivalence test (B=20, N=200): mean_similarity and mean_edge_support bit-identical (max abs diff = 0.00e+00)
  2. 100,517-spectrum GNPS bile acid dataset: completed successfully
  3. 112,086-spectrum GNPS bile acid dataset: completed successfully
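
A check in the spirit of the equivalence test in item 1 might look like the sketch below. It is not the test used for this PR: `mean_support` is a hypothetical helper, and the indicators are random rather than produced by the real bootstrapping code. It only shows that accumulating 0/1 indicators in float64 or in uint16 gives the same mean.

```python
import numpy as np


def mean_support(indicators, total_dtype):
    """Accumulate 0/1 indicators in `total_dtype`, then divide by B.

    Hypothetical helper for illustration only.
    """
    b = len(indicators)
    n = indicators[0].shape[0]
    total = np.zeros((n, n), dtype=total_dtype)
    for indicator in indicators:
        total += indicator.astype(total_dtype)
    # Convert explicitly so both paths divide in float64.
    return total.astype(np.float64) / b


rng = np.random.default_rng(0)
B, N = 20, 200
indicators = [rng.integers(0, 2, size=(N, N), dtype=np.uint8) for _ in range(B)]

old_mean = mean_support(indicators, np.float64)  # stand-in for the old dtype
new_mean = mean_support(indicators, np.uint16)   # stand-in for the new dtype

max_abs_diff = float(np.max(np.abs(old_mean - new_mean)))
print(f"max abs diff = {max_abs_diff:.2e}")
```

The difference is exactly zero here because float64 represents every integer up to B exactly, so both paths divide the same values by the same B.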

ECharria requested a review from rtlortega on May 12, 2026, 13:03