Smarter unexpected sequence report #32

mtomko · 2024-02-12T22:22:34Z

Rather than sampling a percentage of each column barcode's unexpected
sequences, we set a maximum number of row barcodes for which we are
willing to track unexpected sequences. When reading the unexpected
sequence cache, we take one barcode from each shard in turn until we
have either read all of them or established a sample of that size. Any
further unexpected sequences not found in that set are not tracked.

This lets us bound the memory that PoolQ uses for unexpected sequence
reporting without needing to track the sizes of individual shards. It
should also perform better than the previous strategy, since it reads
the cache only once.
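
For readers following along, the sampling described above can be
sketched roughly as follows. This is only an illustration, not the code
in UnexpectedSequenceWriter.scala: the names `RoundRobinSample`,
`sample`, and `maxBarcodes` are hypothetical, shards are modeled as
in-memory iterators of barcode strings rather than cache files, and
barcodes repeated across shards are assumed to count once toward the
limit.

```scala
import scala.collection.mutable

// Hypothetical sketch: names and types are illustrative, not PoolQ's API.
object RoundRobinSample {

  /** Takes one barcode from each shard in turn until every shard is
    * exhausted or `maxBarcodes` distinct barcodes have been collected.
    */
  def sample(shards: Seq[Iterator[String]], maxBarcodes: Int): Set[String] = {
    val selected = mutable.LinkedHashSet.empty[String]
    var live = shards.toVector
    while (selected.size < maxBarcodes && live.nonEmpty) {
      // Drop exhausted shards, then take one barcode from each remaining shard.
      live = live.filter(_.hasNext)
      val round = live.iterator
      while (selected.size < maxBarcodes && round.hasNext) {
        selected += round.next().next()
      }
    }
    selected.toSet
  }
}
```

Because the selected set never grows beyond `maxBarcodes`, memory stays
bounded regardless of how large any one shard is, and each shard is
consumed in a single forward pass.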

src/main/scala/org/broadinstitute/gpp/poolq3/reports/UnexpectedSequenceWriter.scala

Makefile

tmgreen

⛱️ 🧃 🦗

mtomko added 4 commits February 12, 2024 14:21

Do not track shard sizes

add7bc6

Set version to 3.10.0-SNAPSHOT

75123f6

Update changelog, manual, and readme

bb1a816

mtomko self-assigned this Feb 12, 2024

Fix makefile

f1a798c

tmgreen reviewed Feb 13, 2024

src/main/scala/org/broadinstitute/gpp/poolq3/reports/UnexpectedSequenceWriter.scala

tmgreen reviewed Feb 13, 2024

src/main/scala/org/broadinstitute/gpp/poolq3/reports/UnexpectedSequenceWriter.scala (outdated)

tmgreen reviewed Feb 13, 2024

Makefile

mtomko added 2 commits February 13, 2024 09:25

Use .size

538cd30

Simpler breadth-first iterator

f7eca37

tmgreen approved these changes Feb 13, 2024

mtomko merged commit 31a4bed into main Feb 13, 2024
1 check passed

mtomko deleted the smarter-unexpected-sequence-report branch February 13, 2024 20:41
