
Smarter unexpected sequence report #32

Merged (7 commits) on Feb 13, 2024

Conversation

mtomko
Collaborator

@mtomko mtomko commented Feb 12, 2024

Rather than sampling a percentage of each column barcode's unexpected
sequences, we set a maximum number of row barcodes for which we are
willing to track unexpected sequences. When reading the unexpected
sequence cache, we take one barcode from each shard in turn until we
have either read all of them or built a sample of that size. Any
further unexpected sequences not found in that set are not tracked.

This lets us bound the memory that PoolQ uses for unexpected sequence
reporting without having to track the size of each shard. It should
also perform better than the previous strategy, since it reads the
cache only once.
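
For readers who want the sampling idea in concrete form, here is a rough
Python sketch. It is not PoolQ's actual code. It assumes each shard can be
modeled as an iterable of row barcode strings keyed by a shard id, and the
names `sample_unexpected`, `shards`, and `max_tracked` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

_DONE = object()  # sentinel marking an exhausted shard


def sample_unexpected(
    shards: Dict[str, Iterable[str]],
    max_tracked: int,
) -> Dict[str, Counter]:
    """Read shards round-robin in a single pass.

    At most `max_tracked` distinct barcodes are tracked. Barcodes seen
    after the sample is full are counted only if already in the sample.
    Returns {barcode: Counter({shard_id: occurrences})}.
    """
    open_shards: List[Tuple[str, Iterator[str]]] = [
        (shard_id, iter(barcodes)) for shard_id, barcodes in shards.items()
    ]
    counts: Dict[str, Counter] = {}
    while open_shards:
        still_open = []
        for shard_id, it in open_shards:
            barcode = next(it, _DONE)
            if barcode is _DONE:
                continue  # this shard has been fully read
            still_open.append((shard_id, it))
            if barcode in counts:
                counts[barcode][shard_id] += 1
            elif len(counts) < max_tracked:
                counts[barcode] = Counter({shard_id: 1})
            # else: the sample is full and this barcode is not tracked
        open_shards = still_open
    return counts
```

In this sketch memory is bounded by `max_tracked` entries regardless of
how large any one shard is, and each shard is consumed exactly once.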

@mtomko mtomko self-assigned this Feb 12, 2024
Member

@tmgreen tmgreen left a comment


⛱️ 🧃 🦗

@mtomko mtomko merged commit 31a4bed into main Feb 13, 2024
1 check passed
@mtomko mtomko deleted the smarter-unexpected-sequence-report branch February 13, 2024 20:41
2 participants