feat(silo): add a MutationProfile filter#1189

Merged
Taepper merged 2 commits into main from 1179-mutation-profile on Apr 30, 2026

Taepper (Collaborator) commented Mar 2, 2026

resolves #1179

Summary

This adds a MutationProfile filter to silo. The behavior of this filter is outlined in #1179.

PR Checklist

  • All necessary documentation has been adapted or there is an issue to do so.
  • The implemented feature is covered by an appropriate test.

github-actions Bot (Contributor) commented Mar 2, 2026

This is a preview of the changelog of the next release. If this branch is not up-to-date with the current main branch, the changelog may not be accurate. Rebase your branch on the main branch to get the most accurate changelog.

Note that this might contain changes that are on main, but not yet released.

Changelog:

0.11.2 (2026-04-29)

Features

  • silo: add a MutationProfile filter (3838964)

Copilot AI left a comment

Pull request overview

Adds a new MutationProfile filter expression to SILO (per #1179) and introduces an optimized compilation path for large N-Of expressions over sequence positions to avoid repeated vertical-index lookups. This enables efficient “distance to profile” queries that expand into many per-position symbol conditions.
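To make the "distance to profile" idea concrete, here is a minimal sketch of how such a query could expand into per-position symbol conditions wrapped in Not(N-Of(...)). It assumes that "within distance d" means "not at least d + 1 mismatching positions" and that a mismatch is a plain character inequality. The actual semantics are defined in #1179, and every name below (MismatchCondition, nOf, withinDistance, filterByProfile) is hypothetical and not taken from the SILO code base.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-position condition: "the row's symbol at `position`
// differs from the profile symbol `expected`".
struct MismatchCondition {
    std::size_t position;
    char expected;

    bool matches(const std::string& sequence) const {
        // Assumption: positions beyond the end of a row never count as a mismatch.
        return position < sequence.size() && sequence[position] != expected;
    }
};

// One condition per profile position.
std::vector<MismatchCondition> buildMismatchConditions(const std::string& profile) {
    std::vector<MismatchCondition> conditions;
    for (std::size_t pos = 0; pos < profile.size(); ++pos) {
        conditions.push_back({pos, profile[pos]});
    }
    return conditions;
}

// N-Of: true if at least `n` of the conditions hold for the sequence.
bool nOf(std::size_t n, const std::vector<MismatchCondition>& conditions,
         const std::string& sequence) {
    std::size_t hits = 0;
    for (const auto& condition : conditions) {
        if (condition.matches(sequence)) {
            ++hits;
        }
    }
    return hits >= n;
}

// "At most max_distance mismatches" expressed as Not(N-Of(max_distance + 1, ...)).
bool withinDistance(const std::string& profile, std::size_t max_distance,
                    const std::string& sequence) {
    return !nOf(max_distance + 1, buildMismatchConditions(profile), sequence);
}

// Returns the indices of all rows within `max_distance` of the profile.
std::vector<std::size_t> filterByProfile(const std::vector<std::string>& rows,
                                         const std::string& profile,
                                         std::size_t max_distance) {
    std::vector<std::size_t> result;
    for (std::size_t row = 0; row < rows.size(); ++row) {
        if (withinDistance(profile, max_distance, rows[row])) {
            result.push_back(row);
        }
    }
    return result;
}
```

Evaluating each row against every condition, as this sketch does, is the naive path. The number of conditions grows with the profile length, which is presumably why the PR adds a batched compile path for large N-Of expressions.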

Changes:

  • Introduces NucleotideMutationProfile / AminoAcidMutationProfile expression that rewrites into an N-Of/Not form.
  • Adds a single-pass vertical-index DP helper (VerticalSequenceIndex::buildNOfDpTable) and a new NOf compile fast-path for SymbolInSet children on the same sequence.
  • Adds docs, integration tests, and performance benchmarks/utilities for profiling the new behavior.
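The threshold DP mentioned in the list above can be illustrated with a generic bitmap formulation. This is a sketch of the general technique only, not the body of VerticalSequenceIndex::buildNOfDpTable: it assumes each condition has already been resolved to a bitmap of matching rows, and it uses a single 64-bit word as the row set to stay self-contained. The names RowBitmap and atLeastNOf are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit r set means row r is in the set. Limited to 64 rows for illustration.
using RowBitmap = std::uint64_t;

// Returns the rows that satisfy at least `n` of the given conditions,
// scanning the condition bitmaps once.
RowBitmap atLeastNOf(std::size_t n, const std::vector<RowBitmap>& condition_bitmaps,
                     RowBitmap all_rows) {
    // at_least[j] holds the rows that matched at least j of the conditions seen so far.
    std::vector<RowBitmap> at_least(n + 1, 0);
    at_least[0] = all_rows;

    for (RowBitmap bitmap : condition_bitmaps) {
        // Iterate j downwards so that one condition raises a row's count by at most one.
        for (std::size_t j = n; j >= 1; --j) {
            at_least[j] |= at_least[j - 1] & bitmap;
        }
    }
    return at_least[n];
}
```

Each condition bitmap is visited once and only n + 1 intermediate bitmaps are kept, so the cost is proportional to the number of conditions times n, with no per-row counters.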

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/silo/query_engine/filter/expressions/mutation_profile.h | Defines the new MutationProfile expression template and JSON parsing hook. |
| src/silo/query_engine/filter/expressions/mutation_profile.cpp | Implements profile construction (querySequence / sequenceId / mutations) and rewrite to Not(N-Of(...)). |
| src/silo/query_engine/filter/expressions/expression.cpp | Registers new expression types: NucleotideMutationProfile and AminoAcidMutationProfile. |
| src/silo/query_engine/filter/expressions/symbol_in_set.h | Adds getters needed for the new NOf compilation optimization. |
| src/silo/query_engine/filter/expressions/nof.cpp | Adds optimized compile path that batches vertical-index access and inlines the threshold DP. |
| src/silo/storage/column/vertical_sequence_index.h | Declares PositionQuery and buildNOfDpTable DP helper. |
| src/silo/storage/column/vertical_sequence_index.cpp | Implements buildNOfDpTable with a forward scan over vertical_bitmaps. |
| src/silo/test/mutation_profile.test.cpp | Adds integration tests for NucleotideMutationProfile behavior and error cases. |
| documentation/query_documentation.md | Documents NucleotideMutationProfile and AminoAcidMutationProfile JSON formats and semantics. |
| performance/sequence_generator.h | Adds shared benchmark utilities for generating synthetic sequences/reads and initializing DBs. |
| performance/nof_sequence_filter.cpp | Adds a benchmark targeting the large-N-Of optimization via MutationProfile. |
| performance/many_short_read_filters.cpp | Refactors to reuse sequence_generator.h. |
| performance/CMakeLists.txt | Ensures benchmarks can include performance headers and adds the new benchmark target. |

Comment thread src/silo/query_engine/filter/expressions/nof.cpp Outdated
Comment thread src/silo/query_engine/filter/expressions/nof.cpp Outdated
Comment thread src/silo/storage/column/vertical_sequence_index.cpp Outdated
Comment thread src/silo/test/mutation_profile.test.cpp
Comment thread performance/sequence_generator.h
Taepper force-pushed the 1179-mutation-profile branch from 3c324c7 to d15c000 on March 3, 2026 09:30
Taepper force-pushed the 1179-mutation-profile branch from d15c000 to 9832a96 on March 11, 2026 14:08
Comment thread documentation/query_documentation.md Outdated
Comment thread documentation/query_documentation.md Outdated
Comment thread src/silo/query_engine/filter/expressions/mutation_profile.cpp Outdated
Comment thread src/silo/query_engine/filter/expressions/mutation_profile.cpp
Comment thread src/silo/test/mutation_profile.test.cpp
Taepper force-pushed the 1179-mutation-profile branch 2 times, most recently from 71b3438 to bcf53b1 on April 15, 2026 13:57

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.

Comment thread src/silo/common/string_utils.h
Comment thread src/silo/query_engine/filter/expressions/mutation_profile.cpp
Comment thread performance/sequence_generator.h Outdated
Comment thread src/silo/test/mutation_profile.test.cpp Outdated
Comment thread src/silo/query_engine/filter/expressions/or.cpp Outdated
Comment thread src/silo/query_engine/filter/expressions/and.cpp Outdated
Comment thread src/silo/query_engine/filter/operators/threshold.cpp
Comment thread src/silo/test/mutation_profile.test.cpp Outdated
Comment thread src/silo/query_engine/filter/operators/intersection.cpp
Comment thread documentation/query_documentation.md Outdated
Comment thread src/silo/common/string_utils.h
Comment thread src/silo/query_engine/filter/expressions/nof.cpp
Comment thread src/silo/query_engine/filter/expressions/mutation_profile.cpp
Taepper force-pushed the 1179-mutation-profile branch from 11f160d to 1c7cd48 on April 29, 2026 15:40
Taepper merged commit 644cb2e into main on Apr 30, 2026
47 of 50 checks passed
Taepper deleted the 1179-mutation-profile branch on April 30, 2026 08:35

Development

Successfully merging this pull request may close these issues.

Add MutationProfile filter

3 participants