Skip to content

refactor(silo): remove remaining parts of partitioning#1226

Merged
Taepper merged 1 commit into
mainfrom
remove-partitioning
Apr 7, 2026
Merged

refactor(silo): remove remaining parts of partitioning#1226
Taepper merged 1 commit into
mainfrom
remove-partitioning

Conversation

@Taepper
Copy link
Copy Markdown
Collaborator

@Taepper Taepper commented Mar 27, 2026

Partitioning in SILO has been "disabled" for some time now, and we do not plan on reintroducing it. Rather approaches such as #1151 will be followed regarding this direction.

This removes the partition code paths from SILO

@github-actions
Copy link
Copy Markdown
Contributor

There is no change in the changelog. This PR will not produce a new releasable version.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fully removes SILO’s remaining partitioning code paths, consolidating storage, serialization, and query execution to operate on a single table-wide column set and a single filter bitmap.

Changes:

  • Removed TablePartition and migrated partition validation/finalization/serialization logic into storage::Table.
  • Replaced per-partition filter computation and scan iteration with a single CopyOnWriteBitmap via computeFilter(...), and updated query/filter compilation accordingly.
  • Renamed column container/types from *Partition* to non-partitioned equivalents (e.g., ColumnGroup, StringColumn, SequenceColumn, etc.) and updated tests + serialized state.

Reviewed changes

Copilot reviewed 119 out of 121 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
testBaseData/siloSerializedState/1774625658/data_version.silo Adds new serialized-state version marker for v22.
testBaseData/siloSerializedState/1774259904/data_version.silo Removes old serialized-state version marker for v21.
src/silo/storage/table_partition.h Removes partition type definition.
src/silo/storage/table_partition.cpp Removes partition validation/finalization implementation.
src/silo/storage/table.h Moves partition-owned fields/serialization/validation APIs into Table.
src/silo/storage/table.cpp Implements single-table initialization, validation, serialization, load/save.
src/silo/storage/column_group.test.cpp Updates tests from ColumnPartitionGroup to ColumnGroup and non-partition column types.
src/silo/storage/column_group.h Renames/retargets container to ColumnGroup with non-partition column types.
src/silo/storage/column_group.cpp Updates column accessors and JSON insertion to non-partition column types.
src/silo/storage/column/zstd_compressed_string_column.test.cpp Updates test naming/types for ZstdCompressedStringColumn.
src/silo/storage/column/zstd_compressed_string_column.h Renames ZstdCompressedStringColumnPartition to ZstdCompressedStringColumn.
src/silo/storage/column/zstd_compressed_string_column.cpp Updates implementations to new class name.
src/silo/storage/column/string_column.test.cpp Updates tests to StringColumn and renames one test case.
src/silo/storage/column/string_column.h Renames StringColumnPartition to StringColumn.
src/silo/storage/column/string_column.cpp Updates implementation to new class name.
src/silo/storage/column/sequence_column.test.cpp Updates tests to SequenceColumn.
src/silo/storage/column/sequence_column.h Renames SequenceColumnPartition to SequenceColumn.
src/silo/storage/column/sequence_column.cpp Updates implementation and explicit template instantiations.
src/silo/storage/column/int_column.test.cpp Updates tests to IntColumn.
src/silo/storage/column/int_column.h Renames IntColumnPartition to IntColumn.
src/silo/storage/column/int_column.cpp Updates implementation to new class name.
src/silo/storage/column/indexed_string_column.test.cpp Updates tests to IndexedStringColumn.
src/silo/storage/column/indexed_string_column.h Renames IndexedStringColumnPartition to IndexedStringColumn.
src/silo/storage/column/indexed_string_column.cpp Updates implementation to new class name.
src/silo/storage/column/float_column.test.cpp Updates tests to FloatColumn.
src/silo/storage/column/float_column.h Renames FloatColumnPartition to FloatColumn.
src/silo/storage/column/float_column.cpp Updates implementation to new class name.
src/silo/storage/column/date32_column.test.cpp Updates tests to Date32Column.
src/silo/storage/column/date32_column.h Renames Date32ColumnPartition to Date32Column.
src/silo/storage/column/date32_column.cpp Updates implementation to new class name.
src/silo/storage/column/column_type_visitor.h Updates visitor dispatch to non-partition column types.
src/silo/storage/column/bool_column.h Renames BoolColumnPartition to BoolColumn.
src/silo/storage/column/bool_column.cpp Updates implementation to new class name.
src/silo/query_engine/operators/zstd_decompress_node.cpp Switches include/calls from partition filters to computeFilter, updates column types.
src/silo/query_engine/operators/table_scan_node.cpp Uses computeFilter and passes bitmap to scan exec node.
src/silo/query_engine/operators/phylo_subtree_node.cpp Uses a single bitmap filter and reads from table.columns instead of partitions.
src/silo/query_engine/operators/mutations_node.cpp Removes partition prefiltering and evaluates mutations against a single table/bitmap.
src/silo/query_engine/operators/most_recent_common_ancestor_node.cpp Uses single bitmap filter and table-wide column access.
src/silo/query_engine/operators/insertions_node.cpp Removes partition prefiltering and evaluates insertions against a single table/bitmap.
src/silo/query_engine/operators/count_filter_node.cpp Counts cardinality of a single bitmap filter instead of summing partitions.
src/silo/query_engine/operators/compute_partition_filters.h Changes API to computeFilter returning a single bitmap (file name unchanged).
src/silo/query_engine/operators/compute_partition_filters.cpp Deletes old per-partition filter computation implementation.
src/silo/query_engine/operators/compute_filter.cpp Adds new single-bitmap filter computation implementation.
src/silo/query_engine/filter/operators/string_in_set.test.cpp Updates operator tests to new column types.
src/silo/query_engine/filter/operators/string_in_set.cpp Updates explicit template instantiations to new column types.
src/silo/query_engine/filter/operators/selection.test.cpp Updates operator tests to use IntColumn.
src/silo/query_engine/filter/operators/selection.h Updates CompareToValueSelection specialization for StringColumn.
src/silo/query_engine/filter/operators/selection.cpp Updates specialization implementation to StringColumn.
src/silo/query_engine/filter/expressions/true.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/true.cpp Compiles against table-wide sequence_count.
src/silo/query_engine/filter/expressions/symbol_in_set.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/symbol_in_set.cpp Compiles symbol filters against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/symbol_equals.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/symbol_equals.cpp Uses table-wide sequence column metadata/access in rewrite.
src/silo/query_engine/filter/expressions/string_search.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/string_search.cpp Compiles search against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/string_in_set.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/string_in_set.cpp Rewrites/compiles against table.columns and new operator types.
src/silo/query_engine/filter/expressions/string_equals.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/string_equals.cpp Compiles against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/phylo_child_filter.h Removes partition dependency and updates string column type.
src/silo/query_engine/filter/expressions/phylo_child_filter.cpp Compiles against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/or.test.cpp Updates tests to use Table without partitions and new rewrite signature.
src/silo/query_engine/filter/expressions/or.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/or.cpp Rewrites/compiles children against table-only APIs.
src/silo/query_engine/filter/expressions/nof.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/nof.cpp Compiles N-of expressions against table-only APIs.
src/silo/query_engine/filter/expressions/negation.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/negation.cpp Rewrites/compiles negation against table-only APIs.
src/silo/query_engine/filter/expressions/maybe.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/maybe.cpp Updates rewrite/compile signatures to table-only.
src/silo/query_engine/filter/expressions/lineage_filter.h Removes partition dependency and updates indexed-string column type.
src/silo/query_engine/filter/expressions/lineage_filter.cpp Compiles against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/is_null.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/is_null.cpp Compiles null checks against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/int_equals.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/int_equals.cpp Compiles int checks against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/int_between.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/int_between.cpp Compiles int ranges against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/insertion_contains.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/insertion_contains.cpp Compiles insertion checks against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/has_mutation.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/has_mutation.cpp Rewrites using table.columns sequence metadata.
src/silo/query_engine/filter/expressions/float_equals.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/float_equals.cpp Compiles float checks against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/float_between.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/float_between.cpp Compiles float ranges against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/false.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/false.cpp Compiles empty operator with table-wide row count.
src/silo/query_engine/filter/expressions/expression.h Updates base interface to table-only rewrite/compile.
src/silo/query_engine/filter/expressions/exact.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/exact.cpp Updates rewrite signature and compile error path.
src/silo/query_engine/filter/expressions/date_equals.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/date_equals.cpp Compiles date checks against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/date_between.h Updates date column type and removes partition dependency.
src/silo/query_engine/filter/expressions/date_between.cpp Compiles date ranges against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/bool_equals.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/bool_equals.cpp Compiles bool checks against table.columns and table-wide row count.
src/silo/query_engine/filter/expressions/and.h Removes partition parameter from rewrite/compile APIs.
src/silo/query_engine/filter/expressions/and.cpp Compiles AND expressions against table-only APIs.
src/silo/query_engine/exec_node/table_scan.h Updates scan generator to accept a single bitmap filter and scan a single table.
src/silo/query_engine/exec_node/table_scan.cpp Updates scan materialization to use table-wide row ids and non-partition column types.
src/silo/query_engine/exec_node/arrow_util.h Updates Arrow builder mapping to non-partition column types.
src/silo/query_engine/actions/mutations.cpp Removes partition include usage.
src/silo/query_engine/actions/insertions.cpp Removes partition include usage.
src/silo/initialize/initializer.test.cpp Updates metadata type assertions to new column types.
src/silo/initialize/initializer.cpp Updates metadata initializer visitor to new column types.
src/silo/database_info.h Removes number_of_partitions from API struct.
src/silo/database_info.cpp Removes numberOfPartitions JSON output.
src/silo/database.test.cpp Updates expectations accordingly.
src/silo/database.cpp Updates filtering, statistics, and save/load paths for single-table serialization.
src/silo/common/nucleotide_symbols.h Updates Nucleotide::Column alias to new SequenceColumn.
src/silo/common/data_version.h Bumps serialization version to 22.
src/silo/common/aa_symbols.h Updates AminoAcid::Column alias to new SequenceColumn.
src/silo/append/database_inserter.h Removes partition inserter and simplifies to table-only inserter.
src/silo/append/database_inserter.cpp Updates insertion logic and commit path to finalize/validate on table.
src/silo/api/lineage_definition_handler.cpp Updates metadata lookup to new indexed-string column type.
endToEndTests/test/info.test.js Updates /info contract to remove numberOfPartitions.
documentation/api.md Updates /info response example and field list.
Comments suppressed due to low confidence (1)

src/silo/storage/table.cpp:1

  • With partitions removed, row ids are now global across the table. Allowing nuc_column.sequence_count < table.sequence_count (only guarding > here) can cause table scans and filter evaluation to index sequence columns with row_id >= nuc_column.sequence_count, leading to out-of-range access in sequence indexes/buffers. If shorter sequence columns are not a supported state, this should validate nuc_column.sequence_count == sequence_count (and similarly for amino acid sequences and other row-aligned columns) to prevent runtime memory errors. If shorter columns are intended, the scan/filter paths need to treat missing tail rows as nulls consistently.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/silo/append/database_inserter.cpp
Comment thread src/silo/query_engine/exec_node/table_scan.cpp
Copy link
Copy Markdown
Contributor

@fengelniederhammer fengelniederhammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, only minor stuff.

also the data_directories.md says:

    default/
      ...  (partition data)

Which might be worth updating.

Comment thread src/silo/storage/column/sequence_column.cpp Outdated
Comment thread src/silo/query_engine/exec_node/table_scan.cpp Outdated
Comment thread src/silo/storage/column_group.test.cpp Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 119 out of 121 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

src/silo/storage/table.cpp:1

  • loadData() no longer checks whether the returned input stream is actually open/valid before deserializing. In this file, saveData() still performs an explicit stream validity check, so loadData() should do the same and throw a load-specific exception (e.g., persistence::LoadDatabaseException) with a clear message when the file cannot be opened/read.
    src/silo/query_engine/exec_node/table_scan.cpp:176
  • The column lookup table.columns.getColumns<Column>().at(column_name) is performed for every row_id. This has avoidable overhead (map lookup + bounds check) on hot scan paths. Hoist the column reference outside the loop (and similarly hoist any other repeated per-row lookups) so the loop only does null/value access.
arrow::Status ColumnEntryAppender::operator()(
   ExecBatchBuilder& table_scan_node,
   const std::string& column_name,
   const storage::Table& table,
   const roaring::Roaring& row_ids
) {
   EVOBENCH_SCOPE("ColumnEntryAppender", columnTypeToString(Column::TYPE));
   auto array = table_scan_node.getColumnTypeArrayBuilders<Column>().at(column_name);
   for (auto row_id : row_ids) {
      auto& column = table.columns.getColumns<Column>().at(column_name);
      if (column.isNull(row_id)) {
         ARROW_RETURN_NOT_OK(array->AppendNull());
      } else {
         if constexpr (std::is_same_v<Column, storage::column::StringColumn>) {
            auto value = column.getValueString(row_id);
            ARROW_RETURN_NOT_OK(array->Append(value));
         } else if constexpr (std::is_same_v<Column, storage::column::IndexedStringColumn>) {
            auto value = column.getValueString(row_id);
            ARROW_RETURN_NOT_OK(array->Append(value));
         } else if constexpr (std::is_same_v<Column, storage::column::Date32Column>) {
            auto value = common::date32ToString(column.getValue(row_id));
            ARROW_RETURN_NOT_OK(array->Append(value));
         } else {
            auto value = column.getValue(row_id);
            ARROW_RETURN_NOT_OK(array->Append(value));
         }
      }
   }
   return arrow::Status::OK();
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Taepper Taepper force-pushed the remove-partitioning branch from 0528ec6 to fd92530 Compare April 7, 2026 11:14
@Taepper Taepper merged commit 2424d7d into main Apr 7, 2026
18 checks passed
@Taepper Taepper deleted the remove-partitioning branch April 7, 2026 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants