Skip to content

feat(silo): add option to ignore unknown lineage values#1234

Merged
Taepper merged 2 commits into
mainfrom
1233-allow-unknown-lineages
Apr 17, 2026
Merged

feat(silo): add option to ignore unknown lineage values#1234
Taepper merged 2 commits into
mainfrom
1233-allow-unknown-lineages

Conversation

@Taepper
Copy link
Copy Markdown
Collaborator

@Taepper Taepper commented Apr 16, 2026

resolves #1233

Summary

Also see discussion on Loculus slack. This adds an option to disable the strict validation of lineage values

PR Checklist

  • All necessary documentation has been adapted or there is an issue to do so.
  • The implemented feature is covered by an appropriate test.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 16, 2026

This is a preview of the changelog of the next release. If this branch is not up-to-date with the current main branch, the changelog may not be accurate. Rebase your branch on the main branch to get the most accurate changelog.

Note that this might contain changes that are on main, but not yet released.

Changelog:

0.11.2 (2026-04-17)

Features

  • silo: add option to ignore unknown lineage values (7fc3ba9)

@Taepper Taepper force-pushed the 1233-allow-unknown-lineages branch from 9275617 to a708c23 Compare April 16, 2026 09:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a configurable “lenient” mode for lineage-indexed string columns so that unknown lineage values can be accepted (treated as null w.r.t. the lineage hierarchy) instead of failing insertion, and updates serialization + docs accordingly.

Changes:

  • Introduces treatUnknownLineagesAsNull in database config and threads it through column metadata/initialization.
  • Updates lineage-index insertion logic to optionally ignore unknown lineage values (instead of returning an error).
  • Bumps serialization version and refreshes the committed serialized-state test fixture; adds/updates unit tests and documentation.

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
testBaseData/siloSerializedState/1776329347/default.silo New serialized DB fixture reflecting the updated serialization format.
testBaseData/siloSerializedState/1776329347/data_version.silo Records timestamp + new serialization version for the new fixture directory.
testBaseData/siloSerializedState/1774967818/data_version.silo Removes old fixture metadata for the previous serialized-state snapshot.
src/silo/storage/column/indexed_string_column.test.cpp Updates existing tests for new ctor signature; adds test for lenient insertion mode.
src/silo/storage/column/indexed_string_column.h Adds metadata flag and includes it in boost serialization save/load.
src/silo/storage/column/indexed_string_column.cpp Implements optional “ignore unknown lineage” behavior during lineage-index insertion.
src/silo/initialize/initializer.cpp Wires config flag into IndexedStringColumn metadata construction.
src/silo/config/database_config.h Adds config field treat_unknown_lineages_as_null.
src/silo/config/database_config.cpp Parses/encodes new YAML option treatUnknownLineagesAsNull.
src/silo/common/serialization_version.txt Bumps global serialization version.
documentation/input_format.md Documents treatUnknownLineagesAsNull in schema metadata options.
Makefile Removes .PHONY declaration for conanprofile target.
Comments suppressed due to low confidence (2)

src/silo/storage/column/indexed_string_column.h:40

  • IndexedStringColumnMetadata::treat_unknown_lineages_as_null is never initialized in the constructors that don't take this flag (e.g. IndexedStringColumnMetadata(std::string) and (std::string, BidirectionalStringMap)). This results in undefined/indeterminate values and non-deterministic serialization because save() always archives this field. Initialize it to a known default (e.g. default member initializer = false and/or add it to the member initializer list in those constructors).
class IndexedStringColumnMetadata : public ColumnMetadata {
  public:
   common::BidirectionalStringMap dictionary;
   std::optional<common::LineageTreeAndIdMap> lineage_tree;
   bool treat_unknown_lineages_as_null;

   explicit IndexedStringColumnMetadata(std::string column_name)
       : ColumnMetadata(std::move(column_name)) {}

   IndexedStringColumnMetadata(
      std::string column_name,
      silo::common::BidirectionalStringMap dictionary
   )
       : ColumnMetadata(std::move(column_name)),
         dictionary(std::move(dictionary)) {}

src/silo/storage/column/indexed_string_column.h:169

  • In the std::shared_ptr<IndexedStringColumnMetadata> deserializer, treat_unknown_lineages_as_null is read from the archive but is not applied when lineage_tree is nullopt (the else-branch constructs metadata without setting the flag). This can reintroduce an indeterminate value unless the constructor/default member initializer sets it, and it also makes serialization round-trips asymmetric. Consider passing the flag into the constructor in the else-branch or setting the member on the constructed object.
   bool treat_unknown_lineages_as_null;
   archive & column_name;
   archive & dictionary;
   archive & lineage_tree;
   archive & treat_unknown_lineages_as_null;
   if (lineage_tree.has_value()) {
      object = std::make_shared<silo::storage::column::IndexedStringColumnMetadata>(
         std::move(column_name),
         std::move(dictionary),
         std::move(lineage_tree.value()),
         treat_unknown_lineages_as_null
      );
   } else {
      object = std::make_shared<silo::storage::column::IndexedStringColumnMetadata>(
         std::move(column_name), std::move(dictionary)
      );
   }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Makefile
Comment thread src/silo/config/database_config.cpp
Comment thread src/silo/storage/column/indexed_string_column.test.cpp
@Taepper Taepper force-pushed the 1233-allow-unknown-lineages branch from a708c23 to 7fc3ba9 Compare April 17, 2026 12:35
@Taepper Taepper merged commit 16e1389 into main Apr 17, 2026
18 checks passed
@Taepper Taepper deleted the 1233-allow-unknown-lineages branch April 17, 2026 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow unknown lineage values

3 participants