feat(silo): add option to ignore unknown lineage values#1234
Conversation
|
This is a preview of the changelog of the next release. If this branch is not up-to-date with the current main branch, the changelog may not be accurate. Rebase your branch on the main branch to get the most accurate changelog. Note that this might contain changes that are on main, but not yet released. Changelog: 0.11.2 (2026-04-17)Features
|
9275617 to
a708c23
Compare
There was a problem hiding this comment.
Pull request overview
Adds a configurable “lenient” mode for lineage-indexed string columns so that unknown lineage values can be accepted (treated as null w.r.t. the lineage hierarchy) instead of failing insertion, and updates serialization + docs accordingly.
Changes:
- Introduces
treatUnknownLineagesAsNullin database config and threads it through column metadata/initialization. - Updates lineage-index insertion logic to optionally ignore unknown lineage values (instead of returning an error).
- Bumps serialization version and refreshes the committed serialized-state test fixture; adds/updates unit tests and documentation.
Reviewed changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| testBaseData/siloSerializedState/1776329347/default.silo | New serialized DB fixture reflecting the updated serialization format. |
| testBaseData/siloSerializedState/1776329347/data_version.silo | Records timestamp + new serialization version for the new fixture directory. |
| testBaseData/siloSerializedState/1774967818/data_version.silo | Removes old fixture metadata for the previous serialized-state snapshot. |
| src/silo/storage/column/indexed_string_column.test.cpp | Updates existing tests for new ctor signature; adds test for lenient insertion mode. |
| src/silo/storage/column/indexed_string_column.h | Adds metadata flag and includes it in boost serialization save/load. |
| src/silo/storage/column/indexed_string_column.cpp | Implements optional “ignore unknown lineage” behavior during lineage-index insertion. |
| src/silo/initialize/initializer.cpp | Wires config flag into IndexedStringColumn metadata construction. |
| src/silo/config/database_config.h | Adds config field treat_unknown_lineages_as_null. |
| src/silo/config/database_config.cpp | Parses/encodes new YAML option treatUnknownLineagesAsNull. |
| src/silo/common/serialization_version.txt | Bumps global serialization version. |
| documentation/input_format.md | Documents treatUnknownLineagesAsNull in schema metadata options. |
| Makefile | Removes .PHONY declaration for conanprofile target. |
Comments suppressed due to low confidence (2)
src/silo/storage/column/indexed_string_column.h:40
IndexedStringColumnMetadata::treat_unknown_lineages_as_nullis never initialized in the constructors that don't take this flag (e.g.IndexedStringColumnMetadata(std::string)and(std::string, BidirectionalStringMap)). This results in undefined/indeterminate values and non-deterministic serialization becausesave()always archives this field. Initialize it to a known default (e.g. default member initializer= falseand/or add it to the member initializer list in those constructors).
class IndexedStringColumnMetadata : public ColumnMetadata {
public:
common::BidirectionalStringMap dictionary;
std::optional<common::LineageTreeAndIdMap> lineage_tree;
bool treat_unknown_lineages_as_null;
explicit IndexedStringColumnMetadata(std::string column_name)
: ColumnMetadata(std::move(column_name)) {}
IndexedStringColumnMetadata(
std::string column_name,
silo::common::BidirectionalStringMap dictionary
)
: ColumnMetadata(std::move(column_name)),
dictionary(std::move(dictionary)) {}
src/silo/storage/column/indexed_string_column.h:169
- In the
std::shared_ptr<IndexedStringColumnMetadata>deserializer,treat_unknown_lineages_as_nullis read from the archive but is not applied whenlineage_treeisnullopt(the else-branch constructs metadata without setting the flag). This can reintroduce an indeterminate value unless the constructor/default member initializer sets it, and it also makes serialization round-trips asymmetric. Consider passing the flag into the constructor in the else-branch or setting the member on the constructed object.
bool treat_unknown_lineages_as_null;
archive & column_name;
archive & dictionary;
archive & lineage_tree;
archive & treat_unknown_lineages_as_null;
if (lineage_tree.has_value()) {
object = std::make_shared<silo::storage::column::IndexedStringColumnMetadata>(
std::move(column_name),
std::move(dictionary),
std::move(lineage_tree.value()),
treat_unknown_lineages_as_null
);
} else {
object = std::make_shared<silo::storage::column::IndexedStringColumnMetadata>(
std::move(column_name), std::move(dictionary)
);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
a708c23 to
7fc3ba9
Compare
resolves #1233
Summary
Also see discussion on Loculus slack. This adds an option to disable the strict validation of lineage values
PR Checklist