[skip-ci][NFC][ntuple] add schema evolution docs #20079
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,210 @@ | ||
# Schema Evolution | ||
|
||
Schema evolution is the capability of ROOT I/O to read data | ||
into in-memory models that are different from but compatible with the on-disk schema. | ||
|
||
Schema evolution allows for data models to evolve over time | ||
such that old data can be read into current models ("backward compatibility") | ||
and old software can read newer data models ("forward compatibility"). | ||
For instance, data model authors may over time add and reorder class members, change data types | ||
(e.g. `std::vector<float>` --> `ROOT::RVec<double>`), rename classes, etc. | ||
|
||
ROOT applies automatic schema evolution rules for common, safe and unambiguous cases. | ||
Users can complement the automatic rules by manual schema evolution ("I/O customization rules") | ||
where custom code snippets implement the transformation logic. | ||
If neither the automatic rules nor any of the provided I/O customization rules suffice | ||
to transform the on-disk schema into the in-memory model, ROOT will error out and refrain from reading data. | ||
|
||
This document describes schema evolution support implemented in RNTuple. | ||
For the most part, schema evolution works identically across the different ROOT I/O systems (TFile, TTree, RNTuple). | ||
The exceptions are listed in the last section of this document. | ||
|
||
## Automatic schema evolution | ||
|
||
ROOT applies a number of rules to read data transparently into in-memory models | ||
that are not an exact match to the on-disk schema. | ||
The automatic rules apply recursively to compound types (classes, tuples, collections, etc.); | ||
the outer types are evolved before the inner types. | ||
|
||
Automatic schema evolution rules transform native _types_ as well as the _shape_ of user-defined classes | ||
as listed in the following, exhaustive tables. | ||
|
||
### Class shape transformations | ||
|
||
User-defined classes can automatically evolve their layout in the following ways. | ||
Note that users should increase the class version number when the layout changes. | ||
However, for RNTuple automatic rules that is not mandatory; | ||
RNTuple will always compare the current on-disk layout with the in-memory model. | ||
|
||
| Layout Change | Also supported in Untyped Records | Comment | | ||
| --------------------------------------- | --------------------------------- | -------------------- | | ||
| Remove member | Yes | Match by member name | | ||
| Add member | Yes | Match by member name | | ||
| Reorder members | Yes | Match by member name | | ||
| Remove all base classes | n/a | | | ||
| Add base class(es) where there were none | n/a | | | ||
|
||
Reordering and incremental addition or removal of base classes are currently unsupported | ||
but may be supported in future RNTuple versions. | ||
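
As an illustration of matching by member name, here is a minimal sketch in plain C++ that does not use ROOT. The `OnDiskRecord` map, the `Event` class and the `ReadEvent` helper are hypothetical stand-ins rather than RNTuple API. The sketch also assumes that a member missing on disk keeps its default value, which the text above does not state.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for an on-disk record: member name -> stored value.
using OnDiskRecord = std::map<std::string, std::int32_t>;

// Hypothetical current in-memory class. Compared to the on-disk layout,
// fA and fB were reordered, fOld was removed, and fNew was added.
struct Event {
   std::int32_t fB = 0;
   std::int32_t fA = 0;
   std::int32_t fNew = -1;
};

// Copy a value only if a member with that name exists on disk.
inline void ReadMember(const OnDiskRecord &disk, const std::string &name, std::int32_t &target)
{
   auto it = disk.find(name);
   if (it != disk.end())
      target = it->second;
}

inline Event ReadEvent(const OnDiskRecord &disk)
{
   Event e;
   ReadMember(disk, "fA", e.fA);     // matched by name, not by position
   ReadMember(disk, "fB", e.fB);
   ReadMember(disk, "fNew", e.fNew); // absent in old data: keeps its default (assumption)
   return e;                         // the on-disk member "fOld" is never looked up
}
```

Because the lookup is by name, neither the order of the members on disk nor the presence of removed members affects the result.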
|
||
### Type transformations | ||
|
||
ROOT transparently reads into in-memory types that are different from but compatible with the on-disk type. | ||
In the following tables, `T'` denotes a type that is compatible with `T`. | ||
|
||
#### Plain fields | ||
|
||
| In-memory type | Compatible on-disk types | Comment | | ||
| --------------------------- | --------------------------- | ---------------------| | ||
| `bool` | `char` | | | ||
| | `std::[u]int[8,16,32,64]_t` | | | ||
| | enum | | | ||
|-----------------------------|-----------------------------|----------------------| | ||
| `char` | `bool` | | | ||
| | `std::[u]int[8,16,32,64]_t` | with bounds check | | ||
| | enum | with bounds check | | ||
|-----------------------------|-----------------------------|----------------------| | ||
| `std::[u]int[8,16,32,64]_t` | `bool` | | | ||
| | `char` | | | ||
Review comment: Probably we should specify the behavior when the value doesn't fit in the |
||
| | `std::[u]int[8,16,32,64]_t` | with bounds check | | ||
| | enum | with bounds check | | ||
|-----------------------------|-----------------------------|----------------------| | ||
| enum | enum of different type | with bounds check | | ||
|-----------------------------|-----------------------------|----------------------| | ||
| `float` | `double` | | | ||
|-----------------------------|-----------------------------|----------------------| | ||
| `double` | `float` | | | ||
Review comment: No bounds check? |
||
|-----------------------------|-----------------------------|----------------------| | ||
| `std::atomic<T>` | `T'` | | | ||
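
The "with bounds check" entries can be pictured with the following plain C++ sketch. `CheckedIntCast` is a hypothetical helper, not a ROOT function. Throwing `std::out_of_range` is purely an assumption made for illustration, because the table does not say how RNTuple reports a value that does not fit.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// True if v is negative; always false for unsigned types.
template <typename T>
constexpr bool IsNegative(T v)
{
   if constexpr (std::is_signed_v<T>)
      return v < 0;
   else
      return false;
}

// Convert an on-disk integer to the in-memory integer type, rejecting values that do not fit.
template <typename To, typename From>
To CheckedIntCast(From value)
{
   static_assert(std::is_integral_v<To> && std::is_integral_v<From>, "integer types only");
   const To result = static_cast<To>(value);
   // The value fits if the round trip preserves it and the sign did not flip.
   if (static_cast<From>(result) != value || IsNegative(result) != IsNegative(value))
      throw std::out_of_range("on-disk value does not fit the in-memory type");
   return result;
}
```

For example, `CheckedIntCast<std::int8_t>(std::int64_t{100})` succeeds, whereas passing `300` or converting `-1` to an unsigned type is rejected in this sketch.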
|
||
|
||
#### Variable-length collections | ||
|
||
| In-memory type | Compatible on-disk types | Comment | | ||
| -------------------------------- | ------------------------------------ | ------------------------------------- | | ||
| `std::vector<T>` | `ROOT::RVec<T'>` | | | ||
| | `std::array<T', N>` | | | ||
| | `std::[unordered_][multi]set<T'>` | | | ||
| | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>` | | ||
| | `std::optional<T'>` | | | ||
| | `std::unique_ptr<T'>` | | | ||
| | User-defined collection of `T'` | | | ||
| | Untyped collection of `T'` | | | ||
|----------------------------------|--------------------------------------|---------------------------------------| | ||
| `ROOT::RVec<T>` | `std::vector<T'>` | with size check | | ||
| | `std::array<T', N>` | with size check | | ||
| | `std::[unordered_][multi]set<T'>` | with size check | | ||
| | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>`, | | ||
| | | with size check | | ||
| | `std::optional<T'>` | | | ||
| | `std::unique_ptr<T'>` | | | ||
| | User-defined collection of `T'` | with size check | | ||
| | Untyped collection of `T'` | with size check | | ||
|----------------------------------|--------------------------------------|---------------------------------------| | ||
| `std::[unordered_]set<T>` | `std::[unordered_]set<T'>` | | | ||
| | `std::[unordered_]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>` | | ||
|----------------------------------|--------------------------------------|---------------------------------------| | ||
| `std::[unordered_]multiset<T>` | `std::vector<T'>` | | | ||
| | `std::array<T', N>` | | | ||
| | `std::[unordered_][multi]set<T'>` | | | ||
| | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>` | | ||
| | User-defined collection of `T'` | | | ||
| | Untyped collection of `T'` | | | ||
|----------------------------------|--------------------------------------|---------------------------------------| | ||
| `std::[unordered_]map<K,V>` | `std::[unordered_]map<K',V'>` | | | ||
| | `std::[unordered_]set<T>` | only `T` = `std::[pair,tuple]<K',V'>` | | ||
|----------------------------------|--------------------------------------|---------------------------------------| | ||
| `std::[unordered_]multimap<K,V>` | `std::vector<T>` | only `T` = `std::[pair,tuple]<K,V>` | | ||
| | `std::array<T, N>` | only `T` = `std::[pair,tuple]<K,V>` | | ||
| | `std::[unordered_][multi]set<T>` | only `T` = `std::[pair,tuple]<K,V>` | | ||
| | `std::[unordered_][multi]map<K',V'>` | | | ||
| | User-defined collection of `T` | only `T` = `std::[pair,tuple]<K,V>` | | ||
| | Untyped collection of `T` | only `T` = `std::[pair,tuple]<K,V>` | | ||
|
||
#### Nullable fields | ||
|
||
| In-memory type | Compatible on-disk types | | ||
| -------------------- | ------------------------ | | ||
| `std::optional<T>` | `std::unique_ptr<T'>` | | ||
| | `T'` | | ||
|----------------------|--------------------------| | ||
| `std::unique_ptr<T>` | `std::optional<T'>` | | ||
| | `T'` | | ||
|
||
#### Records | ||
|
||
| In-memory type | Compatible on-disk types | | ||
| --------------------------- | -------------------------------------- | | ||
| `std::pair<T,U>` | `std::tuple<T',U'>` | | ||
|-----------------------------|----------------------------------------| | ||
| `std::tuple<T,U>` | `std::pair<T',U'>` | | ||
|-----------------------------|----------------------------------------| | ||
| Untyped record | User-defined class of compatible shape | | ||
|
||
Note that for emulated classes, the in-memory untyped record is constructed from on-disk information. | ||
|
||
#### Additional rules | ||
|
||
All on-disk types `std::atomic<T'>` can be read into a `T` in-memory model. | ||
|
||
If a class property changes from using an RNTuple streamer field to using a regular RNTuple class field, | ||
existing files with on-disk streamer fields will continue to read as streamer fields. | ||
This can be seen as "schema evolution out of streamer fields". | ||
|
||
## Manual schema evolution (I/O customization rules) | ||
|
||
ROOT I/O customization rules allow for custom code handling the transformation | ||
from the on-disk schema to the in-memory model. | ||
Customization rules are part of the class dictionary. | ||
For the exact syntax of customization rules, we refer to the ROOT manual. | ||
Review comment: Maybe a link to the manual here? |
||
|
||
Generally, customization rules consist of: | ||
- A target class. | ||
- Target members of the target class, i.e. those class members whose value is set by the rule. | ||
Target members must be direct members, i.e. not part of a base class. | ||
- A source class (possibly having a different class name than the target class) | ||
together with class versions or class checksums | ||
Review comment: It could be useful to mention that the class checksum can be retrieved from |
||
that describe all the possible on-disk class versions the rule applies to. | ||
- Source members of the source class; the given source members will be read as the given type. | ||
Source members can also be from a base class. | ||
Note that there is no way to specify a base class member that has the same name as a member in the derived class. | ||
- The custom code snippet; the code snippet has access to the (whole) target object and to the given source members. | ||
Review comment: I think an example of a customization rule would be very useful to anyone not already familiar with them. |
||
|
||
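
To make the parts of a rule concrete, the sketch below shows a hypothetical `Point` class whose version 1 stored `fX` and `fY` and whose current version stores only `fR`. The `#pragma read` line in the comment follows the attribute names of ROOT's I/O customization rules (`sourceClass`, `version`, `source`, `targetClass`, `target`, `include`, `code`). The class, its members and the version number are illustrative assumptions, and the exact syntax should be checked against the ROOT manual. The `ApplyRule` function is not ROOT API. It only emulates in plain C++ what the code snippet computes.

```cpp
#include <cmath>

// Hypothetical on-disk layout (class version 1): Cartesian coordinates.
struct PointV1 {
   float fX = 0;
   float fY = 0;
};

// Hypothetical current in-memory class (class version 2): fR replaces fX and fY.
struct Point {
   float fR = 0;
};

/* In a LinkDef.h file, the corresponding rule could look like this (illustrative names):

   #pragma read sourceClass="Point" version="[1]" source="float fX; float fY" \
                targetClass="Point" target="fR" include="cmath" \
                code="{ fR = std::sqrt(onfile.fX * onfile.fX + onfile.fY * onfile.fY); }"

   Inside the snippet, the source members are reached through `onfile`,
   and the target member fR is set directly. */

// Plain C++ emulation of the snippet; `onfile` plays the role of the source members.
inline void ApplyRule(const PointV1 &onfile, Point &target)
{
   target.fR = std::sqrt(onfile.fX * onfile.fX + onfile.fY * onfile.fY);
}
```

In this sketch the target class is `Point`, the target member is `fR`, the source class is `Point` at version 1, and the source members are `fX` and `fY`, both read as `float`.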
At runtime, for any given target member there must be at most one applicable rule. | ||
A source member can be read into any type compatible with its on-disk type | ||
but any given source member can only be read into one type for a given target class | ||
(i.e. multiple rules for the same target/source class must not use different types for the same source member). | ||
|
||
There are two special types of rules: | ||
1. Pure class rename rules consisting only of source and target class | ||
2. Whole-object rules that have no target members | ||
|
||
Class rename rules (pure or not) are not transitive | ||
(if in-memory `A` can read from on-disk `B` and in-memory `B` can read from on-disk `C`, | ||
in-memory `A` cannot automatically read from on-disk `C`). | ||
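
The non-transitivity can be sketched as a single-step lookup. The `RenameRules` table and the `CanRead` function are hypothetical and only model the statement above. They do not reflect how ROOT stores or resolves rules.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical rule table: in-memory class name -> on-disk class names it can read via rename rules.
using RenameRules = std::map<std::string, std::set<std::string>>;

// A class can read itself or a direct rename source; rules are never chained.
inline bool CanRead(const RenameRules &rules, const std::string &inMemory, const std::string &onDisk)
{
   if (inMemory == onDisk)
      return true;
   auto it = rules.find(inMemory);
   return it != rules.end() && it->second.count(onDisk) > 0;
}
```

With rules `A <- B` and `B <- C`, `CanRead(rules, "A", "B")` and `CanRead(rules, "B", "C")` hold, but `CanRead(rules, "A", "C")` does not. Reading `C` into `A` would need an explicit rule of its own.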
|
||
Note that customization rules operate on partially read objects. | ||
Customization rules are executed after all members not subject to customization rules have been read from disk. | ||
Whole-object rules are executed after other rules. | ||
Otherwise, the scheduling of rules is unspecified. | ||
|
||
## Interplay between automatic and manual schema evolution | ||
|
||
The target members of I/O customization rules are exempt from automatic schema evolution | ||
(applies to the corresponding field of the target member and all its subfields). | ||
Otherwise, automatic and manual schema evolution work side by side. | ||
For instance, a renamed class is still subject to automatic schema evolution. | ||
|
||
The source member of a customization rule is subject to the same automatic and manual schema evolution rules | ||
as if it were read normally, e.g. in an `RNTupleView`. | ||
|
||
## Schema evolution differences between RNTuple and Classic I/O | ||
|
||
In contrast to RNTuple, TTree and TFile also apply the following automatic schema evolution rules: | ||
- Conversion between floating point and integer types | ||
- Conversion from `unique_ptr<T>` --> `T'` | ||
- Complete conversion matrix of all collection types | ||
- Insertion and removal of intermediate classes | ||
- Move of a member between base class and derived class | ||
- Reordering of base classes | ||
|
Review comment: Should we specify in the comment that new members are default initialized?