210 changes: 210 additions & 0 deletions tree/ntuple/doc/SchemaEvolution.md
@@ -0,0 +1,210 @@
# Schema Evolution

Schema evolution is the capability of ROOT I/O to read data
into in-memory models that differ from, but are compatible with, the on-disk schema.

Schema evolution allows for data models to evolve over time
such that old data can be read into current models ("backward compatibility")
and old software can read newer data models ("forward compatibility").
For instance, data model authors may over time add and reorder class members, change data types
(e.g. `std::vector<float>` --> `ROOT::RVec<double>`), rename classes, etc.

ROOT applies automatic schema evolution rules for common, safe and unambiguous cases.
Users can complement the automatic rules with manual schema evolution ("I/O customization rules")
where custom code snippets implement the transformation logic.
If neither the automatic rules nor any of the provided I/O customization rules suffice
to transform the on-disk schema into the in-memory model, ROOT raises an error and does not read the data.

This document describes schema evolution support implemented in RNTuple.
For the most part, schema evolution works identically across the different ROOT I/O systems (TFile, TTree, RNTuple).
The exceptions are listed in the last section of this document.

## Automatic schema evolution

ROOT applies a number of rules to read data transparently into in-memory models
that are not an exact match to the on-disk schema.
The automatic rules apply recursively to compound types (classes, tuples, collections, etc.);
the outer types are evolved before the inner types.

Automatic schema evolution rules transform native _types_ as well as the _shape_ of user-defined classes
as listed in the following, exhaustive tables.

### Class shape transformations

User-defined classes can automatically evolve their layout in the following ways.
Note that users should increase the class version number when the layout changes.
However, for RNTuple automatic rules that is not mandatory;
RNTuple will always compare the current on-disk layout with the in-memory model.

| Layout Change | Also supported in Untyped Records | Comment |
| --------------------------------------- | --------------------------------- | -------------------- |
| Remove member | Yes | Match by member name |
| Add member | Yes | Match by member name |
| Reorder members | Yes | Match by member name |
| Remove all base classes | n/a | |
| Add base class(es) where there were none | n/a | |

> **Review comment (Contributor)**, on the "Add member" row: Should we specify in the comment that new members are default initialized?

Reordering and incremental addition or removal of base classes are currently unsupported
but may be supported in future RNTuple versions.
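
To make "match by member name" concrete, here is a minimal sketch in plain C++ that does not use any ROOT API. `OnDiskRecord`, `Track` and `ReadTrack` are hypothetical names introduced only for illustration, and the behavior that an added member keeps its default-constructed value is an assumption of the sketch, not something stated above.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the on-disk data of one entry: member name -> value.
using OnDiskRecord = std::map<std::string, double>;

// Hypothetical in-memory class. Compared to the on-disk layout (fPt, fPhi, fCharge),
// the members are reordered, fCharge was removed, and fEta was added.
struct Track {
   double fPhi = 0.0;
   double fPt = 0.0;
   double fEta = -99.0; // added member; keeps this default in the sketch (assumption)
};

Track ReadTrack(const OnDiskRecord &onDisk)
{
   Track track; // start from a default-constructed object
   // Copy a value only if a member of the same name exists on disk.
   auto assign = [&onDisk](const char *name, double &member) {
      auto it = onDisk.find(name);
      if (it != onDisk.end())
         member = it->second;
   };
   assign("fPhi", track.fPhi);
   assign("fPt", track.fPt);
   assign("fEta", track.fEta); // not on disk: left untouched
   // The on-disk member fCharge has no in-memory counterpart and is simply skipped.
   return track;
}
```

Because the lookup is by name, neither the member order nor extra on-disk members affect the result.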

### Type transformations

ROOT transparently reads into in-memory types that are different from but compatible with the on-disk type.
In the following tables, `T'` denotes a type that is compatible with `T`.

#### Plain fields

| In-memory type | Compatible on-disk types | Comment |
| --------------------------- | --------------------------- | ---------------------|
| `bool` | `char` | |
| | `std::[u]int[8,16,32,64]_t` | |
| | enum | |
|-----------------------------|-----------------------------|----------------------|
| `char` | `bool` | |
| | `std::[u]int[8,16,32,64]_t` | with bounds check |
| | enum | with bounds check |
|-----------------------------|-----------------------------|----------------------|
| `std::[u]int[8,16,32,64]_t` | `bool` | |
| | `char` | |
| | `std::[u]int[8,16,32,64]_t` | with bounds check |
| | enum | with bounds check |
|-----------------------------|-----------------------------|----------------------|
| enum | enum of different type | with bounds check |
|-----------------------------|-----------------------------|----------------------|
| `float` | `double` | |
|-----------------------------|-----------------------------|----------------------|
| `double` | `float` | |
|-----------------------------|-----------------------------|----------------------|
| `std::atomic<T>` | `T'` | |

> **Review comment (Contributor)**, on the on-disk `char` row of in-memory `std::[u]int[8,16,32,64]_t`: Probably we should specify the behavior when the value doesn't fit in the char

> **Review comment (Contributor)**, on the row in-memory `double`, on-disk `float`: No bounds check?
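
The "with bounds check" comment in the table can be read as follows: the on-disk value is only accepted if the in-memory type can represent it exactly. The sketch below illustrates that idea in plain C++; it is not ROOT's implementation. `CheckedCast` is a hypothetical helper, and throwing `std::out_of_range` on failure is an assumption, since the text above does not specify what happens when the check fails.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: convert an on-disk integer to the in-memory integer type
// and reject values that the in-memory type cannot represent.
template <typename To, typename From>
To CheckedCast(From value)
{
   static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                 "sketch covers integral types only");
   const To result = static_cast<To>(value);
   // The value must survive the round trip and must not change its sign.
   const bool sameValue = static_cast<From>(result) == value;
   const bool sameSign = (result < To{}) == (value < From{});
   if (!sameValue || !sameSign)
      throw std::out_of_range("on-disk value does not fit the in-memory type");
   return result;
}
```

For example, `CheckedCast<std::int8_t>(std::int64_t{100})` returns 100, whereas `CheckedCast<std::int8_t>(std::int64_t{300})` throws in this sketch.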


#### Variable-length collections

| In-memory type | Compatible on-disk types | Comment |
| -------------------------------- | ------------------------------------ | ------------------------------------- |
| `std::vector<T>` | `ROOT::RVec<T'>` | |
| | `std::array<T', N>` | |
| | `std::[unordered_][multi]set<T'>` | |
| | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>` |
| | `std::optional<T'>` | |
| | `std::unique_ptr<T'>` | |
| | User-defined collection of `T'` | |
| | Untyped collection of `T'` | |
|----------------------------------|--------------------------------------|---------------------------------------|
| `ROOT::RVec<T>` | `std::vector<T'>` | with size check |
| | `std::array<T', N>` | with size check |
| | `std::[unordered_][multi]set<T'>` | with size check |
| | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>`, |
| | | with size check |
| | `std::optional<T'>` | |
| | `std::unique_ptr<T'>` | |
| | User-defined collection of `T'` | with size check |
| | Untyped collection of `T'` | with size check |
|----------------------------------|--------------------------------------|---------------------------------------|
| `std::[unordered_]set<T>` | `std::[unordered_]set<T'>` | |
| | `std::[unordered_]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>` |
|----------------------------------|--------------------------------------|---------------------------------------|
| `std::[unordered_]multiset<T>` | `std::vector<T'>` | |
| | `std::array<T', N>` | |
| | `std::[unordered_][multi]set<T'>` | |
| | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>` |
| | User-defined collection of `T'` | |
| | Untyped collection of `T'` | |
|----------------------------------|--------------------------------------|---------------------------------------|
| `std::[unordered_]map<K,V>` | `std::[unordered_]map<K',V'>` | |
| | `std::[unordered_]set<T>` | only `T` = `std::[pair,tuple]<K',V'>` |
|----------------------------------|--------------------------------------|---------------------------------------|
| `std::[unordered_]multimap<K,V>` | `std::vector<T>` | only `T` = `std::[pair,tuple]<K,V>` |
| | `std::array<T, N>` | only `T` = `std::[pair,tuple]<K,V>` |
| | `std::[unordered_][multi]set<T>` | only `T` = `std::[pair,tuple]<K,V>` |
| | `std::[unordered_][multi]map<K',V'>` | |
| | User-defined collection of `T` | only `T` = `std::[pair,tuple]<K,V>` |
| | Untyped collection of `T` | only `T` = `std::[pair,tuple]<K,V>` |

#### Nullable fields

| In-memory type | Compatible on-disk types |
| -------------------- | ------------------------ |
| `std::optional<T>` | `std::unique_ptr<T'>` |
| | `T'` |
|----------------------|--------------------------|
| `std::unique_ptr<T>` | `std::optional<T'>` |
| | `T'` |

#### Records

| In-memory type | Compatible on-disk types |
| --------------------------- | -------------------------------------- |
| `std::pair<T,U>` | `std::tuple<T',U'>` |
|-----------------------------|----------------------------------------|
| `std::tuple<T,U>` | `std::pair<T',U'>` |
|-----------------------------|----------------------------------------|
| Untyped record | User-defined class of compatible shape |

Note that for emulated classes, the in-memory untyped record is constructed from on-disk information.

#### Additional rules

All on-disk types `std::atomic<T'>` can be read into a `T` in-memory model.

If a class property changes from using an RNTuple streamer field to using a regular RNTuple class field,
existing files with on-disk streamer fields will continue to be read as streamer fields.
This can be seen as "schema evolution out of streamer fields".

## Manual schema evolution (I/O customization rules)

ROOT I/O customization rules allow for custom code handling the transformation
from the on-disk schema to the in-memory model.
Customization rules are part of the class dictionary.
For the exact syntax of customization rules, we refer to the ROOT manual.

> **Review comment (Contributor):** Maybe a link to the manual here?

Generally, customization rules consist of
- A target class.
- Target members of the target class, i.e. those class members whose value is set by the rule.
Target members must be direct members, i.e. not part of a base class.
- A source class (possibly having a different class name than the target class)
together with class versions or class checksums
that describe all the possible on-disk class versions the rule applies to.

  > **Review comment (silverweed, Contributor, Oct 13, 2025):** It could be useful to mention that the class checksum can be retrieved from TClass::GetCheckSum()

- Source members of the source class; the given source members will be read as the given type.
Source members can also be from a base class.
Note that there is no way to specify a base class member that has the same name as a member in the derived class.
- The custom code snippet; the code snippet has access to the (whole) target object and to the given source members.

> **Review comment (Contributor):** I think an example of a customization rule would be very useful to anyone not already familiar with them.
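
As an illustration of the components listed above, a customization rule in a dictionary `LinkDef.h` file could look roughly like the following sketch. The classes `OldEvent` and `Event`, their members and the version range are hypothetical and chosen only for this example; the ROOT manual remains the reference for the exact syntax.

```cpp
// LinkDef.h fragment (sketch; OldEvent, Event, fPx, fPy and fPt are hypothetical)
#ifdef __CLING__

#pragma link C++ class Event+;

// Source class:   on-disk class OldEvent, class versions 1 to 3
// Source members: fPx and fPy, both read as float
// Target class:   in-memory class Event
// Target member:  fPt, set by the code snippet
#pragma read \
   sourceClass="OldEvent" \
   source="float fPx; float fPy" \
   version="[1-3]" \
   targetClass="Event" \
   target="fPt" \
   include="cmath" \
   code="{ fPt = std::sqrt(onfile.fPx * onfile.fPx + onfile.fPy * onfile.fPy); }"

#endif
```

Inside the code snippet, source members are accessed through `onfile` and target members by their name. Instead of `version`, a rule can select the on-disk classes by `checksum`.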


At runtime, for any given target member there must be at most one applicable rule.
A source member can be read into any type compatible with its on-disk type
but any given source member can only be read into one type for a given target class
(i.e. multiple rules for the same target/source class must not use different types for the same source member).

There are two special types of rules
1. Pure class rename rules consisting only of source and target class
2. Whole-object rules that have no target members

Class rename rules (pure or not) are not transitive
(if in-memory `A` can read from on-disk `B` and in-memory `B` can read from on-disk `C`,
in-memory `A` cannot automatically read from on-disk `C`).
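
The non-transitivity can be pictured as a direct lookup without chaining. The following plain C++ sketch uses a hypothetical rule registry (`RenameRules`, `CanRead`) that is not part of ROOT; it only mirrors the statement above.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical registry of rename rules: (in-memory class, on-disk class) pairs.
using RenameRules = std::set<std::pair<std::string, std::string>>;

// A class can be read if the names match or a rule names exactly this pair.
// Rules are deliberately not chained.
bool CanRead(const RenameRules &rules, const std::string &inMemory, const std::string &onDisk)
{
   return inMemory == onDisk || rules.count(std::make_pair(inMemory, onDisk)) > 0;
}
```

With the rules `A` from `B` and `B` from `C`, `CanRead(rules, "A", "C")` is false; reading on-disk `C` into `A` would need its own rule.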

Note that customization rules operate on partially read objects.
Customization rules are executed after all members not subject to customization rules have been read from disk.
Whole-object rules are executed after other rules.
Otherwise, the scheduling of rules is unspecified.
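
The ordering guarantee can be sketched in plain C++ as follows. `Rule` and `RunRules` are hypothetical names for illustration, not ROOT types. The sketch assumes that all members without rules have already been read, and it keeps the input order within each group only because some order has to be chosen.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical, simplified model of a customization rule.
struct Rule {
   bool fIsWholeObject;         // whole-object rules have no target members
   std::function<void()> fCode; // the custom code snippet
};

// To be called once all members not subject to rules have been read from disk.
void RunRules(std::vector<Rule> rules)
{
   // Move whole-object rules behind all other rules. The order within each
   // group is unspecified; stable_partition simply preserves the input order.
   std::stable_partition(rules.begin(), rules.end(),
                         [](const Rule &r) { return !r.fIsWholeObject; });
   for (const auto &r : rules)
      r.fCode();
}
```

A whole-object rule can therefore rely on the target members of the other rules being set, but no rule should rely on the relative order of two rules of the same kind.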

## Interplay between automatic and manual schema evolution

The target members of I/O customization rules are exempt from automatic schema evolution
(applies to the corresponding field of the target member and all its subfields).
Otherwise, automatic and manual schema evolution work side by side.
For instance, a renamed class is still subject to automatic schema evolution.

The source member of a customization rule is subject to the same automatic and manual schema evolution rules
as if it were read normally, e.g. in an `RNTupleView`.

## Schema evolution differences between RNTuple and Classic I/O

In contrast to RNTuple, TTree and TFile also apply the following automatic schema evolution rules:
- Conversion between floating point and integer types
- Conversion from `unique_ptr<T>` --> `T'`
- Complete conversion matrix of all collection types
- Insertion and removal of intermediate classes
- Move of a member between base class and derived class
- Reordering of base classes