diff --git a/docs/building/concat.yaml b/docs/building/concat.yaml
new file mode 100644
index 0000000..4bb37d1
--- /dev/null
+++ b/docs/building/concat.yaml
@@ -0,0 +1,17 @@
+input:
+  concat:
+    - dates:
+        start: 2020-12-30 00:00:00
+        end: 2021-01-01 12:00:00
+        frequency: 12h
+
+      source1:
+        - args
+
+    - dates:
+        start: 2021-01-02 00:00:00
+        end: 2021-01-03 12:00:00
+        frequency: 12h
+
+      source2:
+        - args
diff --git a/docs/building/handling_missing_values.rst b/docs/building/handling_missing_values.rst
index 93975fb..7898cfd 100644
--- a/docs/building/handling_missing_values.rst
+++ b/docs/building/handling_missing_values.rst
@@ -1,3 +1,6 @@
 #########################
  Handling missing values
 #########################
+
+.. literalinclude:: ../../tests/create/nan.yaml
+   :language: yaml
diff --git a/docs/building/operations.rst b/docs/building/operations.rst
new file mode 100644
index 0000000..48a9159
--- /dev/null
+++ b/docs/building/operations.rst
@@ -0,0 +1,53 @@
+.. _dataset-operations:
+
+############
+ Operations
+############
+
+******
+ join
+******
+
+The join is the process of combining several sources data. Each
+source is expected to provide different variables at the same dates.
+
+.. code-block:: yaml
+
+   input:
+     join:
+       - source1
+       - source2
+       - ...
+
+
+********
+ concat
+********
+
+The concatenation is the process of combining different sets of
+operation that handle different dates. This is typically used to
+build a dataset that spans several years, when the several sources
+are involved, each providing a different period.
+
+.. literalinclude:: concat.yaml
+   :language: yaml
+
+
+******
+ pipe
+******
+
+The pipe is the process of transforming fields using filters. The
+first step of a pipe is typically a source, a join or another pipe.
+The following steps are filters.
+
+
+.. code-block:: yaml
+
+   input:
+     pipe:
+       - source
+       - filter1
+       - filter2
+       - ...
+
diff --git a/docs/building/sources/mars.rst b/docs/building/sources/mars.rst
index 9c9e3ea..ec1f3a0 100644
--- a/docs/building/sources/mars.rst
+++ b/docs/building/sources/mars.rst
@@ -17,7 +17,7 @@ the MARS language specification.
          grid: [0.25, 0.25]
 
 Data from several levels types must be requested in separate requests,
-with the `join` key.
+with the ``join`` command.
 
 .. code:: yaml
 
diff --git a/docs/building/statistics.rst b/docs/building/statistics.rst
index 8517a3c..c8b2a1b 100644
--- a/docs/building/statistics.rst
+++ b/docs/building/statistics.rst
@@ -1,25 +1,33 @@
 .. _gathering_statistics:
 
-Gathering statistics
-====================
+######################
+ Gathering statistics
+######################
 
-*Anemoi* will collect statistics about each variables in the dataset as it is created.
-These statistics are intended to be used to normalise the data during training.
+*Anemoi* will collect statistics about each variables in the dataset as
+it is created. These statistics are intended to be used to normalise the
+data during training.
 
-By defaults, the statistics are not computed on the whole dataset, but on a subset of
-dates. The subset is defined using the following algorythm:
+By defaults, the statistics are not computed on the whole dataset, but
+on a subset of dates. The subset is defined using the following
+algorithm:
 
-    - If the dataset covers more than 20 years, the last 3 years are excluded.
-    - If the dataset covers more than 10 years, the last 2 years are excluded.
-    - If the dataset covers more than 5 years, the last year is excluded.
-    - Otherwise, 80% of the dataset is used.
+   - If the dataset covers more than 20 years, the last 3 years are
+     excluded.
+   - If the dataset covers more than 10 years, the last 2 years are
+     excluded.
+   - If the dataset covers more than 5 years, the last year is
+     excluded.
+   - Otherwise, 80% of the dataset is used.
 
-You can override this behaviour by setting the `statistics_dates` parameter.
+You can override this behaviour by setting the `statistics_dates`
+parameter.
 
-.. code-block:: yaml
+.. code:: yaml
 
-    output:
-        statistics_start: 2000
-        statistics_end: 2020
+   output:
+      statistics_start: 2000
+      statistics_end: 2020
 
-.. todo:: List the statistics that are computed
+..
+   .. todo:: List the statistics that are computed
diff --git a/docs/index.rst b/docs/index.rst
index 8a2b412..4446107 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -47,6 +47,7 @@ models from existing recipes but with their own data.
 **Building training datasets**
 
 - :doc:`building/introduction`
+- :doc:`building/operations`
 - :doc:`building/sources`
 - :doc:`building/filters`
 - :doc:`building/statistics`
@@ -57,6 +58,7 @@ models from existing recipes but with their own data.
    :caption: Building datasets
 
    building/introduction
+   building/operations
    building/sources
    building/filters
    building/naming_variables
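
Note on the statistics.rst change: the default date-subset rule described there can be sketched in Python as below. This is only an illustration of the documented rule, not Anemoi's implementation. The function name `default_statistics_dates` is hypothetical, and two details are assumptions the docs do not specify: the dataset span is measured as elapsed days divided by 365.25, and "the last N years" is read as the last N calendar years of the dataset.

```python
from datetime import datetime, timedelta


def default_statistics_dates(dates):
    """Return the subset of dates used for statistics (hypothetical helper)."""
    dates = sorted(dates)
    if not dates:
        return []

    first, last = dates[0], dates[-1]
    # Assumption: span measured in elapsed years between first and last date.
    span_years = (last - first).days / 365.25

    # (minimum span in years, number of trailing years to exclude)
    for min_span, excluded in ((20, 3), (10, 2), (5, 1)):
        if span_years > min_span:
            # Assumption: whole calendar years are dropped from the end.
            cutoff_year = last.year - excluded
            return [d for d in dates if d.year <= cutoff_year]

    # Otherwise keep the first 80% of the dates.
    return dates[: int(len(dates) * 0.8)]


if __name__ == "__main__":
    yearly = [datetime(2000 + i, 1, 1) for i in range(25)]
    subset = default_statistics_dates(yearly)
    print(subset[0].year, subset[-1].year)
```

Checking thresholds from the longest span to the shortest ensures that a dataset longer than 20 years matches only the first rule.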