This repository has been archived by the owner on Jan 10, 2025. It is now read-only.

Commit

work on doc
b8raoult committed Mar 24, 2024
1 parent 7f0be07 commit e91e2a2
Showing 7 changed files with 92 additions and 23 deletions.
2 changes: 1 addition & 1 deletion docs/building/filters.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
.. _dataset-filters:
.. _filters:

#########
Filters
10 changes: 6 additions & 4 deletions docs/building/introduction.rst
@@ -1,4 +1,4 @@
.. _datasets-building:
.. _building-introduction:

##############
Introduction
@@ -41,14 +41,13 @@ source
The `source` is a software component that, given a list of dates and
variables, will return the corresponding fields. An example of a source
is ECMWF's MARS archive, a collection of GRIB or NetCDF files, a
database, etc. See :ref:`dataset-sources` for more information.
database, etc. See :ref:`sources` for more information.

filter
A `filter` is a software component that takes as input the output of
a source or of another filter and can modify the fields and/or
their metadata. For example, typical filters are interpolations,
renaming of variables, etc. See :ref:`dataset-filters` for more
information.
renaming of variables, etc. See :ref:`filters` for more information.

************
Operations
@@ -72,6 +71,9 @@ concat
build a dataset that spans several years, when several sources are
involved, each providing a different period.

Each operation is itself considered a :ref:`source <sources>`, so
operations can be combined to build complex datasets.
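As an illustration of combining operations, a hypothetical recipe might
nest a ``join`` inside a ``concat``. The exact schema, source names and
keys below are assumptions for illustration, not taken from this commit:

```yaml
# Hypothetical sketch: a concat of two periods, where the first
# period is itself a join of two sources. All names are illustrative.
input:
  concat:
    - dates:
        start: 2000-01-01
        end: 2009-12-31
      join:
        - source-a:
            param: [2t, msl]
        - source-b:
            param: [q, t]
    - dates:
        start: 2010-01-01
        end: 2020-12-31
      source-c:
        param: [2t, msl, q, t]
```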

*****************
Getting started
*****************
31 changes: 15 additions & 16 deletions docs/building/operations.rst
@@ -1,42 +1,41 @@
.. _dataset-operations:
.. _operations:

############
Operations
############

Operations are blocks of YAML code that translate a list of dates into
fields.

******
join
******

The join is the process of combining data from several sources. Each
source is expected to provide different variables at the same dates.

.. literalinclude:: input.yaml
   :language: yaml
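The contents of ``input.yaml`` are not shown in this commit; a minimal
sketch of what a join recipe could look like, with hypothetical source
names and keys:

```yaml
# Hypothetical join: each source provides different variables
# for the same dates. Source names and keys are illustrative.
join:
  - mars:
      levtype: sfc
      param: [2t, msl]
  - mars:
      levtype: pl
      param: [q, t]
      level: [850, 500]
```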

********
concat
********

The concatenation is the process of combining different sets of
operations that handle different dates. This is typically used to build
a dataset that spans several years, when several sources are involved,
each providing a different period.

.. literalinclude:: concat.yaml
   :language: yaml
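The contents of ``concat.yaml`` are not shown here; a hypothetical
sketch of a concat recipe, where each entry covers a different period
(all names are illustrative assumptions):

```yaml
# Hypothetical concat: each entry handles a disjoint period.
concat:
  - dates:
      start: 2000-01-01
      end: 2009-12-31
    source-a:
      param: [2t]
  - dates:
      start: 2010-01-01
      end: 2020-12-31
    source-b:
      param: [2t]
```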

******
pipe
******

The pipe is the process of transforming fields using filters. The first
step of a pipe is typically a source, a join or another pipe. The
following steps are filters.

.. literalinclude:: pipe.yaml
   :language: yaml
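The contents of ``pipe.yaml`` are not shown here; a hypothetical sketch
of a pipe recipe, with a source followed by filters. The source and
filter names (``mars``, ``rename``, ``interpolate``) and their keys are
illustrative assumptions, not taken from this commit:

```yaml
# Hypothetical pipe: the first step is a source, the following
# steps are filters applied to its output.
pipe:
  - mars:
      param: [2t]
  - rename:
      2t: t2m
  - interpolate:
      grid: [1.0, 1.0]
```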
2 changes: 1 addition & 1 deletion docs/building/sources.rst
@@ -1,4 +1,4 @@
.. _dataset-sources:
.. _sources:

#########
Sources
10 changes: 9 additions & 1 deletion docs/index.rst
@@ -11,12 +11,20 @@
*Anemoi* is a framework for developing machine learning weather
forecasting models. It comprises components or packages for preparing
training datasets, conducting ML model training, and a registry for
datasets and trained models. Anemoi provides tools for operational
datasets and trained models. *Anemoi* provides tools for operational
inference, including interfacing to verification software. As a
framework it seeks to handle many of the complexities that
meteorological organisations will share, allowing them to easily train
models from existing recipes but with their own data.

An *Anemoi dataset* is a thin wrapper around a zarr_ store that is
optimised for training data-driven weather forecasting models. It is
organised in such a way that I/O operations are minimised.

This documentation is divided into two main sections: :ref:`how to use
existing datasets <using-introduction>` and :ref:`how to build new
datasets <building-introduction>`.

- :doc:`overview`
- :doc:`installing`
- :doc:`firststeps`
2 changes: 2 additions & 0 deletions docs/overview.rst
@@ -1,3 +1,5 @@
.. _overview:

##########
Overview
##########
58 changes: 58 additions & 0 deletions docs/using/introduction.rst
@@ -1,3 +1,61 @@
.. _using-introduction:

##############
Introduction
##############

.. warning::

   The code below still mentions the old name of the package,
   `ecml_tools`. This will be updated once the package is renamed to
   `anemoi-datasets`.

An *Anemoi* dataset is a thin wrapper around a zarr_ store that is
optimised for training data-driven weather forecasting models. It is
organised in such a way that I/O operations are minimised (see
:ref:`overview`).

.. _zarr: https://zarr.readthedocs.io/

To open a dataset, you can use the `open_dataset` function.

.. code:: python

   from anemoi_datasets import open_dataset

   ds = open_dataset("path/to/dataset.zarr")

You can then access the data in the dataset using the `ds` object as if
it was a NumPy array.

.. code:: python

   print(ds.shape)
   print(len(ds))
   print(ds[0])
   print(ds[10:20])

One of the main features of the *anemoi-datasets* package is the
ability to subset or combine datasets.

.. code:: python

   from anemoi_datasets import open_dataset

   ds = open_dataset("path/to/dataset.zarr", start=2000, end=2020)

In that case, a dataset is created that only contains the data between
the years 2000 and 2020. Combining is done by passing multiple paths to
the `open_dataset` function:

.. code:: python

   from anemoi_datasets import open_dataset

   ds = open_dataset("path/to/dataset1.zarr", "path/to/dataset2.zarr")

In the latter case, the datasets are combined along the time dimension
or the variable dimension, depending on the datasets' structure.
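To make the combination behaviour concrete, here is a self-contained
sketch in plain Python. It is not the anemoi-datasets implementation;
it only illustrates how a thin wrapper can concatenate two stores along
the time dimension while exposing `len()` and indexing like an array:

```python
# Illustrative only: NOT the anemoi-datasets API. A minimal stand-in
# showing how a thin wrapper can combine two date-indexed stores.

class ConcatDataset:
    """Concatenate two datasets that cover consecutive periods."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def __len__(self):
        return len(self.first) + len(self.second)

    def __getitem__(self, i):
        # Route the index to whichever underlying store holds it,
        # so only that store needs to perform any I/O.
        if i < len(self.first):
            return self.first[i]
        return self.second[i - len(self.first)]


# Two toy "stores" covering different periods.
period_a = [("2000-01-01", 1.0), ("2000-01-02", 2.0)]
period_b = [("2010-01-01", 3.0)]

ds = ConcatDataset(period_a, period_b)
print(len(ds))  # 3
print(ds[2])    # ('2010-01-01', 3.0)
```

The real package makes the same kind of routing decision when several
paths are passed to `open_dataset`, choosing the combination dimension
from the structure of the datasets.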
