Skip to content
This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

Feature/22 feature add tendency training #23

Draft
wants to merge 10 commits into
base: develop
Choose a base branch
from

Conversation

Rilwan-Adewoyin
Copy link
Member

@Rilwan-Adewoyin Rilwan-Adewoyin commented Aug 14, 2024

Closes ecmwf/anemoi-core#39

Idea:

In the AIFS we predict the residual of the state, i.e. delta = x(t-1) - x(t). The residual can however have vastly different scales depending on the variable. Therefore the idea is to predict the normalized changes (tendencies).

Find a summary including some first results here:
https://confluence.ecmwf.int/display/~ecm1922/AIFS+with+normalized+tendencies

Describe your changes

< List any dependencies that are required for this change. >
I had to make PRs/new branches at the following urls:

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist before requesting a review

  • I have performed a self-review of my code
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation and docstrings to reflect the changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have ensured that the code is still pip-installable after the changes and runs
  • I have not introduced new dependencies in the inference partion of the model
  • I have ran this on single GPU
  • I have ran this on multi-GPU or multi-node
  • I have ran this to work on LUMI (or made sure the changes work independently.)
  • I have ran the Benchmark Profiler against the old version of the code

Tag possible reviewers

@clessig

@Rilwan-Adewoyin Rilwan-Adewoyin self-assigned this Aug 14, 2024
@FussyDuck
Copy link

FussyDuck commented Aug 14, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ jakob-schloer
❌ Rilwan-Adewoyin
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.82%. Comparing base (af4d9dc) to head (7d973a4).
Report is 2 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop      ecmwf/anemoi-models#23   +/-   ##
========================================
  Coverage    99.82%   99.82%           
========================================
  Files           21       21           
  Lines         1158     1158           
========================================
  Hits          1156     1156           
  Misses           2        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -96,8 +97,15 @@ def _build_model(self) -> None:
config=self.config, data_indices=self.data_indices, graph_data=self.graph_data
)

# Use the forward method of the model directly
self.forward = self.model.forward
def forward(self, x: torch.Tensor, model_comm_group: Optional[ProcessGroup] = None) -> torch.Tensor:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to have this better visible, e.g. in the name here? Even if we haven't the alternativ implemented at the moment (but which makes sense to have for anemoi, e.g. with obs the residual is typically not possible).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think instead of self.tendency_mode there should be a self.prediction_mode which can take the values 'state', 'residual' or 'tendency'. I will try to change that when we open the PR with anemoi-training.

@JesperDramsch
Copy link
Member

This should probably be its own interface, rather than replace the working default one, right?

@JesperDramsch JesperDramsch added the enhancement New feature or request label Sep 23, 2024
@JesperDramsch JesperDramsch marked this pull request as draft September 23, 2024 12:36
Comment on lines 133 to 146
assert (
len(batch.shape) == 4
), f"The input tensor has an incorrect shape: expected a 4-dimensional tensor, got {batch.shape}!"
x = self.pre_processors_state(batch[:, 0 : self.multi_step, ...], in_place=False)

# Dimensions are
# batch, timesteps, horizonal space, variables
x = batch[:, 0 : self.multi_step, None, ...] # add dummy ensemble dimension as 3rd index
# batch, timesteps, horizontal space, variables
x = x[..., None, :, :] # add dummy ensemble dimension as 3rd index
if self.prediction_strategy == "tendency":
tendency_hat = self(x)
y_hat = self.add_tendency_to_state(x[:, -1, ...], tendency_hat)
else:
y_hat = self(x)
y_hat = self.post_processors_state(y_hat, in_place=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These multiple if self.prediction_strategy == "tendency" are adding complexity in the main (and currently unique) AnemoiModelInterface class. This will create technical debt and make the code less flexible and more difficult to maintain.

If you want an alternative, this is what you could do:

Since the current AnemoiModelInterface does not have the behaviour you want, you can instead to create another class that behave as you like (copy-paste the whole AnemoiModelInterface in to MyTendencyTrainingAnemoiModelInterface) and use this one instead.

Just after copy-pasting, you realise that you have duplicated code between AnemoiModelInterface and MyTendencyTrainingAnemoiModelInterface. To avoid this, you can put the common code into a mother class BaseAnemoiModelInterface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that @JesperDramsch had the same opinion on this #23 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I’ve already worked on additional interfaces for a different purpose on a local branch, so this should be straightforward to implement. However, if we split the interface, we’ll likely need to divide the current "AnemoiLightningModule" in anemoi-training into two separate modules: one for ForecastingState (ForecastingStateLightningModule) and another for ForecastingTendency (ForecastingTendencyLightningModule).

At the moment, the existing AnemoiLightningModule (which is tied to the current interface) contains logic for handling both state and tendency steps forward.

@jakob-schloer jakob-schloer self-assigned this Nov 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Add tendency training
7 participants