
Conversation

jleinonen (Collaborator)

PhysicsNeMo Pull Request

Adds an example of training a temporal interpolation model with `physicsnemo.models.afno.ModAFNO` to `examples/weather/temporal_interpolation/`. See the README.md located there for details.

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

CharlelieLrt (Collaborator) left a comment:

First pass on the PR.
Overall it looks fine, except for a few things that are a little unclear here and there.
My biggest concern is the Trainer. I know this class is used in many of our examples, but here I don't really see the point. I would support having a dedicated abstraction for a trainer only if you expect users to import it into their own codebase and use it in their own external application.

install it by running

```bash
pip install mlflow
```

CharlelieLrt (Collaborator):

Can you add a requirements.txt at the root dir of this example and put mlflow in it with the minimum version constraint that you used?

jleinonen (Collaborator Author):

requirements.txt is now included, along with a note in the README to install dependencies using it.

Comment on lines 8 to 13
This PhysicsNeMo example shows how to train a ModAFNO-based temporal interpolation model
with a custom dataset. For access to the pre-trained model, see the [wrapper in
Earth2Studio](https://nvidia.github.io/earth2studio/modules/generated/models/px/earth2studio.models.px.InterpModAFNO.html#earth2studio.models.px.InterpModAFNO).
A technical description of the model can be found in the paper ["Modulated Adaptive
Fourier Neural Operators for Temporal Interpolation of Weather
Forecasts"](https://arxiv.org/abs/2410.18904).
CharlelieLrt (Collaborator):

Can you be more specific and give a few more lines of technical background there?
As a potential user I would like to have a short paragraph that gives me the big technical picture of what the model can do, without having to go to the paper.

For example, one question I have is whether this is a deterministic interpolation model or a probabilistic one?
If it's deterministic, isn't it a shortcoming?
If I want to apply it to a higher temporal super-resolution ratio, a fully deterministic model is most likely not going to work, correct?
Another point that might be worth clarifying here:
How many snapshots after/before the targeted interpolation time is the model using?

jleinonen (Collaborator Author):

Added a longer description in ae6eed1.
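For readers of this thread, a schematic view of the interpolation task may help. Assuming (consistent with the (t, t+6h) data tuple handled by the datapipe further down in this PR) that the model consumes the two bracketing snapshots plus a target time offset, the task is roughly

$$
\hat{x}(t+\tau) = f_\theta\big(x(t),\, x(t+6\,\mathrm{h}),\, \tau\big), \qquad 0 < \tau < 6\,\mathrm{h},
$$

where $f_\theta$ is the ModAFNO network and $\tau$ enters through the time embedding discussed later in this review. This is an illustration only; see the README update in ae6eed1 and the paper for the exact formulation.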

A technical description of the model can be found in the paper ["Modulated Adaptive
Fourier Neural Operators for Temporal Interpolation of Weather
Forecasts"](https://arxiv.org/abs/2410.18904).

CharlelieLrt (Collaborator):

We need a (preferably nice) picture with the results provided by the trained model

jleinonen (Collaborator Author):

Picture added in 5430eb4.

containing a 1D array with length equal to the number of variables in the dataset,
with each value giving the mean (for `global_means.npy`) or standard deviation (for
`global_stds.npy`) of the corresponding variable.
* A JSON file with metadata about the contents of the HDF5 files. See [here](data.json)
CharlelieLrt (Collaborator):

This link: [here](data.json) is going to break in the online docs. Use the full GitHub URL instead. This applies to all the links below.

jleinonen (Collaborator Author):

How do I get the correct URL it'll have on Github after it's merged?

CharlelieLrt (Collaborator):

For this file it should be something like https://github.com/NVIDIA/physicsnemo/blob/main/examples/weather/temporal_interpolation/data.json

jleinonen (Collaborator Author):

Links changed to full URLs in ae6eed1.
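As an aside for users preparing their own dataset, here is a minimal sketch of how the `global_means.npy` / `global_stds.npy` files described in the excerpt above could be generated. The `fields` dataset name and the `(time, variable, lat, lon)` layout are assumptions for illustration, not taken from this example.

```python
# Compute per-variable mean and std over all training files and save them as
# 1D arrays of length n_variables, as the README excerpt above describes.
import glob

import h5py
import numpy as np

sums, sq_sums, count = 0.0, 0.0, 0
for path in sorted(glob.glob("train/*.h5")):
    with h5py.File(path, "r") as f:
        x = f["fields"][:].astype(np.float64)  # assumed shape: (time, variable, lat, lon)
    sums += x.sum(axis=(0, 2, 3))
    sq_sums += (x**2).sum(axis=(0, 2, 3))
    count += x.shape[0] * x.shape[2] * x.shape[3]

means = sums / count
stds = np.sqrt(sq_sums / count - means**2)
np.save("global_means.npy", means.astype(np.float32))
np.save("global_stds.npy", stds.astype(np.float32))
```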

dataset, you may also need to change the `model.in_channels` and `model.out_channels`
parameters.

## Starting training
CharlelieLrt (Collaborator):

I think we also need a script to validate the trained model, and a corresponding section in this README.md.

jleinonen (Collaborator Author):

What do you propose to include in the validation script? If it were my project, I would be quite happy having just the simple online validation here and would prefer to do more extensive validation using Earth2Studio.

CharlelieLrt (Collaborator):

Yes, I understand that E2S should be the main target for inference, but I also think that PhysicsNeMo examples should be "standalone". AFAIK all our examples have separate validation scripts.

Ideally, it would be good to take two timesteps T1 and T2, get all input snapshots between T1 and T2, and super-resolve the trajectory between the two. If that is too complicated, it could be something very simple, like basically repeating what is done in the online validation in train.py, but decoupled from the training itself. Being able to do validation without relaunching the training script is still nice, I think.
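To make the suggestion concrete, below is a rough, self-contained sketch of the idea (not the actual example code): take the two snapshots bracketing a 6 h window, interpolate the trajectory between them at hourly steps, and score each interpolated field against the held-out hourly data. Random tensors and a linear interpolator stand in for the real datapipe and the trained ModAFNO model, whose call signature differs.

```python
# Sketch of offline validation over one 6 h window. The tensors, grid size and
# the stand-in interpolator are placeholders; a real script would load the
# trained checkpoint and the datapipe used in train.py.
import torch


def interpolator(x0: torch.Tensor, x1: torch.Tensor, tau: torch.Tensor) -> torch.Tensor:
    """Stand-in for the trained model: plain linear interpolation in time."""
    return (1.0 - tau) * x0 + tau * x1


x0 = torch.randn(1, 8, 181, 360)  # snapshot at t (channel/grid sizes illustrative)
x1 = torch.randn(1, 8, 181, 360)  # snapshot at t + 6 h

with torch.no_grad():
    for step in range(1, 6):  # targets at t + 1 h ... t + 5 h
        tau = torch.tensor(step / 6.0)  # normalized interpolation time
        x_hat = interpolator(x0, x1, tau)
        # here: compare x_hat to the true hourly snapshot, e.g. accumulate
        # per-channel RMSE, and write the results to disk
```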

embed_model:
  dim: 64 # width of time embedding net
  depth: 1 # depth of time embedding net
  method: sinusoidal # embedding type used in time embedding net
CharlelieLrt (Collaborator):

Other options available? If so, list them.

jleinonen (Collaborator Author):

The `embed_model` config maps directly to

```python
def __init__(
    self,
    max_time: float = 1.0,
    dim: int = 64,
    depth: int = 1,
    activation_fn: Type[nn.Module] = nn.GELU,
    method: str = "sinusoidal",
):
```
and I think these are the only options it makes sense to expose to the config. I added a mention of the possible values of method in f56991e, though.

CharlelieLrt (Collaborator):

Oh, I just meant for `method: sinusoidal`: are there any other options? But it seems it's the only option accepted, so maybe add a comment like "Only supports 'sinusoidal' for now".

jleinonen (Collaborator Author):

It now says

```yaml
  method: sinusoidal # embedding type used in time embedding net, "sinusoidal" or "learned"
```
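For context on what the `embed_model` block parametrizes, here is a minimal sketch of a sinusoidal time embedding feeding a small MLP of the configured `dim` and `depth`. This illustrates the general technique, not the exact physicsnemo implementation.

```python
# Sinusoidal time embedding followed by a small MLP, roughly what
# dim/depth/method configure. Illustration only.
import math

import torch
import torch.nn as nn


def sinusoidal_embedding(t: torch.Tensor, dim: int, max_time: float = 1.0) -> torch.Tensor:
    """Map times in [0, max_time] to `dim` sin/cos features (dim assumed even)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=t.dtype) / half)
    args = (t[:, None] / max_time) * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)


class TimeEmbedding(nn.Module):
    def __init__(self, dim: int = 64, depth: int = 1, max_time: float = 1.0):
        super().__init__()
        self.dim, self.max_time = dim, max_time
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(dim, dim), nn.GELU()]
        self.net = nn.Sequential(*layers)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        return self.net(sinusoidal_embedding(t, self.dim, self.max_time))


emb = TimeEmbedding(dim=64, depth=1)
print(emb(torch.tensor([0.25, 0.5])).shape)  # torch.Size([2, 64])
```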

Comment on lines 45 to 46
max_epoch: 120 # number of data "epochs"
samples_per_epoch: 50000 # number of samples per "epoch"
CharlelieLrt (Collaborator):

I don't understand these 2 fields. Is it not looping over the entire dataloader at every epoch?

jleinonen (Collaborator Author), Oct 10, 2025:

Yes, "epochs" are basically fictional, they just indicate how often we save checkpoints. Could reword this if it feels confusing.

CharlelieLrt (Collaborator):

If it's just the frequency at which checkpoints are saved, maybe something like this would be clearer?

```yaml
# I/O configuration
io:
  checkpoint_freq: 10
  log_freq: 10
```

jleinonen (Collaborator Author):

Sorry, I misstated this a bit in my previous comment. Actually, after each "epoch" we do these chores:

  1. Save checkpoint
  2. Run validation
  3. Update learning rate

We could add options to control the timing of each of these separately, but it would add more complexity and the refactoring could potentially introduce bugs, so I'd prefer to avoid it. I did add a comment in the config file to specify what we mean by "epoch", though.
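For future readers of the config, here is the "epoch" semantics described above written out schematically (all functions are trivial stubs, not the actual train.py code):

```python
# One "epoch" = samples_per_epoch training samples, after which the three
# chores listed above run. Stubs only, for illustration.
samples_per_epoch, max_epoch, batch_size = 50_000, 120, 16

def train_step(batch): ...            # stub: one optimizer step
def save_checkpoint(epoch): ...       # stub: chore 1
def run_validation(epoch): ...        # stub: chore 2
def update_learning_rate(): ...       # stub: chore 3

for epoch in range(max_epoch):
    for _ in range(samples_per_epoch // batch_size):
        train_step(batch=None)
    save_checkpoint(epoch)
    run_validation(epoch)
    update_learning_rate()
```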

training:
  max_epoch: 120 # number of data "epochs"
  samples_per_epoch: 50000 # number of samples per "epoch"
  checkpoint_dir: "/checkpoints/fcinterp/" # location where checkpoints are saved
CharlelieLrt (Collaborator):

As a user, how would I restart a training? Is there anything I need to change in this config?

jleinonen (Collaborator Author):

It's the default behavior to restart from the latest checkpoint, but it's probably a good idea to add the option here.

CharlelieLrt (Collaborator):

Yeah, this implicit behavior of restarting by default is confusing, and it has led me to make mistakes before. Usually I prefer to put an explicit flag:

```yaml
# I/O configuration
io:
  # Whether to load a checkpoint
  load_checkpoint: false
```

jleinonen (Collaborator Author):

I added `load_epoch: "latest"` in the config file as the explicit default, with a comment explaining the options. ("latest" will start from scratch if there are no checkpoint files in the directory, so I think it's a good default: we can use the same config options for the first training run and subsequent ones.)
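For reference, the resulting option might look roughly like this in the config (the section placement and comment wording here are illustrative, reconstructed from this thread rather than copied from the file):

```yaml
training:
  # "latest" resumes from the newest checkpoint in checkpoint_dir, or starts
  # from scratch if no checkpoint files exist there
  load_epoch: "latest"
```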

geopotential_filename: "/data/era5-wind_gust/invariants/orography.nc" # location of orography file
lsm_filename: "/data/era5-wind_gust/invariants/land_sea_mask.nc" # location of lsm file
use_latlon: True # when True, return latitude and longitude from datapipe
num_samples_per_year_train: 8748 # number of training samples per year (8748 == 365 * 24 - 12)
CharlelieLrt (Collaborator):

Why the - 12?

jleinonen (Collaborator Author), Oct 10, 2025:

The datapipe doesn't support loading data from two annual files. This is to keep the (t, t+6h) data tuple from going out of bounds.

CharlelieLrt (Collaborator):

Ok got it. I'm not sure it's possible, but could we do the same as for this comment? That is, just give the raw number of samples in the config and somewhere in the datapipe do the conversion to make sure it doesn't go out of bounds?

jleinonen (Collaborator Author):

I added an option of setting it to null, in which case it is determined in the code, and made that the default.
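A hypothetical illustration of that null behavior: derive the number of usable samples per annual file from the file length and the trailing margin that the (t, t+6h) tuple needs, so indexing never goes out of bounds. The helper name and the exact margin are placeholders, not the actual datapipe code.

```python
def usable_samples_per_year(num_time_steps_in_file: int, trailing_margin_steps: int) -> int:
    # keep enough room at the end of the annual file for the latest snapshot needed
    return num_time_steps_in_file - trailing_margin_steps


# Hourly data for a 365-day year with a 12-step margin reproduces the 8748
# quoted in the config comment above:
print(usable_samples_per_year(365 * 24, 12))  # 8748
```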
