Add Absolute Positional Encoding in Dataloader #7

jacobbieker · 2021-10-14T16:25:21Z

Pull Request

Description

This adds absolute position encoding to the dataloader, so that it generates position encodings for the past and future timesteps. These can then be used downstream in the models for querying the output of Perceiver IO.
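To illustrate the idea, here is a minimal sketch of absolute Fourier-feature encodings for past and future timesteps. It is not the code added in #2. It assumes timestamps are normalized against one fixed global time range shared by every example. `GLOBAL_START`, `GLOBAL_END`, the 5-minute step and `num_bands` are hypothetical choices.

```python
import numpy as np

# Hypothetical global time range. Every example is normalized against the same
# frame, which is what makes the encoding absolute rather than relative.
GLOBAL_START = np.datetime64("2016-01-01T00:00")
GLOBAL_END = np.datetime64("2022-01-01T00:00")


def normalize_datetimes(times: np.ndarray) -> np.ndarray:
    """Map datetime64 values into [-1, 1] relative to the global time range."""
    span_minutes = (GLOBAL_END - GLOBAL_START) / np.timedelta64(1, "m")
    offset_minutes = (times - GLOBAL_START) / np.timedelta64(1, "m")
    return 2.0 * offset_minutes / span_minutes - 1.0


def fourier_encode(values: np.ndarray, num_bands: int = 4) -> np.ndarray:
    """Sin/cos features at octave-spaced frequencies, added as a trailing axis."""
    freqs = np.pi * 2.0 ** np.arange(num_bands)
    scaled = values[..., None] * freqs
    return np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)


# Hypothetical example: t0 plus 6 past and 6 future steps at 5-minute resolution.
t0 = np.datetime64("2021-06-01T12:00")
times = t0 + np.arange(-6, 7) * np.timedelta64(5, "m")
encoding = fourier_encode(normalize_datetimes(times))  # shape (13, 8)
```

With a multi-year global range, adjacent 5-minute steps differ only slightly after normalization. In practice `num_bands` (or the maximum frequency) would need to be much larger for nearby timesteps to get clearly different encodings.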

Fixes #4

Somewhat related to openclimatefix/nowcasting_dataset#229, in terms of how data is stored on disk and loaded into a format. As such, this PR is somewhat blocked until that PR is merged.

How Has This Been Tested?

Unit tests


Checklist:

My code follows OCF's coding style guidelines
I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
I have checked my code and corrected any misspellings

JackKelly · 2021-10-20T11:43:23Z

Sounds good! Just to check: is the plan to move the position encoding out of nowcasting_utils into nowcasting_dataloader? (That sounds good to me!)

jacobbieker · 2021-10-20T11:49:05Z

> Sounds good! Just to check: is the plan to move the position encoding out of nowcasting_utils into nowcasting_dataloader? (That sounds good to me!)

There isn't any position encoding in nowcasting_utils at the moment, I don't think. This PR just calls the position encoding being added in #2 from the dataloader. There is one other place that does Fourier encodings: perceiver-pytorch and the LearnableQuery. I think those are probably fine where they are? They only do relative encoding, computed within each modality separately, while this does absolute position encoding across all examples and times.
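Here is a simplified picture of the relative/absolute distinction, assuming hypothetical 4-pixel examples and made-up global x bounds (not the real SEVIRI RSS OSGB bounds). It is a sketch, not how perceiver-pytorch or this PR implement it. Relative positions depend only on where a pixel sits inside its own example. Absolute positions are normalized against a fixed global frame, so two examples at different locations get different positions. Either kind of position would then be fed through Fourier features.

```python
import numpy as np


def relative_positions(coords) -> np.ndarray:
    """Position within one example only: ignores where the example actually is."""
    coords = np.asarray(coords, dtype=float)
    return np.linspace(-1.0, 1.0, coords.size)


def absolute_positions(coords, global_min: float, global_max: float) -> np.ndarray:
    """Position within a fixed global frame, mapped into [-1, 1]."""
    coords = np.asarray(coords, dtype=float)
    return 2.0 * (coords - global_min) / (global_max - global_min) - 1.0


# Two hypothetical examples, 4 pixels wide, at different x coordinates (metres).
example_a = 100_000 + 2_000 * np.arange(4)
example_b = 300_000 + 2_000 * np.arange(4)

# Hypothetical global bounds, for illustration only.
X_MIN, X_MAX = 0.0, 700_000.0

rel_a, rel_b = relative_positions(example_a), relative_positions(example_b)
abs_a = absolute_positions(example_a, X_MIN, X_MAX)
abs_b = absolute_positions(example_b, X_MIN, X_MAX)
```

Here `rel_a` and `rel_b` are identical, while `abs_a` and `abs_b` differ. That difference is what lets a model tell examples apart by location and time.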

JackKelly · 2021-10-20T11:51:23Z

> There isn't any position encoding in nowcasting_utils at the moment, I don't think.

Oops, sorry, I clearly haven't had enough coffee yet! I keep getting confused this morning!

> I think those are probably fine where they are?

Yeah, I agree: might as well leave them there (at least, for now!)

Issue when merging the spatial and temporal values together: the x and y shapes do not match. The spatial ones are the x and y coords of the 32 IDs in an array, and the actual spatial features would be along the diagonal. So we should just need to slice that out and make the spatial one smaller.
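The diagonal slicing can be sketched with NumPy as follows. The sketch assumes the 32 ID x and y coordinates were treated as two grid axes, so the encoding covers every (x_j, y_i) pair. Only entry [i, i] pairs ID i's own x with its own y. `fourier_features`, `encode_xy` and the random coordinates are hypothetical stand-ins, not the functions in this PR.

```python
import numpy as np


def fourier_features(coords: np.ndarray, num_bands: int = 4) -> np.ndarray:
    """Sin/cos features with a new leading channel axis: (2 * num_bands, *coords.shape)."""
    freqs = np.pi * 2.0 ** np.arange(num_bands)
    scaled = freqs.reshape(-1, *([1] * coords.ndim)) * coords
    return np.concatenate([np.sin(scaled), np.cos(scaled)], axis=0)


def encode_xy(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Concatenate x and y features along the channel axis."""
    return np.concatenate([fourier_features(x), fourier_features(y)], axis=0)


rng = np.random.default_rng(0)
n_ids = 32
# Hypothetical normalized coordinates of the 32 GSP/PV IDs.
id_x = rng.uniform(-1.0, 1.0, n_ids)
id_y = rng.uniform(-1.0, 1.0, n_ids)

# Treating id_x and id_y as grid axes pairs every x with every y: (channels, 32, 32).
yy, xx = np.meshgrid(id_y, id_x, indexing="ij")
grid_encoding = encode_xy(xx, yy)

# Entry [:, i, i] combines ID i's own x and y, so keep the diagonal: (channels, 32).
per_id_encoding = np.diagonal(grid_encoding, axis1=1, axis2=2)
```

Building the full grid and slicing costs O(n²). Encoding the (x_i, y_i) pairs directly with `encode_xy(id_x, id_y)` gives the same (channels, 32) result.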

Need to add support for 4D tensors as well, to handle GSP and PV correctly.
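For GSP and PV, the time and ID encodings must be combined into one 4D array. A minimal sketch of that step, assuming a (batch, channels, time, id) layout, which is only inferred from `TIME_DIM = 2` and `ID_DIM = 3` in the position_encoding.py diff. The function name and shapes are hypothetical.

```python
import numpy as np

# Axis layout assumed from the diff constants: (batch, channels, time, id).
TIME_DIM = 2
ID_DIM = 3


def combine_time_and_id_encodings(time_enc: np.ndarray, id_enc: np.ndarray) -> np.ndarray:
    """Combine (batch, c_t, time) and (batch, c_s, n_ids) into (batch, c_t + c_s, time, n_ids)."""
    batch, c_t, n_time = time_enc.shape
    _, c_s, n_ids = id_enc.shape
    # Repeat the time encoding for every ID, and the ID encoding for every timestep.
    time_4d = np.broadcast_to(np.expand_dims(time_enc, ID_DIM), (batch, c_t, n_time, n_ids))
    id_4d = np.broadcast_to(np.expand_dims(id_enc, TIME_DIM), (batch, c_s, n_time, n_ids))
    return np.concatenate([time_4d, id_4d], axis=1)


# Hypothetical sizes: 2 examples, 8 time channels over 13 steps, 16 ID channels over 32 IDs.
time_enc = np.arange(2 * 8 * 13, dtype=float).reshape(2, 8, 13)
id_enc = np.arange(2 * 16 * 32, dtype=float).reshape(2, 16, 32)
combined = combine_time_and_id_encodings(time_enc, id_enc)  # shape (2, 24, 13, 32)
```

`np.broadcast_to` returns read-only views. The memory is only materialized once by `np.concatenate`.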

nowcasting_dataloader/data_sources/satellite/satellite_model.py

peterdudfield · 2021-10-22T09:50:34Z

nowcasting_dataloader/utils/position_encoding.py


 TIME_DIM = 2
 HEIGHT_DIM = 3
 WIDTH_DIM = 4
+# For GSP and PV, have an ID dimension
+ID_DIM = 3


Could you work these out from the xr.Dataset?

I'm not sure it would be easy to get these from the xr.Dataset. The plan is to replace them with NamedTensors once those are fully supported.
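On deriving these indices from the data: in xarray, `DataArray.dims` is a tuple of dimension names, and `DataArray.get_axis_num(name)` returns a dimension's axis index. Below is a minimal sketch of the same lookup on a plain dims tuple. It assumes hypothetical dimension names and one batch axis prepended by the dataloader. `axis_index` and `SATELLITE_DIMS` are illustrative, not part of this PR.

```python
def axis_index(dims: tuple, name: str, leading_axes: int = 0) -> int:
    """Return the axis of `name`, offset by any axes (e.g. batch) added in front of `dims`."""
    if name not in dims:
        raise KeyError(f"dimension {name!r} not in {dims}")
    return leading_axes + dims.index(name)


# Hypothetical per-example satellite layout; the batch axis is added in front.
SATELLITE_DIMS = ("channels", "time", "y", "x")

TIME_DIM = axis_index(SATELLITE_DIMS, "time", leading_axes=1)
HEIGHT_DIM = axis_index(SATELLITE_DIMS, "y", leading_axes=1)
WIDTH_DIM = axis_index(SATELLITE_DIMS, "x", leading_axes=1)
```

Under these assumptions, this reproduces the hard-coded values (2, 3, 4). A misspelled dimension name raises a `KeyError` instead of silently pointing at the wrong axis.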

nowcasting_dataloader/utils/position_encoding.py

tests/test_position_encoding.py

flowirtz

Two style comments, apart from that LGTM 👍

nowcasting_dataloader/utils/position_encoding.py

JackKelly

LGTM! Excited to see how this stuff improves model performance!

nowcasting_dataloader/utils/position_encoding.py

JackKelly

One tiny comment. LGTM!

JackKelly · 2021-10-22T10:54:04Z

nowcasting_dataloader/utils/position_encoding.py

+    x_max = -np.inf
+    y_min = np.inf
+    y_max = -np.inf
+    for lat in [15, 70]:


Further down the line, it might be nice if users could specify geographic bounds in the nowcasting_dataset config; those bounds would then be saved to disk in the config YAML and used here. But that's definitely for another PR, and definitely not super-important! I've started a new issue: openclimatefix/nowcasting_dataset#266

Sounds good!
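The loop in the diff appears to project corner points of a lat/lon box and keep a running min/max of x and y. Below is a minimal sketch of that pattern, with the projection passed in as a function. With pyproj, the transform could plausibly be `Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True).transform`, which takes (lon, lat); this is an assumption, not confirmed by the diff. The diff does not show the longitude range, so the example uses a toy transform and made-up longitudes. `projected_bounds` and `toy_transform` are hypothetical.

```python
import itertools
from typing import Callable, Iterable, Tuple

import numpy as np

Transform = Callable[[float, float], Tuple[float, float]]


def projected_bounds(
    transform: Transform, lats: Iterable[float], lons: Iterable[float]
) -> Tuple[float, float, float, float]:
    """Return (x_min, x_max, y_min, y_max) over the projected lat/lon corner points."""
    x_min, y_min = np.inf, np.inf
    x_max, y_max = -np.inf, -np.inf
    # Visit every (lat, lon) corner; transform takes (lon, lat), like pyproj with always_xy=True.
    for lat, lon in itertools.product(lats, lons):
        x, y = transform(lon, lat)
        x_min, x_max = min(x_min, x), max(x_max, x)
        y_min, y_max = min(y_min, y), max(y_max, y)
    return x_min, x_max, y_min, y_max


def toy_transform(lon: float, lat: float) -> Tuple[float, float]:
    # Toy stand-in for a real projection, for illustration only.
    return lon * 1000.0, lat * 2000.0


bounds = projected_bounds(toy_transform, [15, 70], [-10, 30])
```

Checking only the corners can underestimate the bounds in a projected CRS, because lines of constant latitude or longitude may curve outward between corners. Sampling points along the edges would be more robust. Reading the bounds from the config, as suggested above, would avoid hard-coding them here.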

jacobbieker added the enhancement New feature or request label Oct 14, 2021

jacobbieker self-assigned this Oct 14, 2021

This was referenced Oct 14, 2021

Add Absolute Position Encoder #2

Merged

Move Nowcasting-dataset dataloaders/datamodules/BatchML #9

Merged

jacobbieker force-pushed the jacob/positional-dataloader branch from 0366aa7 to 2162acc October 20, 2021 10:30

jacobbieker force-pushed the jacob/positional-dataloader branch from 2905da7 to 3072747 October 20, 2021 13:19

Add stub version of encoding for Batch

03b5cb2

jacobbieker force-pushed the jacob/positional-dataloader branch from 3072747 to 03b5cb2 October 20, 2021 13:22

Simplify the encodings for the batch

4e6d1de

JackKelly mentioned this pull request Oct 20, 2021

Compute datetime features on-the-fly in nowcasting_dataloader. Remove datetime features from the on-disk batches. openclimatefix/nowcasting_dataset#208

Closed

jacobbieker added 11 commits October 20, 2021 14:48

Add in SEVIRI RSS OSGB bounds

400fe96

Topographic works

04583e2

Add getting diagonal

3dbae74

Have to add support for 4D tensor as well now to support GSP and PV correctly

Fix position encodings for GSP,PV

75229c8

Remove prints

1e10c20

Add asserts to test

a5c29c4

Add position encodings to dataloader

5739af4

Change tests

6b73a06

Fix updating Batch dictionary

af152ba

Update dataset to fix #25

560fd86

jacobbieker marked this pull request as ready for review October 22, 2021 09:47

jacobbieker requested review from peterdudfield, JackKelly and flowirtz October 22, 2021 09:47