Skip to content
This repository was archived by the owner on Sep 28, 2023. It is now read-only.

Add Absolute Positional Encoding in Dataloader #7

Merged
merged 24 commits into from
Oct 22, 2021

Conversation

jacobbieker
Copy link
Member

@jacobbieker jacobbieker commented Oct 14, 2021

Pull Request

Description

This adds absolute position encoding in the dataloader part, so that it will generate position encodings for the past and future timesteps, which can then be used downstream in the models for querying the output from Perciever IO

Fixes #4

Somewhat relates to openclimatefix/nowcasting_dataset#229 in terms of how data is stored on disk, and loaded into a format. As such, this PR is somewhat blocked until that PR is merged.

How Has This Been Tested?

Unit tests

  • No
  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@jacobbieker jacobbieker added the enhancement New feature or request label Oct 14, 2021
@jacobbieker jacobbieker self-assigned this Oct 14, 2021
@jacobbieker jacobbieker force-pushed the jacob/positional-dataloader branch from 0366aa7 to 2162acc Compare October 20, 2021 10:30
@JackKelly
Copy link
Member

Sounds good! Just to check: is the plan to move the position encoding out of nowcasting_utils into nowcasting_dataloader? (That sounds good to me!)

@jacobbieker
Copy link
Member Author

Sounds good! Just to check: is the plan to move the position encoding out of nowcasting_utils into nowcasting_dataloader? (That sounds good to me!)

There isn't any position encoding in nowcasting_utils at the moment, I don't think. This is just actually calling the position encoding being added in #2 to the dataloader. There is one other place that does Fourier encodings, in perceiver-pytorch and the LearnableQuery, but I think those are probably fine where they are? They only do relative encoding and position encoding within each modality separately, while this does absolute position encoding across all examples and times.

@JackKelly
Copy link
Member

JackKelly commented Oct 20, 2021

There isn't any position encoding in nowcasting_utils at the moment, I don't think.

Oops, sorry, I clearly haven't had enough coffee yet! I keep getting confused this morning!

I think those are probably fine where they are?

Yeah, I agree: might as well leave them there (at least, for now!)

@jacobbieker jacobbieker force-pushed the jacob/positional-dataloader branch from 2905da7 to 3072747 Compare October 20, 2021 13:19
@jacobbieker jacobbieker force-pushed the jacob/positional-dataloader branch from 3072747 to 03b5cb2 Compare October 20, 2021 13:22
Issues on merging the spatial and temporal values together, the x and y shapes do not match

Spatial ones are the 32 ID x and y coords in an array, and the actual spatial features would be along the diagonal. So should just need to slice that and make the spatial one smaller
Have to add support for 4D tensor as well now to support GSP and PV correctly
@jacobbieker jacobbieker marked this pull request as ready for review October 22, 2021 09:47

TIME_DIM = 2
HEIGHT_DIM = 3
WIDTH_DIM = 4
# For GSP and PV, have an ID dimension
ID_DIM = 3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you work these out from the xr.Dataset?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it would be easy to get from the xr.Dataset? The plan with these would be to replace them with NamedTensors when those are fully supported.

Copy link
Collaborator

@flowirtz flowirtz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two style comments, apart from that LGTM 👍

Copy link
Member

@JackKelly JackKelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Excited to see how this stuff improves model performance!

Copy link
Member

@JackKelly JackKelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one tiny comment. LGTM!

x_max = -np.inf
y_min = np.inf
y_max = -np.inf
for lat in [15, 70]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further down the line, it might be nice if users could specify geographic bounds in nowcasting_dataset config; and then those geographic bounds would be saved to disk in the config yaml and used here. But that's definitely for another PR; and definitely not super-important! I've started a new issue: openclimatefix/nowcasting_dataset#266

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Add PV/Unified Position encoding
4 participants