made resizing operations compatible with None #20

Open
wants to merge 2 commits into base: main
Conversation


@sayakpaul sayakpaul commented Oct 23, 2022

@amyeroberts

Things seem to be progressing in a good direction.

I open-sourced the implementation and porting repository. Hence this PR.

With your advice and guidance, I was able to make the resizing layers handle arbitrary shapes. But as you correctly pointed out, things still break for ResidualSplitHeadMultiAxisGmlpLayer (here).

I guess we might need to rewrite ResidualSplitHeadMultiAxisGmlpLayer as a subclass of tf.keras.Model.
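For context, the dynamic-shape trick that makes a resizing layer work with None spatial dimensions can be sketched roughly like this (layer name and ratio are hypothetical, not the actual MAXIM code):

```python
import tensorflow as tf

class DynamicResize(tf.keras.layers.Layer):
    """Hypothetical sketch: resize feature maps using the runtime
    shape (tf.shape) so None spatial dimensions don't break things."""

    def __init__(self, ratio=2, **kwargs):
        super().__init__(**kwargs)
        self.ratio = ratio

    def call(self, x):
        # The static shape may be (None, None, None, C); read the
        # dynamic height/width from the tensor at call time instead.
        h = tf.shape(x)[1]
        w = tf.shape(x)[2]
        return tf.image.resize(x, [h * self.ratio, w * self.ratio])

x = tf.random.normal((1, 8, 8, 3))
out = DynamicResize(ratio=2)(x)
print(out.shape)  # (1, 16, 16, 3)
```

The pattern falls apart in ResidualSplitHeadMultiAxisGmlpLayer because some of its weights, not just its ops, depend on the spatial extent.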

WDYT?

Thank you for all your help so far. I really appreciate it.

@sayakpaul

@amyeroberts, some more findings here.

Take a look at this layer:

def GetSpatialGatingWeights(

It has similar calculations that break when the input resolution is (None, None, 3) (here and here). But if I try to make this block a subclass of tf.keras.Model, there are some critical challenges:

  • The number of hidden units for this Dense layer depends on an intermediate block calculation (here). The same goes for the block MLP weights (here).
  • If we initialize these Dense layers inside the call() method of tf.keras.Model, we won't be able to port the pre-trained parameters from the available JAX models appropriately. There are multiple cases like this in this module.
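One direction that might sidestep the call()-initialization problem is Keras's deferred build() hook: sublayers created there see the concrete input shape, and their variables exist (and are name-tracked) before any weight porting. A minimal sketch, with hypothetical names (GatingUnit, proj) rather than the real GetSpatialGatingWeights internals:

```python
import tensorflow as tf

class GatingUnit(tf.keras.layers.Layer):
    """Hypothetical sketch: create the shape-dependent Dense layer in
    build(), where the input shape is known, instead of in call().
    Variables then exist up front and can be assigned ported values."""

    def build(self, input_shape):
        # Unit count depends on the incoming feature dimension, which
        # is only known here, not at __init__ time.
        units = input_shape[-1]
        self.proj = tf.keras.layers.Dense(units, name="proj")
        super().build(input_shape)

    def call(self, x):
        # Gate the input with the learned projection.
        return x * self.proj(x)

layer = GatingUnit()
_ = layer(tf.zeros((1, 16, 32)))  # one forward pass materializes variables
print([w.name for w in layer.weights])  # kernel + bias of the inner Dense
```

This still assumes the unit count is derivable from input_shape alone; if it depends on an intermediate activation computed mid-block, build() may not be enough on its own.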

Any further suggestions based on these observations?

Let me know if anything is unclear.
