Add support for arbitrary image resolutions #24

LeviVasconcelos · 2022-12-16T05:26:31Z

Hi,

This PR adds support for arbitrary image resolution. Here's what I did to make it possible:

* Rework the image resizing layers, as pointed out here.
* Rewrite the Block and Unblock layers in pure TensorFlow. This was necessary because einops does not accept tensors as pattern arguments, which some layers of MAXIM require.
* Realize that dim_u and dim_v could be replaced by the window_size squared.
* Apply the changes to the model itself. (Edit: by that I meant replacing the layers with the TF ones and plugging in the arguments they need to work.)

I hope this helps, please let me know what you think.

You can test it quickly with: pytest -v maxim/tests/

Best,
Levi.

sayakpaul · 2022-12-16T11:12:58Z

maxim/blocks/grid_gating.py

@@ -6,7 +6,7 @@
 from tensorflow.keras import backend as K
 from tensorflow.keras import layers

-from ..layers import BlockImages, SwapAxes, UnblockImages
+from ..layers import SwapAxes, TFBlockImagesByGrid, TFUnblockImages


Why is there a separate layer for handling blocking by grids?

BlockByGrid can be implemented as follows (please see a more detailed explanation here):

    def BlockByGrid(image, grid_size):
        block_size = (image_height // grid_size[0], image_width // grid_size[1])
        return BlockImage(image, block_size)

But while implementing TFBlockImages I used tf.split, which expects an int literal as the num_or_size_splits argument.

However, when we only have the grid_size and the block_size has to be computed on the fly (as here), the block size is a tensor, and we can't use tf.split in this case. That's why I also wrote BlockByGrid.

sayakpaul · 2022-12-16T11:14:21Z

maxim/blocks/grid_gating.py

-        x = BlockImages()(x, patch_size=(fh, fw))
+        x, ph, pw = TFBlockImagesByGrid()(x, grid_size=(gh, gw))


How come these operations are the same?

In the original implementation, the authors implement BlockByGrid by computing the block size of a grid cell and then using BlockImages (which blocks images into patches of that block size).

In the paper, the authors explain the difference between "grid" and "block" as follows:

Note that we can achieve the same result as the grid split by forwarding a block size of [3,2] instead. This is exactly what the authors do in the original code as highlighted here.

The two are equivalent: TFBlockImagesByGrid splits based on the grid_size argument, instead of the block_size (called (fh, fw) in the code) that the authors used.

A more formal test is performed here.

Thanks for explaining.

Note that we can achieve the same result as the grid split by forwarding a block size of [3,2] instead.

How is the block size of [3, 2] interpreted in that case?

In the original code, it is done as explained, note here:

    gh, gw = grid_size
    fh, fw = h // gh, w // gw
    u = BlockImages()(u, patch_size=(fh, fw))

Note that this code is very similar to the pseudo-code written here. grid_size is passed as a parameter, but h and w have to be inferred from the image dimensions; for an input of shape (None, None, 3) they are None tensors. They therefore can't be used in the einops operations, and the way I found to overcome this was to rewrite the operations in TF.

We can use the block [3,2] to compute the green part of the image (grid blocking with grid_size=[2,2]) this way:

In the example shown in the image, the image size is [6,4]. Thus, to split it with a grid_size of [2,2], we can do:

    gh, gw = (2, 2)
    h, w = (6, 4)  # image dimensions
    fh, fw = h // gh, w // gw  # note that fh = 3 and fw = 2
    block_image = BlockImages()(image_from_the_picture, patch_size=(fh, fw))  # patch_size=(3, 2)

The above code snippet implements the green part of the image, and is very similar to what we described first.

With TFBlockByGrid(), we can simply do:

    gh, gw = (2, 2)
    block_image_using_tfblockByGrid = TFBlockByGrid()(image_from_the_picture, grid_size=(gh, gw))

and block_image should be equivalent to block_image_using_tfblockByGrid, as asserted by this test.
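The [6,4] example above can also be checked by hand without TensorFlow. The following is only a minimal sketch on nested Python lists: `block_images` and `block_images_by_grid` are hypothetical stand-ins for the layers discussed here, not the code from this PR, and they ignore the batch and channel axes.

```python
def block_images(image, patch_size):
    """Split an H x W nested list into patches of size (fh, fw).

    Follows the layout "(gh fh) (gw fw) -> (gh gw) (fh fw)": one entry per
    grid cell, each entry being that cell's pixels flattened row by row.
    """
    fh, fw = patch_size
    h, w = len(image), len(image[0])
    gh, gw = h // fh, w // fw
    return [
        [image[i * fh + r][j * fw + c] for r in range(fh) for c in range(fw)]
        for i in range(gh)
        for j in range(gw)
    ]


def block_images_by_grid(image, grid_size):
    """Same split, but parameterized by the number of grid cells (gh, gw)."""
    gh, gw = grid_size
    h, w = len(image), len(image[0])
    # The only extra work is deriving the patch size from the grid size.
    return block_images(image, patch_size=(h // gh, w // gw))


# 6 x 4 image whose pixel values are their row-major indices 0..23.
image = [[row * 4 + col for col in range(4)] for row in range(6)]

by_block = block_images(image, patch_size=(3, 2))
by_grid = block_images_by_grid(image, grid_size=(2, 2))
assert by_block == by_grid
```

Here a grid of (2, 2) on a 6 x 4 image yields 4 cells of 3 * 2 = 6 pixels each, which is the same result as asking for patches of size (3, 2) directly.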

I am not sure if this answers what you asked, though. Let me know.

Thanks!

So, TFBlockByGrid() becomes more idiomatic in that sense: we want a grid size of (2, 2) in the output, so we pass that directly as an argument. Correct?

sayakpaul

Thanks for your hard work. Left a couple of comments.

sayakpaul · 2022-12-16T11:16:03Z

maxim/blocks/grid_gating.py

@@ -66,7 +63,7 @@ def apply(x):
        )(y)
        y = layers.Dropout(dropout_rate)(y)
        x = x + y
-        x = UnblockImages()(x, grid_size=(gh, gw), patch_size=(fh, fw))
+        x = TFUnblockImages()(x, grid_size=(gh, gw), patch_size=(ph, pw))


Same. You're changing the semantics of the code. Could you please elaborate on why?

Reading this change and also the previous x, ph, pw = TFBlockImagesByGrid()(x, grid_size=(gh, gw)) and comparing them to their previous versions, they don't read the same either.

sayakpaul · 2022-12-16T11:17:13Z

maxim/blocks/misc_gating.py

-        dim_u = K.int_shape(u)[-3]
+        ghu, gwu = grid_size
+        u, phu, pwu = TFBlockImagesByGrid()(u, grid_size=(ghu, gwu))
+        dim_u = ghu * gwu


Explain the rationale in the comment.

Also, advisable not to change the original variable names here and elsewhere.

Variable names will be restored on the next push. I changed them only for readability (since they get rewritten a few lines below).

If I understood correctly, you are asking why we can replace K.int_shape(u)[-3] with (gh * gw):

From BlockImages(), we have that the output's shape is "b (gh gw) (fh fw) c". Since dim_u = K.int_shape(u)[-3] reads the (gh gw) axis of that shape, it follows that:
dim_u = (gh * gw)
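The shape argument can be spelled out with plain arithmetic. This is just an illustrative sketch: `blocked_shape` is a hypothetical helper that computes the output shape of the pattern "b (gh fh) (gw fw) c -> b (gh gw) (fh fw) c", and the image sizes are arbitrary examples.

```python
def blocked_shape(image_shape, grid_size):
    """Output shape of "b (gh fh) (gw fw) c -> b (gh gw) (fh fw) c"."""
    b, h, w, c = image_shape
    gh, gw = grid_size
    fh, fw = h // gh, w // gw
    return (b, gh * gw, fh * fw, c)


grid_size = (2, 2)
for image_shape in [(1, 6, 4, 3), (1, 64, 64, 3), (1, 256, 128, 3)]:
    shape = blocked_shape(image_shape, grid_size)
    dim_u = shape[-3]  # the axis that K.int_shape(u)[-3] would read
    # dim_u depends only on the grid size, never on the image size.
    assert dim_u == grid_size[0] * grid_size[1]
```

Because the third-from-last axis is always gh * gw, it can be computed from the grid_size parameter even when the image height and width are unknown at graph-construction time.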

Same reason why fh and fw are getting replaced by gh and gw here?

Essentially, those transformations are the same:

    def same_operations(random_image, grid_size=(gh, gw)):
        b, h, w, c = random_image.shape
        image_blocked_by_grid = BlockByGrid(random_image, grid_size=(gh, gw))
        image_blocked_by_block = BlockByPatch(random_image, patch_size=(h // gh, w // gw))
        image_blocked_by_grid == image_blocked_by_block  # this should be True.

We have this pseudo-code as a test here. Note that BlockImages() used in the test corresponds to the original einops implementation.

sayakpaul · 2022-12-16T11:20:37Z

maxim/layers.py

+    def call(self, image, patch_size):
+        bs, h, w, num_channels = (tf.shape(image)[0], tf.shape(image)[1], tf.shape(image)[2], tf.shape(image)[3])
+        ph, pw = patch_size
+        gh = h // ph
+        gw = w // pw
+        pad = [[0, 0], [0, 0]]
+        patches = tf.space_to_batch_nd(image, [ph, pw], pad)
+        patches = tf.split(patches, ph * pw, axis=0)
+        patches = tf.stack(patches, 3)  # (bs, h/p, h/p, p*p, 3)
+        patches_dim = tf.shape(patches)
+        patches = tf.reshape(patches, [patches_dim[0], patches_dim[1], patches_dim[2], -1])
+        patches = tf.reshape(patches, (patches_dim[0], patches_dim[1] * patches_dim[2], ph * pw, num_channels))
+        return [patches, gh, gw]


I'm honestly not sure why we are getting rid of einops. This is significantly more lines of code and also more complex to read.

Indeed, using einops would be handy. But please consider this code snippet:

    img = tf.random.uniform((1, 4, 4, 1))
    # this should work fine
    block_img = einops.rearrange(img, 'b (gh fh) (gw fw) c -> b (gh gw) (fh fw) c', fh=2, fw=2)
    # this breaks
    block_img_with_tensors_as_arguments = einops.rearrange(img, 'b (gh fh) (gw fw) c -> b (gh gw) (fh fw) c', fh=tf.constant([2]), fw=tf.constant([2]))

The problem with einops is that it expects int literals as the values of the symbols used in the pattern string. I could not make it work using tensors, as the example above shows. At some stages of the model (here, here, here), the split is computed on the fly and therefore relies on tensors (in the case where the image size is None). That is why it was necessary to rewrite these operations in TensorFlow.

But it wasn't a problem with the current version of the code. What changed?

The current version of the code is given the image dimensions beforehand, so when you do:

    n, h, w, num_channels = (
        K.int_shape(x)[0],
        K.int_shape(x)[1],
        K.int_shape(x)[2],
        K.int_shape(x)[3],
    )

you have the integer literals we need for the einops operations. However, when we feed (None, None, 3) as input, h and w cannot be used to compute literals for the einops operations; they have to be represented as TensorFlow placeholders (None tensors).

sayakpaul · 2022-12-16T11:21:29Z

maxim/layers.py

+        bs, h, w, num_channels = (tf.shape(image)[0], tf.shape(image)[1], tf.shape(image)[2], tf.shape(image)[3])
+        ph, pw = patch_size
+        gh = h // ph
+        gw = w // pw
+        pad = [[0, 0], [0, 0]]
+        patches = tf.space_to_batch_nd(image, [ph, pw], pad)
+        patches = tf.split(patches, ph * pw, axis=0)
+        patches = tf.stack(patches, 3)  # (bs, h/p, h/p, p*p, 3)
+        patches_dim = tf.shape(patches)
+        patches = tf.reshape(patches, [patches_dim[0], patches_dim[1], patches_dim[2], -1])
+        patches = tf.reshape(patches, (patches_dim[0], patches_dim[1] * patches_dim[2], ph * pw, num_channels))
+        return [patches, gh, gw]


What line length are you using for Black formatting? The lines seem long and should be formatted accordingly.

Where I work we use 122. I am reformatting with 88 (Black's default, IIRC).

80 is the default. You can bump it to 90 (which is what I used).

sayakpaul · 2022-12-16T11:23:00Z

maxim/layers.py

@@ -76,28 +100,60 @@ def get_config(self):


 @tf.keras.utils.register_keras_serializable("maxim")
-class Resizing(layers.Layer):


What is the need to split this into Up and Down?

I found it easier to read, but indeed it adds a chunk of code. Reformatting to use a single layer only.

If it's easier to read, I would consider adding an elaborate comment in the script so that readers are aware.

I think it is better now (with a single resizing layer).

sayakpaul · 2022-12-16T11:23:31Z

maxim/layers.py

-        return tf.image.resize(
-            x,
-            size=(self.height, self.width),
+    def __call__(self, img):


Prefer call() or is there anything I am missing out on?

No, fixing...

sayakpaul · 2022-12-16T11:26:31Z

maxim/maxim.py

-    layers.Conv2D, kernel_size=(4, 4), strides=(2, 2), padding="same"
-)
+ConvT_up = functools.partial(layers.Conv2DTranspose, kernel_size=(2, 2), strides=(2, 2), padding="same")
+Conv_down = functools.partial(layers.Conv2D, kernel_size=(4, 4), strides=(2, 2), padding="same")


 def MAXIM(


What was the main change here needed to facilitate (None, None, 3) input tensors?

Adding the resize layers and piping ratio to them.

Resizing layers were there previously too. Do you mean having ratio instead of separate height and width was the key change?

To this specific file (maxim.py) yes, for the whole PR no.

The key changes were to rewrite the einops operations in pure TF.

Another key change was to make dim_u and dim_v independent of the input image size. Note that, in this PR, dim_u and dim_v are computed from grid_size and block_size which are passed as parameters, as compared to the current version (links: dim_u, dim_v) which rely on on-the-fly computation based on the input image size.

A last change was to plug in the resizing layers with the correct ratio. On your branch feat/dynamic_shape, there's a little bug when you pass the ratio for the upsampling layers here. This casting to int() is premature, since some of those values are supposed to be < 1, and casting to int will project them to 0.

This casting to int() is premature, since some of those values are supposed to be < 1, and casting to int will project them to 0.

Would appreciate an example.

Consider this and this line of code from the feat/dynamic-shape branch.

If you have j > i on the second one, you get a fraction below one, which will be projected to zero.
The same goes for the same line: whenever (depth - j - 1 - i) is positive, you get a ratio below one.
If I recall correctly, the second case (depth - j - 1 - i) was yielding 0.25, 0.5, and so on. Thus I changed it.

The way I overcame this was to pass the float ratio, compute the new desired image size as img_size * ratio, and only then convert back to int.
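To make the casting issue concrete, here is a small sketch in plain Python. The two function names are hypothetical and the code on the branch is not reproduced here; the sketch only contrasts casting the ratio before scaling with casting the scaled size afterwards.

```python
def resized_size_premature_cast(size, ratio):
    """Cast the ratio first: any ratio below 1 collapses to 0."""
    return size * int(ratio)


def resized_size_late_cast(size, ratio):
    """Scale with the float ratio, then convert the result to int."""
    return int(size * ratio)


height = 256
for ratio in (0.25, 0.5, 2.0):
    early = resized_size_premature_cast(height, ratio)
    late = resized_size_late_cast(height, ratio)
    print(f"ratio={ratio}: premature cast -> {early}, late cast -> {late}")
```

With ratios such as 0.25 or 0.5 the premature cast produces a target size of 0, while the late cast gives 64 and 128 for a height of 256. For ratios of 1 or more that are whole numbers, the two agree.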

sayakpaul

Thanks for all your explanation. I truly appreciate the hard work here.

A couple more comments.

The immediate next step could be to verify the actual outputs of the reworked models. Let's plan on that.

LeviVasconcelos · 2022-12-17T11:26:19Z

Please let me know if anything remains unclear.

Best,

sayakpaul

Looking good.

sayakpaul · 2022-12-20T03:00:22Z

@LeviVasconcelos

Things are looking good. Thanks so much for your hard work. I left a couple more questions.

I would suggest doing the following:

* Including a detailed summary of changes needed to make this work in the README.
* Running the conversion script for each model and verifying their outputs.
* Creating PRs to each of the MAXIM model repositories (Google) here: https://huggingface.co/models?search=maxim

sayakpaul · 2022-12-29T09:04:08Z

@LeviVasconcelos a friendly ping :)

LeviVasconcelos · 2023-01-03T14:08:14Z

I was on vacation this last week, hence the late reply.

What do you think of a quick call this week or next? I have a couple of questions that I think could be answered quickly in a ~10 min call.

Let me know what you think...

sayakpaul · 2023-01-03T14:26:18Z

Would prefer chatting via email as I will be busy next week.

LeviVasconcelos · 2023-01-10T23:58:27Z

Just a friendly heads-up: I will be very busy until Jan 20th. Afterward I should start working on it.

Best,

danwexler · 2023-02-03T21:12:52Z

Great to see the update. I'm eager to test out this code. My hope is to get it working in TFJS. I'm guessing that may require the implementation of a few operations, similar to what was done to get it working for arbitrary resolutions? Any tips or suggestions appreciated.

sayakpaul · 2023-02-03T23:35:31Z

Great to know! However, I don't know about TFJS :(

Maybe reach out to Jason Mayes?

LeviVasconcelos · 2023-07-02T05:20:13Z

Hi @sayakpaul ,

Sorry for the delay, life got in the way =/.

@LeviVasconcelos

Things are looking good. Thanks so much for your hard work. I left a couple more questions.

I would suggest doing the following:
* Including a detailed summary of changes needed to make this work in the README.

I uploaded the README file explaining in detail the changes made.

* Running the [conversion script](https://github.com/sayakpaul/maxim-tf/blob/main/convert_to_tf.py) for each model and verifying their outputs.

I ran convert_to_tf for 5 different models, using as checkpoints the models provided in gs://gresearch/maxim/ckpt . The results of run_eval.py for each model can be found here.

I also modified run_eval.py by removing the dynamic_resize flag.

* Creating PRs to each of the MAXIM model repositories (Google) here: https://huggingface.co/models?search=maxim

Should I create the PRs right away? Or should we merge this first?

LeviVasconcelos · 2023-07-07T02:05:05Z

@sayakpaul friendly ping.

sayakpaul · 2023-07-07T03:36:21Z

Hey, thanks for your hard work!

Could you be so kind to remind me about this again in maybe 1 week? A little busy right now.

LeviVasconcelos · 2023-07-18T21:50:47Z

@sayakpaul pinging as requested ;)

rogeriofonteles · 2023-08-18T18:27:55Z

Hey guys, any update on this?

LeviVasconcelos · 2023-09-18T18:53:57Z

Gently pinging @sayakpaul here.

fcmr · 2023-09-28T12:35:37Z

Hi! Any news on this PR?

LeviVasconcelos added 2 commits December 16, 2022 02:08

add: support for dynamic input shapes.

30f9393

add: tests for dynamic input shape

d55b45a

sayakpaul reviewed Dec 16, 2022

sayakpaul requested changes Dec 16, 2022

LeviVasconcelos added 2 commits December 16, 2022 14:16

refactor: address review comments

af09c34

refactor: remove ResizingUp and ResizingDown imports

cd13378

sayakpaul reviewed Dec 17, 2022

sayakpaul reviewed Dec 20, 2022

sayakpaul mentioned this pull request Jan 26, 2023

Building the model with (None, None, 3) #11

Closed

Update README.md

ccdbe31

list changes done to achieve arbitrary image shapes.

thekevinscott mentioned this pull request Mar 4, 2023

Issues with patch sizes and MAXIM models thekevinscott/UpscalerJS#913

Closed

LeviVasconcelos added 2 commits July 2, 2023 02:10

remove flag dynamic_resize

805c706

modify model to (None,None)

ff992fc
