
fix scaling of RGBs for tensorflow/keras #183

Merged: 7 commits into ViCCo-Group:master, Dec 4, 2024

Conversation

candemircan
Contributor

Hi,

This pull request fixes the following:

Harmonizer (and Generic tf) Models

The default tf preprocessing now rescales the RGB values to the range [0, 1]. As you can see here, we now get identical results from thingsvision and from using the harmonized models directly.

Note that the same scaling also applies to any tf model that is manually turned into an extractor, and to any tf model that does not specify a preprocessing function.
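For illustration, here is a minimal sketch of what this default preprocessing could look like (an assumption about the shape of the fix, not the PR's actual code; the function name and target size are hypothetical):

```python
import tensorflow as tf

def default_tf_preprocess(image, target_size=(224, 224)):
    """Hypothetical default preprocessing: resize, then rescale RGB values to [0, 1]."""
    image = tf.image.resize(image, target_size)  # expects HxWx3 values in [0, 255]
    return tf.cast(image, tf.float32) / 255.0    # rescale to [0, 1]
```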

tf.keras pretrained model fixes

Before these commits, the same preprocessing was always applied to the pretrained tf.keras.applications models. This produced silent but catastrophic errors because, unfortunately, the pretrained models differ in their preprocessing in non-systematic ways. This is now fixed. Again, using the models via thingsvision and directly yields the same results here.
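To make the divergence concrete, here is a small illustration (not from the PR): ResNet50's preprocess_input applies caffe-style channel reordering plus mean subtraction, while EfficientNetV2's preprocess_input is, in current Keras versions, effectively a pass-through because rescaling happens inside the model:

```python
import numpy as np
import tensorflow as tf

# Fake batch of RGB images with values in [0, 255].
x = np.random.uniform(0, 255, (1, 224, 224, 3)).astype("float32")

resnet_in = tf.keras.applications.resnet50.preprocess_input(x.copy())
effnet_in = tf.keras.applications.efficientnet_v2.preprocess_input(x.copy())

print(resnet_in.min(), resnet_in.max())  # mean-subtracted BGR, roughly [-124, 152]
print(effnet_in.min(), effnet_in.max())  # unchanged, still [0, 255]
```

Feeding the same pixel values through both models without the model-specific step silently corrupts one of the two.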

Getting the correct preprocessing function is a bit tricky, as it is exposed through per-model modules and functions rather than through arguments (e.g., tf.keras.applications.efficientnet_v2.preprocess_input). Additionally, the preprocessing module names differ from the model names in subtle ways. I resolved this with regular expressions, and it works for the current models. If there is no regexp match, we currently raise a warning saying that no preprocessing function could be found and that the user should be careful, and we fall back to the default preprocessing described in the previous section. If new pretrained models are added, their names need to be added to the function. This is not ideal, but I couldn't find a better, more generic way.
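A rough sketch of this approach (the pattern strings, function name, and fallback below are illustrative, not the exact code from the PR):

```python
import re
import warnings

import tensorflow as tf

def get_preprocess_fn(model_name: str):
    """Map a tf.keras.applications model name to its preprocess_input function."""
    patterns = [
        (r"EfficientNetV2", tf.keras.applications.efficientnet_v2.preprocess_input),
        (r"ResNet\d+$", tf.keras.applications.resnet.preprocess_input),
        (r"MobileNet$", tf.keras.applications.mobilenet.preprocess_input),
    ]
    for pattern, preprocess_fn in patterns:
        if re.search(pattern, model_name):
            return preprocess_fn
    warnings.warn(
        f"No preprocessing function found for '{model_name}'; "
        "falling back to the default [0, 1] rescaling. Be careful."
    )
    return None  # caller applies the default preprocessing instead
```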

Additionally, different Keras models expect different input sizes, and this needs to be accounted for. That is now handled as well.
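For example, ResNet50 expects 224x224 inputs by default, while InceptionV3 expects 299x299. One way to obtain the expected size (a sketch, not necessarily what this PR does) is to read it off the instantiated model rather than hard-coding it:

```python
import tensorflow as tf

for cls in (tf.keras.applications.ResNet50, tf.keras.applications.InceptionV3):
    model = cls(weights=None)  # skip the weight download; the input shape is unaffected
    print(cls.__name__, model.input_shape)
# ResNet50 (None, 224, 224, 3)
# InceptionV3 (None, 299, 299, 3)
```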

testing

I did all the testing in a separate repo here, where I also reproduced some of these issues. I know you have a tests folder here, but I couldn't quite wrap my head around how to replicate some of the simple tests I did there. For example, I see that you have a create_test_images function that you call, but I don't see where those images actually get used in the tests; instead, it looks like some manually created values are passed through the models. Either way, it should be straightforward for someone who understands the structure to add my tests to your test suite.


Sorry for closing and reopening the issue; I had to create a new branch and delete the old one.

@fel-thomas
Collaborator

Hi all,
this is perfect—thank you @candemircan for handling this and @LukasMut for the management of the issue!
The reproduction is really convincing.

Wishing you both a wonderful week!

@fel-thomas (Collaborator) left a comment

LGTM

@LukasMut
Collaborator


Thanks for taking a look @fel-thomas! 🙏🏼 You too have a great rest of the week :)

@LukasMut
Collaborator


Thanks a lot for handling this @candemircan! 🙏🏼 Much appreciated!

@LukasMut
Collaborator

@martinhebart could you make sure that this PR gets into a state where I can confidently merge it into main? It would be very helpful if one of you @jonasd4 @Alxmrphi @andropar could review this PR. I'll then do the final review and accept the changes.

@andropar (Collaborator) left a comment

I'm no expert with Keras, but looks good to me! The test does speak for itself too, good job 👍

Cleaned up a bit
@LukasMut (Collaborator) left a comment

LGTM. I cleaned up a bit and have one minor comment. Overall this looks good. Great work!

```python
]
# Try each pattern
for pattern, preprocess_value in patterns:
    if re.match(pattern, model_name):
```
@LukasMut (Collaborator), Dec 4, 2024

Why re.match(pattern, string) and not re.search(pattern, string)? re.match looks for the pattern only at the beginning of a string, whereas re.search searches through the entire string. I see that you specify start and end positions using ^ and $ even though you use re.match. Why is that? I think we could just drop all the ^ and $ and it would work equally well.

@candemircan (Contributor, Author)

Sure, if you're confident that it'll be identical in behaviour, then go ahead.

@andropar (Collaborator)

Wouldn't omitting $ cause unwanted matches (e.g. "MobileNet" would match "MobileNetV3" too)?

@LukasMut (Collaborator), Dec 4, 2024

@candemircan This was just feedback :) I am trying to understand whether there is a specific reason for re.match in combination with the regular expressions you used :)

@andropar I guess it depends on re.match vs. re.search. re.match looks for the pattern only at the beginning of a string, so ^ is effectively baked into the function, while $ is not. I don't think we ever need ^ (definitely not for re.match), but $ is needed for one of the two (or maybe even for both, the more I think about it? 🤔)

@candemircan (Contributor, Author)

No specific reason on my side; it was the first thing I wrote that worked correctly, and I left it as is because I don't know regexps very well. If there is a better way to do this, please go ahead and change it :)

@LukasMut (Collaborator), Dec 4, 2024

Ok, thanks for the clarification! I changed re.match to re.search, since ^ is redundant with re.match anyway (it's baked into the function). I left the patterns as you specified them to be as explicit as possible ($ is indeed necessary; I just tested it). Thank you for helping with this! I will accept the PR now and merge it into main.
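A quick illustration of the anchoring behaviour settled here (the MobileNet pattern is hypothetical):

```python
import re

print(bool(re.search(r"MobileNet", "MobileNetV2")))   # True: unwanted match without $
print(bool(re.search(r"MobileNet$", "MobileNetV2")))  # False: $ rejects the longer name
print(bool(re.match(r"MobileNet$", "MobileNet")))     # True: re.match anchors the start, so ^ is redundant
```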

@candemircan (Contributor, Author)

Coolio, thanks Lukas! All the best.

@LukasMut LukasMut added this pull request to the merge queue Dec 4, 2024
Merged via the queue into ViCCo-Group:master with commit ee8b576 Dec 4, 2024
2 of 3 checks passed
Labels: bug (Something isn't working)
Projects: None yet
Development: successfully merging this pull request may close the issue "Do not use this repo's harmonized models until they are fixed"
5 participants