Fixed same weights bug when loading SSL models from Vissl #178

jonasd4 · 2024-09-06T14:35:27Z

For simclr-rn50, swav-rn50 and pirl-rn50, the model weights ended up being the first one that were loaded because torch.hub.load_state_dict_from_url cached the state dicts based on the file_name of the url (last part) although the first part of the url was completely different. With the file_name argument, we specify now a unique name based on the model name.

codecov · 2024-09-06T14:50:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.30%. Comparing base (ad903f3) to head (e2a7892).
Report is 3 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #178      +/-   ##
==========================================
+ Coverage   76.23%   76.30%   +0.06%     
==========================================
  Files          40       40              
  Lines        2058     2055       -3     
  Branches      262      262              
==========================================
- Hits         1569     1568       -1     
+ Misses        404      402       -2     
  Partials       85       85

Flag	Coverage Δ
unittests	`76.30% <100.00%> (+0.06%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

a1247418

LGTM

Alxmrphi · 2024-09-08T00:07:45Z

Hi Jonas, I can imagine this was an incredibly annoying bug to find 🤣

Do I understand it correctly that, for example, if you loaded swav-rn50 then the model URL is https://dl.fbaipublicfiles.com/vissl/model_zoo/swav_in1k_rn50_800ep_swav_8node_resnet_27_07_20.a0a6b676/model_final_checkpoint_phase799.torch and torch then caches this state dict based on the model_final_checkpoint_phase799.torch part. Then if you want to use pirl-rn50 then it takes its URL (https://dl.fbaipublicfiles.com/vissl/model_zoo/pirl_jigsaw_4node_pirl_jigsaw_4node_resnet_22_07_20.34377f59/model_final_checkpoint_phase799.torch") and looks up the filename as of the last path delimiter (/) which is the same for those three models you mentioned and incorrectly ignores anything but the first one?

Okay, I've taken a closer look at all the filenames now and can more clearly see what the issue was and how the fix amends it. It would have been a bit clearer to specify more clearly the addition of unique_model_id in _download_and_save_model which then is used to define the file_name parameter of torch's load_state_dict_from_url and perhaps a demo of the problem (citing the problematic URLs).

I can see this was raised in the main repo about 5 years ago, yikes.

Alxmrphi

LGTM

LukasMut

LGTM

added unique model id to load_state_dict_from_url

c953be8

a1247418 approved these changes Sep 6, 2024

View reviewed changes

removed torch hub load_state_dict_from_url fallback import

e2a7892

LukasMut requested review from a1247418, LukasMut and Alxmrphi September 6, 2024 20:26

Alxmrphi approved these changes Sep 8, 2024

View reviewed changes

LukasMut assigned jonasd4 Sep 9, 2024

LukasMut added bug Something isn't working cleanup Deprecate or refactor code labels Sep 9, 2024

LukasMut approved these changes Sep 9, 2024

View reviewed changes

LukasMut added this pull request to the merge queue Sep 9, 2024

Merged via the queue into master with commit 32b1a96 Sep 9, 2024
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed same weights bug when loading SSL models from Vissl #178

Fixed same weights bug when loading SSL models from Vissl #178

jonasd4 commented Sep 6, 2024 •

edited

Loading

codecov bot commented Sep 6, 2024 •

edited

Loading

a1247418 left a comment

Alxmrphi commented Sep 8, 2024 •

edited

Loading

Alxmrphi left a comment

LukasMut left a comment

Fixed same weights bug when loading SSL models from Vissl #178

Fixed same weights bug when loading SSL models from Vissl #178

Conversation

jonasd4 commented Sep 6, 2024 • edited Loading

codecov bot commented Sep 6, 2024 • edited Loading

Codecov Report

a1247418 left a comment

Choose a reason for hiding this comment

Alxmrphi commented Sep 8, 2024 • edited Loading

Alxmrphi left a comment

Choose a reason for hiding this comment

LukasMut left a comment

Choose a reason for hiding this comment

jonasd4 commented Sep 6, 2024 •

edited

Loading

codecov bot commented Sep 6, 2024 •

edited

Loading

Alxmrphi commented Sep 8, 2024 •

edited

Loading