Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed same weights bug when loading SSL models from Vissl #178

Merged
merged 2 commits into from
Sep 9, 2024

Conversation

jonasd4
Copy link
Collaborator

@jonasd4 jonasd4 commented Sep 6, 2024

For simclr-rn50, swav-rn50 and pirl-rn50, the model weights ended up being the first one that were loaded because torch.hub.load_state_dict_from_url cached the state dicts based on the file_name of the url (last part) although the first part of the url was completely different. With the file_name argument, we specify now a unique name based on the model name.

Copy link

codecov bot commented Sep 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.30%. Comparing base (ad903f3) to head (e2a7892).
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #178      +/-   ##
==========================================
+ Coverage   76.23%   76.30%   +0.06%     
==========================================
  Files          40       40              
  Lines        2058     2055       -3     
  Branches      262      262              
==========================================
- Hits         1569     1568       -1     
+ Misses        404      402       -2     
  Partials       85       85              
Flag Coverage Δ
unittests 76.30% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@a1247418 a1247418 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Alxmrphi
Copy link
Collaborator

Alxmrphi commented Sep 8, 2024

Hi Jonas, I can imagine this was an incredibly annoying bug to find 🤣

Do I understand it correctly that, for example, if you loaded swav-rn50 then the model URL is https://dl.fbaipublicfiles.com/vissl/model_zoo/swav_in1k_rn50_800ep_swav_8node_resnet_27_07_20.a0a6b676/model_final_checkpoint_phase799.torch and torch then caches this state dict based on the model_final_checkpoint_phase799.torch part. Then if you want to use pirl-rn50 then it takes its URL (https://dl.fbaipublicfiles.com/vissl/model_zoo/pirl_jigsaw_4node_pirl_jigsaw_4node_resnet_22_07_20.34377f59/model_final_checkpoint_phase799.torch") and looks up the filename as of the last path delimiter (/) which is the same for those three models you mentioned and incorrectly ignores anything but the first one?

Okay, I've taken a closer look at all the filenames now and can more clearly see what the issue was and how the fix amends it. It would have been a bit clearer to specify more clearly the addition of unique_model_id in _download_and_save_model which then is used to define the file_name parameter of torch's load_state_dict_from_url and perhaps a demo of the problem (citing the problematic URLs).

I can see this was raised in the main repo about 5 years ago, yikes.

Copy link
Collaborator

@Alxmrphi Alxmrphi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LukasMut LukasMut added bug Something isn't working cleanup Deprecate or refactor code labels Sep 9, 2024
Copy link
Collaborator

@LukasMut LukasMut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LukasMut LukasMut added this pull request to the merge queue Sep 9, 2024
Merged via the queue into master with commit 32b1a96 Sep 9, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cleanup Deprecate or refactor code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants