Add DINOV3 with assistance from the Gemini CLI. #2444
Conversation
/gemini review
Code Review
This pull request introduces the DINOV3 model, which is a valuable addition. The implementation is largely well-executed, following the existing patterns in the codebase. However, I've identified a few critical issues related to the handling of the training argument in custom layers, which would lead to incorrect behavior during model training. Additionally, several new files are missing their corresponding test files, which is a violation of the repository's contribution guidelines. There are also some minor inconsistencies with the style guide regarding docstrings and naming conventions. Addressing these points will significantly improve the quality and maintainability of this contribution.
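The `training`-argument issue the review flags is a common Keras pitfall: when a custom layer calls a stochastic sublayer (dropout, drop-path) without forwarding `training`, the sublayer falls back to its default and behaves as if in inference mode even while the model trains. A minimal pure-Python sketch of the pattern (toy classes for illustration, not the actual DINOV3 code):

```python
class ToyDropPath:
    """Toy stand-in for a stochastic-depth layer: active only in training."""

    def __init__(self, rate):
        self.rate = rate

    def __call__(self, x, training=None):
        if training:
            # A real drop-path layer randomly zeros the residual branch;
            # here we scale deterministically to keep the sketch testable.
            return [v * (1.0 - self.rate) for v in x]
        return x


class ToyBlock:
    """Custom block owning the stochastic sublayer."""

    def __init__(self):
        self.drop_path = ToyDropPath(0.5)

    def call(self, x, training=None):
        # The bug class the review describes: calling `self.drop_path(x)`
        # would silently leave `training` as None, so drop-path never
        # activates during training. Forwarding the flag fixes it:
        return self.drop_path(x, training=training)
```

With the flag forwarded, `ToyBlock().call([2.0], training=True)` returns `[1.0]` while `training=False` leaves the input untouched.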
The GPU test is failing with a 401 error for the preset https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m; check the PyTorch GPU result.
It seems that the DINOV3 model weights require extra permissions to be downloaded. I've skipped that test for now.
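Skipping a test that depends on gated weights can be sketched with the stdlib decorator (the test name below is hypothetical; the PR may use a different skip mechanism such as a pytest marker):

```python
import unittest


# The DINOV3 checkpoints on the Hub are gated, so CI cannot download them
# without account-level approval; skip the numerics test unconditionally.
@unittest.skip("facebook/dinov3 weights require extra download permissions")
def test_dinov3_preset_numerics():
    # The original body never executes: the decorator replaces the function
    # with a wrapper that raises unittest.SkipTest when invoked.
    raise AssertionError("unreachable")
```

A unittest runner reports such a test as skipped, with the given reason, instead of failing the suite.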
Description of the change
This PR adds the ViT version of DINOV3. The conversion is integrated into `TransformersPresetLoader`, allowing users to convert the model on the fly. The numeric results:
- `dinov3_vit_small_lvd1689m` 🔶 Preprocessing difference: 2.2059962e-07
- `dinov3_vit_small_plus_lvd1689m` 🔶 Preprocessing difference: 2.2059962e-07
- `dinov3_vit_base_lvd1689m` 🔶 Preprocessing difference: 2.2059962e-07
- `dinov3_vit_large_lvd1689m` 🔶 Preprocessing difference: 2.2059962e-07
- `dinov3_vit_huge_plus_lvd1689m` 🔶 Preprocessing difference: 2.2059962e-07
- `dinov3_vit_7b_lvd1689m`
- `dinov3_vit_large_sat493m` 🔶 Preprocessing difference: 2.5115654e-07
- `dinov3_vit_7b_sat493m`

Reference
Related to #2365
Colab Notebook
Similar to DINOV2.
https://colab.research.google.com/drive/19NUTXdbRtwgDPmBTNLo1414weXhKy_VM
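Loading the converted model could look like the sketch below. It assumes KerasHub's `hf://` preset URIs and `Backbone.from_preset`, the entry point that `TransformersPresetLoader` backs for on-the-fly conversion; note that the gated DINOV3 weights additionally require Hugging Face access approval, so this is illustrative rather than CI-runnable:

```python
def load_dinov3_backbone():
    """Sketch: convert the gated HF DINOV3 checkpoint on the fly.

    keras_hub is imported lazily so the sketch stays importable
    without the package installed.
    """
    import keras_hub  # requires `pip install keras-hub` and HF access

    return keras_hub.models.Backbone.from_preset(
        "hf://facebook/dinov3-vits16-pretrain-lvd1689m"
    )
```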
Checklist
Experience Co-working with Gemini CLI
Recently, I've found that using the Gemini CLI (v0.9.0) to implement successor models in KerasHub works surprisingly well.
This PR is a demo of my collaboration with the tool. Here are a few experiences I want to share:
- Implementing `DINOV3` was a good experience since `DINOV2` already existed. I doubt that implementing an entirely new arch with the Gemini CLI would work as well.
- `facebook/dinov3-vits16-pretrain-lvd1689m`.
- It occasionally made small mistakes such as `def def function()` or an attr missing `self` as the prefix.

Here is a step-by-step process of how I co-worked with the tool:
- Write `dinov3_layers.py` using https://github.com/huggingface/transformers/blob/main/src/transformers/models/dinov3_vit/modular_dinov3_vit.py as a ref.
- Write `demo_dinov3.py` to compare the output shape/numerics against the pretrained weights from HF.
- The outputs matched within `atol<=1e-3`. Nice!
- Refactor `dinov3_layers.py` and add `dinov3_backbone.py`.
- Re-run `demo_dinov3.py`.
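The numeric check in the workflow above can be sketched with NumPy (the real `demo_dinov3.py` is not shown here; the names, shapes, and synthetic data below are placeholders):

```python
import numpy as np


def check_outputs_match(keras_out, hf_out, atol=1e-3):
    """Raise AssertionError if any element differs by more than atol."""
    np.testing.assert_allclose(
        np.asarray(keras_out), np.asarray(hf_out), rtol=0.0, atol=atol
    )


# Synthetic stand-ins for the two models' outputs.
rng = np.random.default_rng(0)
a = rng.normal(size=(1, 384)).astype("float32")
b = a + 5e-4  # perturbation well within the 1e-3 tolerance
check_outputs_match(a, b)  # passes silently
print("max abs diff:", float(np.max(np.abs(a - b))))
```

A max-abs-diff printout of this kind is also how per-preset numbers like "Preprocessing difference: 2.2059962e-07" in the description would typically be produced, though the PR's exact script is not shown.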