
Conversation

Collaborator

@james77777778 james77777778 commented Oct 19, 2025

Description of the change

This PR adds the ViT version of DINOV3.

The conversion is integrated into `TransformersPresetLoader`, allowing users to convert the model on the fly.
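As an illustration of how such an on-the-fly path can be wired in (the names below are hypothetical and not the actual `TransformersPresetLoader` internals), a preset string can be dispatched on its URI scheme:

```python
def resolve_loader(preset: str) -> str:
    """Pick a loader based on the preset URI scheme (illustrative sketch).

    "hf://..." presets are handed to a Transformers checkpoint converter;
    everything else goes through the native Keras preset loader.
    """
    if preset.startswith("hf://"):
        return "transformers_converter"
    return "native_keras_loader"


# Hypothetical usage: an HF checkpoint URI triggers on-the-fly conversion.
print(resolve_loader("hf://facebook/dinov3-vits16-pretrain-lvd1689m"))
```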

The numeric results (JAX + CPU):

| Variant | Modeling diff. | Preprocessing diff. |
| --- | --- | --- |
| dinov3_vit_small_lvd1689m | 4.145764e-07 | 2.2059962e-07 |
| dinov3_vit_small_plus_lvd1689m | 6.1882946e-07 | 2.2059962e-07 |
| dinov3_vit_base_lvd1689m | 6.0504067e-07 | 2.2059962e-07 |
| dinov3_vit_large_lvd1689m | 4.8083956e-07 | 2.2059962e-07 |
| dinov3_vit_huge_plus_lvd1689m | 2.9116026e-07 | 2.2059962e-07 |
| dinov3_vit_7b_lvd1689m | TBD | TBD |
| dinov3_vit_large_sat493m | 3.3045615e-07 | 2.5115654e-07 |
| dinov3_vit_7b_sat493m | TBD | TBD |
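The differences above appear to be elementwise numeric deltas between the KerasHub and Hugging Face implementations. A minimal sketch of such a metric (my assumption of a max absolute difference; the exact metric used in the PR may differ):

```python
import numpy as np


def max_abs_diff(a, b) -> float:
    """Maximum elementwise absolute difference between two outputs."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.max(np.abs(a - b)))


# Example: two nearly identical outputs differ by at most 0.5.
print(max_abs_diff([1.0, 2.0], [1.0, 2.5]))  # → 0.5
```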

Reference

Related to #2365

Colab Notebook

Similar to DINOV2.

https://colab.research.google.com/drive/19NUTXdbRtwgDPmBTNLo1414weXhKy_VM

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

Experience Co-working with Gemini CLI

Recently, I've found that using the Gemini CLI (v0.9.0) to implement successor models in KerasHub works surprisingly well.

This PR is a demo of my collaboration with the tool. Here are a few experiences I'd like to share:

  1. It works really well when a similar model architecture already exists in the codebase.
    • Implementing a model like DINOV3 was a good experience since DINOV2 already existed. I doubt that implementing an entirely new architecture with the Gemini CLI would work as well.
  2. Human-in-the-loop is essential.
    • Gemini CLI struggled with fetching config details from HF. For instance, it repeatedly failed to fetch the correct preset facebook/dinov3-vits16-pretrain-lvd1689m.
    • It also made syntax errors that a basic linter would have caught, such as def def function() or an attribute access missing the self. prefix.
    • Occasionally, the tool overthinks the problem, leading to heavy modifications and consuming a lot of context inspecting its own output. Fortunately, we can interrupt the process and guide it back.
  3. Fun and satisfying.
    • I must admit that I hadn't bought into the AI agent coding hype, but this is a successful case of speeding up implementation work in KerasHub.

Here is a step-by-step process of how I co-worked with the tool:

  1. I instructed it to implement dinov3_layers.py using https://github.com/huggingface/transformers/blob/main/src/transformers/models/dinov3_vit/modular_dinov3_vit.py as a ref.
    • It did a good job on the initial implementation, but the correctness was still uncertain at this stage.
  2. I asked it to create a demo_dinov3.py to compare the output shape/numerics against the pretrained weights from HF.
    • Many errors were raised. While the tool fixed most of them, some required my intervention.
    • After 5~7 rounds, it achieved an acceptable output with atol<=1e-3. Nice!
    • However, the style of the impl was different from the codebase.
  3. I instructed it to refactor dinov3_layers.py and add dinov3_backbone.py.
  4. Finally, I instructed it to add a conversion script to replace demo_dinov3.py.
    • This step required my intervention to correctly set the variable names in the preset loader.
  5. (2025/10/19) I instructed it to add tests and docstrings.
    • I needed to fix a few errors but the intervention was minimal.
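The numeric check in step 2 can be sketched roughly as follows (a hypothetical `outputs_match` helper; this is not the actual demo_dinov3.py code, just the shape-and-tolerance comparison it describes):

```python
import numpy as np


def outputs_match(keras_out, hf_out, atol: float = 1e-3) -> bool:
    """Compare output shape and numerics against a reference, as in step 2.

    Returns False on any shape mismatch; otherwise checks that every
    element agrees within the given absolute tolerance.
    """
    keras_out = np.asarray(keras_out, dtype=np.float64)
    hf_out = np.asarray(hf_out, dtype=np.float64)
    if keras_out.shape != hf_out.shape:
        return False
    return bool(np.allclose(keras_out, hf_out, atol=atol))


# Differences below atol=1e-3 pass; larger ones fail.
print(outputs_match([1.0, 2.0], [1.0005, 2.0]))  # → True
print(outputs_match([1.0, 2.0], [1.01, 2.0]))    # → False
```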

@james77777778 james77777778 changed the title [WIP] Add DINOV3 with the help from Gemini CLI. [WIP] Add DINOV3 with assistance from the Gemini CLI. Oct 19, 2025
@james77777778 james77777778 force-pushed the add-dinov3 branch 3 times, most recently from cd74a32 to 6f662c2 Compare October 20, 2025 05:56
@james77777778 james77777778 added the kokoro:force-run Runs Tests on GPU label Oct 20, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Oct 20, 2025
@james77777778 james77777778 added the kokoro:force-run Runs Tests on GPU label Oct 20, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Oct 20, 2025
@james77777778 james77777778 changed the title [WIP] Add DINOV3 with assistance from the Gemini CLI. Add DINOV3 with assistance from the Gemini CLI. Oct 20, 2025
@james77777778 james77777778 marked this pull request as ready for review October 20, 2025 12:29
@james77777778
Collaborator Author

/gemini review

@james77777778 james77777778 added the kokoro:force-run Runs Tests on GPU label Oct 20, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Oct 20, 2025
@divyashreepathihalli
Collaborator

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the DINOV3 model, which is a valuable addition. The implementation is largely well-executed, following the existing patterns in the codebase. However, I've identified a few critical issues related to the handling of the training argument in custom layers, which would lead to incorrect behavior during model training. Additionally, several new files are missing their corresponding test files, which is a violation of the repository's contribution guidelines. There are also some minor inconsistencies with the style guide regarding docstrings and naming conventions. Addressing these points will significantly improve the quality and maintainability of this contribution.

@james77777778 james77777778 added the kokoro:force-run Runs Tests on GPU label Oct 27, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Oct 27, 2025
@sachinprasadhs sachinprasadhs added the kokoro:force-run Runs Tests on GPU label Oct 27, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Oct 27, 2025
@sachinprasadhs
Collaborator

GPU test is failing with 401 error for preset https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m, check the PyTorch GPU result.

@james77777778
Collaborator Author

> GPU test is failing with 401 error for preset https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m, check the PyTorch GPU result.

Seems that the DINOV3 model weights require extra permissions to be downloaded. I've skipped that test for now.

@sachinprasadhs sachinprasadhs added the kokoro:force-run Runs Tests on GPU label Oct 28, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Oct 28, 2025
@sachinprasadhs sachinprasadhs merged commit 780919d into keras-team:master Oct 30, 2025
9 of 11 checks passed
@james77777778 james77777778 deleted the add-dinov3 branch October 30, 2025 07:55