
Conversation

@Isalia20 (Contributor) commented Nov 6, 2025

Deformable attention implementation. Fixes: pytorch/pytorch#112827

pytorch-bot bot commented Nov 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9260

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the cla signed label Nov 6, 2025
@drisspg drisspg requested a review from NicolasHug November 11, 2025 02:59
@Isalia20 (Contributor, Author)

@NicolasHug Can you please review this PR?

@NicolasHug (Member)

Hi @Isalia20 , thanks for the PR.

This is a massive one :)

Can you share more about why this is needed? I shared my thoughts and questions on that before in https://github.com/pytorch/pytorch/issues/112827#issuecomment-1804905345. It seems like there are already really solid implementations of deformable attention, so we'd like to understand the added value of having this natively in torchvision.

@Isalia20 (Contributor, Author) commented Nov 26, 2025

There are two main reasons why this would be a nice addition:

  1. The existing deformable attention kernels are definitely solid, but they are outdated: building them emits deprecation warnings, so it's only a matter of time before torch removes the deprecated features and these kernels stop compiling. As I recall from implementing this PR, one of the warnings came from using `.data<scalar_t>()` to access the underlying data pointer. In this PR that is replaced with the newer `data_ptr<scalar_t>()`, so no warnings are emitted.
    See:
    https://github.com/Isalia20/vision/blob/cc76d33e073f9ecf7a8af695c31708663c6a7fac/torchvision/csrc/ops/cuda/deform_attn_kernel.cu#L1691
    https://github.com/facebookresearch/dinov3/blob/54694f7627fd815f62a5dcc82944ffa6153bbb76/dinov3/eval/segmentation/models/utils/ops/src/cuda/ms_deform_attn_cuda.cu#L142C42-L142C59

  2. Another reason is to have it directly in torchvision, so users don't need to build the kernels themselves, as is currently required even in SOTA vision models like DINOv3:
    https://github.com/facebookresearch/dinov3/tree/main/dinov3/eval/segmentation/models/utils/ops/src/cuda
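For context, the core computation that these CUDA kernels implement can be sketched in plain NumPy. This is an illustrative single-query, single-head reference only, not the PR's kernel: the function names, the clamped (rather than zero-padded) border handling, and the `-0.5` pixel-center convention are assumptions made for the sketch.

```python
import numpy as np

def bilinear_sample(value, x, y):
    """Bilinearly sample a (H, W, C) feature map at fractional coords (x, y)."""
    H, W, C = value.shape

    def get(yy, xx):
        # Out-of-bounds samples contribute zero (one common convention)
        if 0 <= yy < H and 0 <= xx < W:
            return value[yy, xx]
        return np.zeros(C)

    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    wx1, wy1 = x - x0, y - y0
    wx0, wy0 = 1.0 - wx1, 1.0 - wy1
    return (get(y0, x0) * wy0 * wx0 + get(y0, x1) * wy0 * wx1 +
            get(y1, x0) * wy1 * wx0 + get(y1, x1) * wy1 * wx1)

def ms_deform_attn(values, sampling_locs, attn_weights):
    """Multi-scale deformable attention for a single query.

    values: list of (H_l, W_l, C) feature maps, one per pyramid level.
    sampling_locs: (L, P, 2) sampling points, normalized to [0, 1].
    attn_weights: (L, P), softmax-normalized over all L * P samples.
    """
    out = np.zeros(values[0].shape[-1])
    for l, value in enumerate(values):
        H, W, _ = value.shape
        for p in range(sampling_locs.shape[1]):
            # Scale the normalized location to this level's resolution
            # (the -0.5 pixel-center offset is an assumed convention)
            x = sampling_locs[l, p, 0] * W - 0.5
            y = sampling_locs[l, p, 1] * H - 0.5
            out += attn_weights[l, p] * bilinear_sample(value, x, y)
    return out
```

The point of the operator is that each query attends to only a small, learned set of `L * P` sampled locations across the feature pyramid instead of every spatial position, which is why a dedicated fused kernel (CPU/CUDA) pays off over composing generic ops.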

Successfully merging this pull request may close these issues: Multi Scale Deformable Attention Support.