
Fix fp16 ONNX export for RT-DETR and RT-DETRv2 #36460

Open · wants to merge 10 commits into main
Conversation

@qubvel (Member) commented Feb 27, 2025

What does this PR do?

Fix fp16 ONNX export for RT-DETR and RT-DETRv2, related to

Comment on lines 1312 to 1313
grid_w = torch.arange(int(width), device=device).to(dtype)
grid_h = torch.arange(int(height), device=device).to(dtype)
Member Author

ONNX does not support the Range op in fp16, so we create the range in the default dtype and then cast it to the desired dtype.
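The workaround can be sketched in isolation (a minimal illustration; `make_grid` is a hypothetical name, not the actual modeling code):

```python
import torch

# Minimal sketch of the workaround (`make_grid` is an illustrative name,
# not the real modeling function). Passing dtype=torch.float16 directly to
# torch.arange would export as a fp16 Range node, which ONNX does not
# support; creating the range in the default dtype and casting afterwards
# exports as Range(int64) followed by a Cast node instead.

def make_grid(width, height, device=None, dtype=torch.float16):
    grid_w = torch.arange(int(width), device=device).to(dtype)
    grid_h = torch.arange(int(height), device=device).to(dtype)
    return grid_w, grid_h
```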

Comment on lines +1728 to +1729
torch.arange(end=height, device=device).to(dtype),
torch.arange(end=width, device=device).to(dtype),
Member Author

Same as above regarding the Range op.

@qubvel (Member Author) commented Feb 27, 2025

run-slow: rt_detr, rt_detr_v2

This comment contains run-slow, running the specified jobs:

models: ['models/rt_detr', 'models/rt_detr_v2']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +1292 to +1319
for _ in range(self.num_fpn_stages):
    lateral_conv = RTDetrConvNormLayer(
        config,
        in_channels=self.encoder_hidden_dim,
        out_channels=self.encoder_hidden_dim,
        kernel_size=1,
        stride=1,
        activation=activation,
    )
    fpn_block = RTDetrCSPRepLayer(config)
    self.lateral_convs.append(lateral_conv)
    self.fpn_blocks.append(fpn_block)

# bottom-up PAN
self.downsample_convs = nn.ModuleList()
self.pan_blocks = nn.ModuleList()
for _ in range(self.num_pan_stages):
    downsample_conv = RTDetrConvNormLayer(
        config,
        in_channels=self.encoder_hidden_dim,
        out_channels=self.encoder_hidden_dim,
        kernel_size=3,
        stride=2,
        activation=activation,
    )
    pan_block = RTDetrCSPRepLayer(config)
    self.downsample_convs.append(downsample_conv)
    self.pan_blocks.append(pan_block)
Member Author

Just refactoring

Comment on lines +1425 to +1453
for idx, (lateral_conv, fpn_block) in enumerate(zip(self.lateral_convs, self.fpn_blocks)):
    backbone_feature_map = hidden_states[self.num_fpn_stages - idx - 1]
    top_fpn_feature_map = fpn_feature_maps[-1]
    # apply lateral block
    top_fpn_feature_map = lateral_conv(top_fpn_feature_map)
    fpn_feature_maps[-1] = top_fpn_feature_map
    # apply fpn block
    top_fpn_feature_map = F.interpolate(top_fpn_feature_map, scale_factor=2.0, mode="nearest")
    fused_feature_map = torch.concat([top_fpn_feature_map, backbone_feature_map], dim=1)
    new_fpn_feature_map = fpn_block(fused_feature_map)
    fpn_feature_maps.append(new_fpn_feature_map)

fpn_feature_maps = fpn_feature_maps[::-1]

# bottom-up PAN
pan_feature_maps = [fpn_feature_maps[0]]
for idx, (downsample_conv, pan_block) in enumerate(zip(self.downsample_convs, self.pan_blocks)):
    top_pan_feature_map = pan_feature_maps[-1]
    fpn_feature_map = fpn_feature_maps[idx + 1]
    downsampled_feature_map = downsample_conv(top_pan_feature_map)
    fused_feature_map = torch.concat([downsampled_feature_map, fpn_feature_map], dim=1)
    new_pan_feature_map = pan_block(fused_feature_map)
    pan_feature_maps.append(new_pan_feature_map)

if not return_dict:
    return tuple(v for v in [pan_feature_maps, encoder_states, all_attentions] if v is not None)
return BaseModelOutput(
    last_hidden_state=pan_feature_maps, hidden_states=encoder_states, attentions=all_attentions
)
Member Author

Just refactoring, no changes

@qubvel qubvel marked this pull request as ready for review February 27, 2025 19:10
@qubvel (Member Author) commented Feb 27, 2025

run-slow: rt_detr, rt_detr_v2

This comment contains run-slow, running the specified jobs:

models: ['models/rt_detr', 'models/rt_detr_v2']
quantizations: [] ...

@qubvel (Member Author) commented Feb 27, 2025

run-slow: rt_detr, rt_detr_v2

This comment contains run-slow, running the specified jobs:

models: ['models/rt_detr', 'models/rt_detr_v2']
quantizations: [] ...

@qubvel (Member Author) commented Feb 28, 2025

cc @xenova if you have bandwidth

@xenova (Contributor) left a comment

Thanks! Just one comment about using the trace-safe torch_int helper function for typecasts.

I also left additional comments here to ensure trace-compatibility: huggingface/optimum#2201 (review)

Edit: Just tested the exports and indeed, there are some issues when running with shapes different than export size (e.g., export=320x320, runtime=480x320). Addressing huggingface/optimum#2201 (review) should fix this.

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/model/encoder/Reshape' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:39 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,256,15,10}, requested shape:{-1,256,100}
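The shape-specialization failure can be reproduced with a toy module (an illustration under assumed names, not the RT-DETR code): tracing bakes `int(...)`-cast sizes into the graph as constants, while arithmetic on `tensor.shape` values stays symbolic and generalizes to other input sizes.

```python
import torch
import torch.nn as nn

# Toy repro of the shape-specialization issue (illustrative modules, not
# RT-DETR itself). `Frozen` casts sizes to Python ints, so tracing records
# 4*4=16 as a graph constant; `Dynamic` keeps the size ops in the graph.

class Frozen(nn.Module):
    def forward(self, x):
        b, c, h, w = x.shape
        return x.reshape(b, c, int(h) * int(w))  # frozen to a constant when traced

class Dynamic(nn.Module):
    def forward(self, x):
        b, c, h, w = x.shape
        return x.reshape(b, c, h * w)  # size ops stay symbolic when traced

example = torch.randn(1, 2, 4, 4)
traced = torch.jit.trace(Dynamic(), example)
out = traced(torch.randn(1, 2, 6, 4))  # a different spatial size still works
```

Running the traced `Frozen` module on a 6x4 input fails with the same kind of reshape size mismatch as the onnxruntime error above.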

Comment on lines 989 to 990
grid_w = torch.arange(int(width), device=device).to(dtype)
grid_h = torch.arange(int(height), device=device).to(dtype)
Contributor

Using Python type casts (int(...) or float(...)) causes the tracer to lose information, so could you instead use the torch_int utility function (see here)? It only behaves differently when tracing.

Warning logs:

/usr/local/lib/python3.11/dist-packages/transformers/models/rt_detr_v2/modeling_rt_detr_v2.py:989: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  grid_w = torch.arange(int(width), device=device).to(dtype)
/usr/local/lib/python3.11/dist-packages/transformers/models/rt_detr_v2/modeling_rt_detr_v2.py:990: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  grid_h = torch.arange(int(height), device=device).to(dtype)
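For reference, the trace-safe cast can be sketched roughly like this (an approximation of the helper's behavior, not the exact source of `transformers.utils.torch_int`):

```python
import torch

# Rough sketch of a trace-safe int cast in the spirit of
# transformers.utils.torch_int (approximation, not the exact source).
# Under torch.jit.trace, int(tensor) freezes the value into the graph as
# a constant; casting to an int64 tensor keeps it as a graph node.

def torch_int(x):
    if torch.jit.is_tracing() and isinstance(x, torch.Tensor):
        return x.to(torch.int64)  # stays symbolic while tracing
    return int(x)  # plain Python int in eager mode
```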

Comment on lines 1325 to 1326
grid_w = torch.arange(int(width), device=device).to(dtype)
grid_h = torch.arange(int(height), device=device).to(dtype)
Contributor

Same as the other comment.
