Commit f265f8d

[5693592][ONNX-customOp][Autocast] Fix QuantizeLinear node output type (#671)
## What does this PR do?

**Type of change:** Bug fix

**Overview:** The output type of Q (QuantizeLinear) nodes was being set incorrectly. This PR fixes that by deriving the output type from the node's zero_point input.

## Usage

```shell
python -m modelopt.onnx.autocast --onnx_path=$MODEL_NAME.onnx
```

## Testing

See bug 5693592 for more details:

```shell
python -m modelopt.onnx.autocast --onnx_path=$MODEL_NAME.onnx --low_precision_type=fp16 --data_max=inf --init_max=inf --keep_io_types
```

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

Signed-off-by: gcunhase <[email protected]>
1 parent ee8a1f4 commit f265f8d

File tree: 1 file changed (+2, −0 lines)


modelopt/onnx/autocast/precisionconverter.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -296,6 +296,8 @@ def _get_np_type(node, inp, opset=onnx.defs.onnx_opset_version()):
             return helper.tensor_dtype_to_np_dtype(node.attrs["to"])
         elif node.op == "DequantizeLinear":
             return node.inputs[1].dtype  # scale type
+        elif node.op == "QuantizeLinear":
+            return node.inputs[2].dtype  # zero_point type
         elif not inp.dtype or inp.dtype == onnx.TensorProto.UNDEFINED:
             return None
         elif node.op not in self.custom_ops:
```
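To illustrate why the fix reads the dtype from input index 2: per the ONNX operator spec, QuantizeLinear takes (x, y_scale, y_zero_point) and its output type matches y_zero_point, while DequantizeLinear's output type matches its x_scale input. A minimal sketch of this dispatch logic, using hypothetical `Tensor`/`Node` stand-ins rather than ModelOpt's actual classes:

```python
# Sketch of the output-dtype lookup fixed in this PR, with simplified
# stand-in classes (NOT the real precisionconverter.py types).
from dataclasses import dataclass, field


@dataclass
class Tensor:
    dtype: str  # e.g. "float32", "float16", "int8"


@dataclass
class Node:
    op: str
    inputs: list = field(default_factory=list)


def get_output_dtype(node: Node):
    """Return the output dtype implied by the node's Q/DQ inputs."""
    if node.op == "DequantizeLinear":
        # DQ output is a float type: follows the scale input (index 1).
        return node.inputs[1].dtype
    elif node.op == "QuantizeLinear":
        # Q output is a quantized integer type: follows the
        # zero_point input (index 2), per the ONNX spec.
        return node.inputs[2].dtype
    return None


q = Node("QuantizeLinear", [Tensor("float32"), Tensor("float32"), Tensor("int8")])
dq = Node("DequantizeLinear", [Tensor("int8"), Tensor("float16"), Tensor("int8")])
print(get_output_dtype(q))   # int8
print(get_output_dtype(dq))  # float16
```

Before this fix, a Q node's output could be assigned a float type during autocast, which is invalid because QuantizeLinear always emits the quantized integer type of its zero_point.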
