
Conversation


@Darth-Kronos commented Oct 17, 2025

What does this PR do?

Type of change: Bug Fix

Overview:
When converting weight tensors to INT8/FP8, the zero-point array's datatype was previously validated against ONNX datatype enums (onnx.TensorProto.FLOAT8E4M3FN or onnx.TensorProto.INT8). Because the zero point had already been converted to a NumPy array, its dtype could never equal those enum values, so the FP8 branch was never taken and weights were always scaled as INT8, even during FP8 quantization.

This PR fixes the issue by checking the data_type field of the onnx.TensorProto directly instead of inferring the type from the corresponding NumPy array.
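The shape of the bug and of the fix can be sketched as follows (a minimal illustration with assumed function names, not the actual qdq_utils.py code):

```python
import numpy as np
import onnx

# Buggy pattern: `zp` is a NumPy array, so zp.dtype is a numpy dtype
# such as np.int8. onnx.TensorProto.FLOAT8E4M3FN is just an integer
# enum value, so this comparison is always False; the FP8 branch can
# never be taken and the INT8 scaling path always runs.
def is_fp8_zero_point_buggy(zp: np.ndarray) -> bool:
    return zp.dtype == onnx.TensorProto.FLOAT8E4M3FN  # always False

# Fixed pattern: keep the zero point as an onnx.TensorProto and read
# its data_type field, which actually stores the ONNX datatype enum.
def is_fp8_zero_point_fixed(zp: onnx.TensorProto) -> bool:
    return zp.data_type == onnx.TensorProto.FLOAT8E4M3FN
```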

Usage

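A hedged sketch of exercising the FP8 path end to end; the quantize() entry point and its parameter names are assumptions based on the package layout, not something this PR defines:

```python
# Hypothetical usage sketch: the entry point and parameter names below
# are assumptions, not confirmed by this PR.
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="model.onnx",         # input model (assumed parameter name)
    quantize_mode="fp8",            # the path whose scaling this PR fixes
    output_path="model.quant.onnx", # where to write the quantized model
)
```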

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • Refactor
    • Enhanced quantization parameter handling in ONNX utilities to improve type safety and consistency across quantization workflows.
    • Strengthened FP8 tensor creation with explicit data type specification.
    • Improved robustness of quantized model processing through refined internal data structure handling.

@Darth-Kronos requested a review from a team as a code owner October 17, 2025 14:16
@Darth-Kronos requested a review from i-riyad October 17, 2025 14:16

copy-pr-bot (bot) commented Oct 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai (bot) commented Oct 17, 2025

Walkthrough

The PR refactors type handling for scale and zero-point values in quantization utilities, shifting from numpy arrays to ONNX TensorProto objects at the retrieval stage, with type conversions deferred to points of use.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| ONNX Tensor Type Refactoring<br>modelopt/onnx/quantization/qdq_utils.py | _get_scale_and_zp now returns ONNX TensorProto objects instead of numpy arrays; _convert_weight updated to accept ONNX TensorProto for scale and zero-point with internal conversion to arrays; _create_fp8_tensor explicitly sets data_type to Float8; call sites in qdq_to_dq and related functions updated to handle TensorProto objects; error checks adjusted to use tensor metadata (e.g., zp.data_type) instead of numpy dtype operations. |
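The pattern described above can be sketched roughly as follows (illustrative only; the helper names mirror the walkthrough, but the bodies are simplified assumptions, not the file's actual code):

```python
import numpy as np
import onnx
from onnx import numpy_helper

# Helpers now receive TensorProto objects; conversion to NumPy happens
# only at the point of use (simplified: real code also rounds/clips).
def _convert_weight_sketch(
    weight: np.ndarray, scale: onnx.TensorProto, zp: onnx.TensorProto
) -> np.ndarray:
    scale_arr = numpy_helper.to_array(scale)
    zp_arr = numpy_helper.to_array(zp)
    return weight / scale_arr + zp_arr

# FP8 tensor creation with an explicit data_type: NumPy has no native
# float8 dtype to infer from, so the ONNX enum is passed directly.
def _create_fp8_tensor_sketch(name: str, values: np.ndarray) -> onnx.TensorProto:
    return onnx.helper.make_tensor(
        name=name,
        data_type=onnx.TensorProto.FLOAT8E4M3FN,
        dims=values.shape,
        vals=values.astype(np.float32).flatten().tolist(),
    )
```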

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Rationale: Single file with consistent, homogeneous type refactoring pattern (numpy arrays → ONNX TensorProto). Changes follow a straightforward substitution logic: parameter types updated, conversions relocated to call sites, and tensor metadata accessed appropriately. Control flow unchanged. Review focuses on verifying type correctness and conversion safety.

Poem

🐰 Types transformed from arrays to tensors so bright,
Scale and zero-point dressed in ONNX delight,
Conversions deferred to their rightful place,
FP8 floats now wear their data_type face!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The PR title "Fix ONNX FP8 scaling" directly addresses the main objective of the changeset, which is to correct incorrect scaling of weight tensors when converting to INT8/FP8 by fixing datatype checks in ONNX tensor handling. The title is concise, avoids vague or generic terminology, and clearly conveys that this is a bug fix focused on FP8 scaling in ONNX code. While the title could be slightly more specific about the root cause (improper datatype checking), it nonetheless captures the essential nature of the fix. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment


