Explicitly register real quant gemms #402
base: main
Conversation
Signed-off-by: Chenjie Luo <[email protected]>
Walkthrough
Replaced wildcard imports in the backends initializer with explicit imports, exposed specific classes and availability checks, and explicitly registered the GEMM implementations (FP8 per-tensor and NVFP4) into the gemm registry.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
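Based on this walkthrough, the registration block in `backends/__init__.py` plausibly resembles the sketch below. The `gemm_registry` import location and the keyword arguments to `register(...)` are assumptions, not confirmed by the diff; the class and function names come from the review's code graph, and the duplicated "Register default implementations" comment mirrors the nitpick raised further down.

```python
# Hypothetical sketch of modelopt/torch/quantization/backends/__init__.py.
# The gemm_registry module path and register() signature are assumptions.
from .fp8_per_tensor_gemm import Fp8PerTensorLinear, _fp8_availability_check
from .gemm_registry import gemm_registry  # assumed location of the registry
from .nvfp4_gemm import Nvfp4Linear, _nvfp4_availability_check

# Register default implementations
gemm_registry.register(
    gemm_func=Fp8PerTensorLinear.apply,
    availability_check=_fp8_availability_check,
)

# Register default implementations
gemm_registry.register(
    gemm_func=Nvfp4Linear.apply,
    availability_check=_nvfp4_availability_check,
)
```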
Pre-merge checks and finishing touches: ✅ 3 checks passed
Actionable comments posted: 0
🧹 Nitpick comments (2)
modelopt/torch/quantization/backends/__init__.py (2)
Lines 18-20: Consider whether private functions should be exposed in the public API. The imports include `_fp8_availability_check` and `_nvfp4_availability_check`, which are prefixed with underscores indicating they are intended to be private. This exposes them in the module's public API. If these functions are only used internally for registration, consider importing them within a private scope or document why they need to be publicly accessible.
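One way to address this nitpick (a sketch under the same assumed module paths and `register(...)` signature as above, not a confirmed project convention) is to confine the underscore-prefixed imports to a private registration helper so they never enter the package namespace:

```python
# Hypothetical alternative that keeps the underscore-prefixed checks out of
# the package namespace. Module paths and the register() signature are the
# same assumptions as in the sketch above.
from .fp8_per_tensor_gemm import Fp8PerTensorLinear
from .nvfp4_gemm import Nvfp4Linear


def _register_default_gemms() -> None:
    """Register built-in GEMMs; the private names stay local to this function."""
    from .fp8_per_tensor_gemm import _fp8_availability_check
    from .gemm_registry import gemm_registry
    from .nvfp4_gemm import _nvfp4_availability_check

    gemm_registry.register(
        gemm_func=Fp8PerTensorLinear.apply,
        availability_check=_fp8_availability_check,
    )
    gemm_registry.register(
        gemm_func=Nvfp4Linear.apply,
        availability_check=_nvfp4_availability_check,
    )


_register_default_gemms()
```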
Lines 28-32: Consider clarifying the duplicate comment. The registration logic is correct and addresses the PR objective. However, line 28 has the same comment as line 22 ("Register default implementations"). Consider making this more specific (e.g., "Register NVFP4 implementation") or removing it if it's redundant.
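Concretely, the suggestion amounts to giving each registration its own comment; continuing the earlier sketch with the same assumed names:

```python
# Same registration calls as in the sketch above, with the duplicated
# "Register default implementations" comment split into specific ones.

# Register FP8 per-tensor implementation
gemm_registry.register(
    gemm_func=Fp8PerTensorLinear.apply,
    availability_check=_fp8_availability_check,
)

# Register NVFP4 implementation
gemm_registry.register(
    gemm_func=Nvfp4Linear.apply,
    availability_check=_nvfp4_availability_check,
)
```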
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
modelopt/torch/quantization/backends/__init__.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/torch/quantization/backends/__init__.py (2)
- modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py (4)
  - fp8_per_tensor_gemm (57-96)
  - Fp8PerTensorLinear (137-200)
  - _fp8_availability_check (99-134)
  - apply (197-200)
- modelopt/torch/quantization/backends/nvfp4_gemm.py (4)
  - nvfp4_gemm (31-128)
  - Nvfp4Linear (131-193)
  - _nvfp4_availability_check (196-250)
  - apply (190-193)
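The bodies of the availability checks referenced above are not shown in this review. As a rough, purely illustrative guess at what such a check might gate on, FP8 GEMMs typically need an Ada/Hopper-class GPU and a PyTorch build with scaled-matmul support:

```python
import torch


def _fp8_availability_check_sketch(*_args) -> bool:
    """Hypothetical stand-in for _fp8_availability_check (lines 99-134, not shown)."""
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    # FP8 matmuls generally require SM 8.9+ hardware and torch._scaled_mm.
    return (major, minor) >= (8, 9) and hasattr(torch, "_scaled_mm")
```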
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (1)
modelopt/torch/quantization/backends/__init__.py (1)
Lines 22-26: LGTM! The explicit registration of `Fp8PerTensorLinear.apply` with its availability check correctly addresses the PR objective to explicitly register the GEMMs.
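For context on why each GEMM is paired with an availability check: it lets the registry probe hardware and software support at dispatch time and fall back when a kernel cannot run. A minimal registry sketch (hypothetical; the project's actual gemm registry may differ) looks like:

```python
from collections.abc import Callable
from typing import Any


class GemmRegistry:
    """Minimal sketch: pair each GEMM kernel with an availability predicate."""

    def __init__(self) -> None:
        self._entries: list[tuple[Callable[..., Any], Callable[..., bool]]] = []

    def register(
        self,
        gemm_func: Callable[..., Any],
        availability_check: Callable[..., bool],
    ) -> None:
        self._entries.append((gemm_func, availability_check))

    def find_match(self, *args: Any) -> Callable[..., Any] | None:
        # Return the first registered kernel whose check passes; None means
        # the caller should fall back to a non-quantized path.
        for gemm_func, availability_check in self._entries:
            if availability_check(*args):
                return gemm_func
        return None
```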
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
- Coverage   73.79%   73.55%   -0.25%
==========================================
  Files         171      172       +1
  Lines       17591    17686      +95
==========================================
+ Hits        12982    13009      +27
- Misses       4609     4677      +68
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Bug fix
Overview: We need to explicitly register the GEMMs.
Summary by CodeRabbit
- New Features
- Bug Fixes
- Refactor