
Conversation


cjluo-nv (Collaborator) commented Oct 6, 2025

What does this PR do?

Type of change: Bug fix

Overview:

We need to explicitly register the GEMMs rather than relying on wildcard-import side effects.
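For context, a minimal sketch of what the explicit imports and registrations could look like in modelopt/torch/quantization/backends/__init__.py. The gemm_registry.register() keyword arguments and the intra-package module paths are assumptions for illustration; only the symbol names (Fp8PerTensorLinear, Nvfp4Linear, the availability checks, gemm_registry) come from the review below.

```python
# Sketch of the backends __init__.py after this change (illustrative, not verbatim).
from .fp8_per_tensor_gemm import Fp8PerTensorLinear, _fp8_availability_check, fp8_per_tensor_gemm
from .gemm_registry import gemm_registry  # module path assumed
from .nvfp4_gemm import Nvfp4Linear, _nvfp4_availability_check, nvfp4_gemm

# Register default implementations (keyword names for register() are assumed)
gemm_registry.register(
    gemm_func=Fp8PerTensorLinear.apply,
    availability_check=_fp8_availability_check,
)
gemm_registry.register(
    gemm_func=Nvfp4Linear.apply,
    availability_check=_nvfp4_availability_check,
)
```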

Summary by CodeRabbit

  • New Features

    • Improved detection and automatic enablement of FP8 and FP4 quantized linear backends when supported by the environment.
  • Bug Fixes

    • More predictable backend selection and import behavior, reducing unexpected side effects from wildcard imports.
  • Refactor

    • Switched to explicit imports and registrations for quantization backends to clarify the public API and improve maintainability.

cjluo-nv requested a review from a team as a code owner (October 6, 2025 18:23)
cjluo-nv requested a review from RalphMao (October 6, 2025 18:23)

coderabbitai bot commented Oct 6, 2025

Walkthrough

Replaced wildcard imports in the backends initializer with explicit imports, exposed specific classes and availability checks, and explicitly registered the GEMM implementations (FP8 per-tensor and NVFP4) in the gemm registry.

Changes

Cohort / File(s): Backends init and registration (modelopt/torch/quantization/backends/__init__.py)
Summary of changes: Switched to explicit imports for Fp8PerTensorLinear, Nvfp4Linear, their availability checks, and gemm_registry; explicitly registered Fp8PerTensorLinear.apply and Nvfp4Linear.apply with their corresponding availability checks; exposed these symbols in the module namespace.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I twitch my ears at tidy imports neat,
Two gems now registered, connections complete.
FP8 hops in, NVFP4 too—
The registry burrow knows what to do.
With explicit trails, I thump in delight,
Carrots compiled, all linking right. 🥕

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title concisely and accurately describes the primary change, explicitly registering quantized GEMM implementations in the backend, and gives a reviewer scanning history enough context to understand the changeset.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.


coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
modelopt/torch/quantization/backends/__init__.py (2)

18-20: Consider whether private functions should be exposed in the public API.

The imports include _fp8_availability_check and _nvfp4_availability_check, which are underscore-prefixed and therefore intended to be private, yet they end up in the module's public API. If these functions are only used internally for registration, consider importing them within a private scope or documenting why they need to be publicly accessible.
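One way to act on this nitpick, sketched under the same assumptions as the earlier snippet (register() keywords and module paths are illustrative, not the code in this PR): perform the registration inside a module-private helper so the underscore-prefixed checks never enter the package namespace, and declare the intended public surface with __all__.

```python
from .fp8_per_tensor_gemm import Fp8PerTensorLinear
from .gemm_registry import gemm_registry  # module path assumed
from .nvfp4_gemm import Nvfp4Linear

__all__ = ["Fp8PerTensorLinear", "Nvfp4Linear", "gemm_registry"]


def _register_default_gemms() -> None:
    """Register the default GEMM implementations without leaking private names."""
    # Local imports keep the underscore-prefixed checks out of the package namespace.
    from .fp8_per_tensor_gemm import _fp8_availability_check
    from .nvfp4_gemm import _nvfp4_availability_check

    gemm_registry.register(
        gemm_func=Fp8PerTensorLinear.apply,
        availability_check=_fp8_availability_check,
    )
    gemm_registry.register(
        gemm_func=Nvfp4Linear.apply,
        availability_check=_nvfp4_availability_check,
    )


_register_default_gemms()
```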


28-32: Consider clarifying the duplicate comment.

The registration logic is correct and addresses the PR objective. However, line 28 has the same comment as line 22 ("Register default implementations"). Consider making this more specific (e.g., "Register NVFP4 implementation") or removing it if it's redundant.
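Concretely, the suggestion amounts to something like the following, reusing the names from the earlier sketch (comment wording hypothetical):

```python
# Register the FP8 per-tensor GEMM implementation
gemm_registry.register(
    gemm_func=Fp8PerTensorLinear.apply,
    availability_check=_fp8_availability_check,
)

# Register the NVFP4 GEMM implementation
gemm_registry.register(
    gemm_func=Nvfp4Linear.apply,
    availability_check=_nvfp4_availability_check,
)
```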

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR (340eb7a) to 7ae078b.

📒 Files selected for processing (1)
  • modelopt/torch/quantization/backends/__init__.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/torch/quantization/backends/__init__.py (2)
modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py (4)
  • fp8_per_tensor_gemm (57-96)
  • Fp8PerTensorLinear (137-200)
  • _fp8_availability_check (99-134)
  • apply (197-200)
modelopt/torch/quantization/backends/nvfp4_gemm.py (4)
  • nvfp4_gemm (31-128)
  • Nvfp4Linear (131-193)
  • _nvfp4_availability_check (196-250)
  • apply (190-193)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (1)
modelopt/torch/quantization/backends/__init__.py (1)

22-26: LGTM!

The explicit registration of Fp8PerTensorLinear.apply with its availability check correctly addresses the PR objective to explicitly register the GEMMs.


codecov bot commented Oct 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.55%. Comparing base (340eb7a) to head (7ae078b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
- Coverage   73.79%   73.55%   -0.25%     
==========================================
  Files         171      172       +1     
  Lines       17591    17686      +95     
==========================================
+ Hits        12982    13009      +27     
- Misses       4609     4677      +68     

☔ View full report in Codecov by Sentry.
