Explicitly register real quant gemms #402
base: main
Conversation
Signed-off-by: Chenjie Luo <[email protected]>
Walkthrough
Replaced wildcard imports in the backends initializer with explicit imports, exposed specific classes and availability checks, and explicitly registered the GEMM implementations (FP8 per-tensor and NVFP4) into the gemm registry.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
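Based on this walkthrough, the registration block in `backends/__init__.py` plausibly resembles the sketch below. The `gemm_registry` import location and the keyword arguments to `register(...)` are assumptions, not confirmed by the diff; the class and function names come from the review's code graph, and the duplicated "Register default implementations" comment mirrors the nitpick raised further down.

```python
# Hypothetical sketch of modelopt/torch/quantization/backends/__init__.py.
# The gemm_registry module path and register() signature are assumptions.
from .fp8_per_tensor_gemm import Fp8PerTensorLinear, _fp8_availability_check
from .gemm_registry import gemm_registry  # assumed location of the registry
from .nvfp4_gemm import Nvfp4Linear, _nvfp4_availability_check

# Register default implementations
gemm_registry.register(
    gemm_func=Fp8PerTensorLinear.apply,
    availability_check=_fp8_availability_check,
)

# Register default implementations
gemm_registry.register(
    gemm_func=Nvfp4Linear.apply,
    availability_check=_nvfp4_availability_check,
)
```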
Pre-merge checks and finishing touches: ✅ 3 checks passed
Actionable comments posted: 0
🧹 Nitpick comments (2)
modelopt/torch/quantization/backends/__init__.py (2)
Lines 18-20: Consider whether private functions should be exposed in the public API. The imports include `_fp8_availability_check` and `_nvfp4_availability_check`, which are prefixed with underscores indicating they are intended to be private. This exposes them in the module's public API. If these functions are only used internally for registration, consider importing them within a private scope or document why they need to be publicly accessible.
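One way to address this nitpick (a sketch under the same assumed module paths and `register(...)` signature as above, not a confirmed project convention) is to confine the underscore-prefixed imports to a private registration helper so they never enter the package namespace:

```python
# Hypothetical alternative that keeps the underscore-prefixed checks out of
# the package namespace. Module paths and the register() signature are the
# same assumptions as in the sketch above.
from .fp8_per_tensor_gemm import Fp8PerTensorLinear
from .nvfp4_gemm import Nvfp4Linear


def _register_default_gemms() -> None:
    """Register built-in GEMMs; the private names stay local to this function."""
    from .fp8_per_tensor_gemm import _fp8_availability_check
    from .gemm_registry import gemm_registry
    from .nvfp4_gemm import _nvfp4_availability_check

    gemm_registry.register(
        gemm_func=Fp8PerTensorLinear.apply,
        availability_check=_fp8_availability_check,
    )
    gemm_registry.register(
        gemm_func=Nvfp4Linear.apply,
        availability_check=_nvfp4_availability_check,
    )


_register_default_gemms()
```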
Lines 28-32: Consider clarifying the duplicate comment. The registration logic is correct and addresses the PR objective. However, line 28 has the same comment as line 22 ("Register default implementations"). Consider making this more specific (e.g., "Register NVFP4 implementation") or removing it if it's redundant.
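Concretely, the suggestion amounts to giving each registration its own comment; continuing the earlier sketch with the same assumed names:

```python
# Same registration calls as in the sketch above, with the duplicated
# "Register default implementations" comment split into specific ones.

# Register FP8 per-tensor implementation
gemm_registry.register(
    gemm_func=Fp8PerTensorLinear.apply,
    availability_check=_fp8_availability_check,
)

# Register NVFP4 implementation
gemm_registry.register(
    gemm_func=Nvfp4Linear.apply,
    availability_check=_nvfp4_availability_check,
)
```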
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
modelopt/torch/quantization/backends/__init__.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/torch/quantization/backends/__init__.py (2)
- modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py (4)
  - fp8_per_tensor_gemm (57-96)
  - Fp8PerTensorLinear (137-200)
  - _fp8_availability_check (99-134)
  - apply (197-200)
- modelopt/torch/quantization/backends/nvfp4_gemm.py (4)
  - nvfp4_gemm (31-128)
  - Nvfp4Linear (131-193)
  - _nvfp4_availability_check (196-250)
  - apply (190-193)
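The bodies of the availability checks referenced above are not shown in this review. As a rough, purely illustrative guess at what such a check might gate on, FP8 GEMMs typically need an Ada/Hopper-class GPU and a PyTorch build with scaled-matmul support:

```python
import torch


def _fp8_availability_check_sketch(*_args) -> bool:
    """Hypothetical stand-in for _fp8_availability_check (lines 99-134, not shown)."""
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    # FP8 matmuls generally require SM 8.9+ hardware and torch._scaled_mm.
    return (major, minor) >= (8, 9) and hasattr(torch, "_scaled_mm")
```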
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (1)
modelopt/torch/quantization/backends/__init__.py (1)
Lines 22-26: LGTM! The explicit registration of `Fp8PerTensorLinear.apply` with its availability check correctly addresses the PR objective to explicitly register the GEMMs.
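For context on why each GEMM is paired with an availability check: it lets the registry probe hardware and software support at dispatch time and fall back when a kernel cannot run. A minimal registry sketch (hypothetical; the project's actual gemm registry may differ) looks like:

```python
from collections.abc import Callable
from typing import Any


class GemmRegistry:
    """Minimal sketch: pair each GEMM kernel with an availability predicate."""

    def __init__(self) -> None:
        self._entries: list[tuple[Callable[..., Any], Callable[..., bool]]] = []

    def register(
        self,
        gemm_func: Callable[..., Any],
        availability_check: Callable[..., bool],
    ) -> None:
        self._entries.append((gemm_func, availability_check))

    def find_match(self, *args: Any) -> Callable[..., Any] | None:
        # Return the first registered kernel whose check passes; None means
        # the caller should fall back to a non-quantized path.
        for gemm_func, availability_check in self._entries:
            if availability_check(*args):
                return gemm_func
        return None
```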
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
- Coverage   73.79%   73.55%   -0.25%
==========================================
  Files         171      172       +1
  Lines       17591    17686      +95
==========================================
+ Hits        12982    13009      +27
- Misses       4609     4677      +68
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Bug fix
Overview: We need to explicitly register the GEMMs.
Summary by CodeRabbit
- New Features
- Bug Fixes
- Refactor