Add SD3.5-medium quantization support in ModelOpt Diffusers example #444
base: main
Conversation
Signed-off-by: vipandya <[email protected]>
Walkthrough
Added support for the "sd3.5-medium" model across the ONNX export and quantization flows: updated dynamic axes, input/output shape selection, pipeline creation, and registry entries; also added timing measurement for the overall quantization duration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI / main
    participant Manager as PipelineManager
    participant Quant as Quantization
    participant Export as ONNX Export
    CLI->>Manager: create_pipeline(model_id)
    Note over Manager: checks model type\n(including sd3-medium, sd3.5-medium)
    alt SD3-family (sd3-medium / sd3.5-medium)
        Manager-->>CLI: pipeline (SD3 pipeline)
    else Other models
        Manager-->>CLI: pipeline (other)
    end
    CLI->>Quant: run_quantization(pipeline, config)
    Note right of CLI: start_time recorded
    Quant->>Quant: calibration & quantize (uses_transformer check includes sd3.5-medium)
    Quant->>Export: modelopt_export_sd(..., quant_config.quantize_mha)
    Export->>Export: set dynamic axes / io shapes for sd3.5-medium
    Export-->>Quant: export result
    Quant-->>CLI: finished
    Note right of CLI: log elapsed time
```
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/diffusers/quantization/quantize.py (1)
673-679: Possible AttributeError when checking Conv quantizers. Conv modules may lack input_quantizer/weight_quantizer; direct attribute access can crash. Guard with getattr.
Apply this diff:

```diff
 def _has_conv_layers(self, model: torch.nn.Module) -> bool:
@@
-    for module in model.modules():
-        if isinstance(module, (torch.nn.Conv1d, torch.nn.Conv2d, torch.nn.Conv3d)) and (
-            module.input_quantizer.is_enabled or module.weight_quantizer.is_enabled
-        ):
-            return True
+    for module in model.modules():
+        if isinstance(module, (torch.nn.Conv1d, torch.nn.Conv2d, torch.nn.Conv3d)):
+            iq = getattr(module, "input_quantizer", None)
+            wq = getattr(module, "weight_quantizer", None)
+            if getattr(iq, "is_enabled", False) or getattr(wq, "is_enabled", False):
+                return True
     return False
```
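The guarded version can be exercised without torch or ModelOpt installed; the sketch below uses plain stand-in classes (Conv2d and Quantizer here are hypothetical stand-ins, not the real torch/ModelOpt types) to show how the getattr chain avoids the AttributeError:

```python
class Conv2d:
    """Stand-in for torch.nn.Conv2d; real Conv modules may or may not
    carry ModelOpt quantizer attributes."""

class Quantizer:
    """Stand-in for a quantizer object exposing an is_enabled flag."""
    def __init__(self, enabled: bool):
        self.is_enabled = enabled

def has_enabled_conv_quantizers(modules) -> bool:
    for module in modules:
        if isinstance(module, Conv2d):
            # getattr with a None default never raises, even on plain Convs.
            iq = getattr(module, "input_quantizer", None)
            wq = getattr(module, "weight_quantizer", None)
            if getattr(iq, "is_enabled", False) or getattr(wq, "is_enabled", False):
                return True
    return False

plain = Conv2d()                      # no quantizer attributes at all
quantized = Conv2d()
quantized.input_quantizer = Quantizer(enabled=True)

print(has_enabled_conv_quantizers([plain]))             # → False
print(has_enabled_conv_quantizers([plain, quantized]))  # → True
```

The unguarded original would raise AttributeError on `plain`, since it has no input_quantizer at all.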
🧹 Nitpick comments (6)
examples/diffusers/quantization/quantize.py (5)
19-19: Minor nit: simplify import. Use plain `import time` (drop `as time`) for clarity.

```diff
-import time as time
+import time
```
872-873: Improve timing precision and log formatting (optional). Prefer perf_counter and a concise message.

```diff
-    s = time.time()
+    start = time.perf_counter()
@@
-    logger.info(f"Quantization process completed successfully! Time taken = {time.time() - s} seconds")
+    elapsed = time.perf_counter() - start
+    logger.info(f"Quantization completed successfully in {elapsed:.2f}s")
```

Also applies to: 951-952
836-837: CLI help text matches behavior? Help says "Quantizing MHA into FP8," but the code passes quantize_mha to checks for FP4 as well. Consider clarifying.

```diff
-    quant_group.add_argument(
-        "--quantize-mha", action="store_true", help="Quantizing MHA into FP8 if its True"
-    )
+    quant_group.add_argument(
+        "--quantize-mha",
+        action="store_true",
+        help="Quantize MHA when supported (FP8; FP4 path uses FP8 MHA gating)."
+    )
```
758-778: Docs: add an sd3.5 example in the epilog (optional). Add a quick example alongside sd3-medium.

```diff
     # FP8 quantization with ONNX export
-    %(prog)s --model sd3-medium --format fp8 --onnx-dir ./onnx_models/
+    %(prog)s --model sd3-medium --format fp8 --onnx-dir ./onnx_models/
+    # FP8 quantization with ONNX export (SD3.5 Medium)
+    %(prog)s --model sd3.5-medium --format fp8 --onnx-dir ./onnx_models/
```
127-135: Optional: keep registry and CLI options in sync automatically. Consider deriving the CLI choices from the MODEL_REGISTRY keys to prevent drift.
Also applies to: 784-786
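One way to do that — sketched here with a hypothetical two-entry stand-in for the example's full MODEL_REGISTRY — is to pass the registry keys directly as argparse choices:

```python
import argparse

# Hypothetical subset of MODEL_REGISTRY; the real mapping lives in quantize.py.
MODEL_REGISTRY = {
    "sd3-medium": "stabilityai/stable-diffusion-3-medium-diffusers",
    "sd3.5-medium": "stabilityai/stable-diffusion-3.5-medium",
}

parser = argparse.ArgumentParser()
# choices derived from the registry keys: adding a model to the dict
# automatically makes it a valid --model value, with no second list to update.
parser.add_argument("--model", choices=sorted(MODEL_REGISTRY), required=True)

args = parser.parse_args(["--model", "sd3.5-medium"])
print(MODEL_REGISTRY[args.model])  # → stabilityai/stable-diffusion-3.5-medium
```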
examples/diffusers/quantization/onnx_utils/export.py (1)
320-338: Reduce duplication with an SD3 family constant (optional). Define a set like SD3_FAMILY = {"sd3-medium", "sd3.5-medium"} and reuse it in conditionals.

```diff
+SD3_FAMILY = {"sd3-medium", "sd3.5-medium"}
@@
-    elif model_id in ["sd3-medium", "sd3.5-medium"]:
+    elif model_id in SD3_FAMILY:
@@
     elif model_name == "sd3-medium":
         input_names = ["hidden_states", "encoder_hidden_states", "pooled_projections", "timestep"]
         output_names = ["sample"]
     elif model_name == "sd3.5-medium":
         input_names = ["hidden_states", "encoder_hidden_states", "pooled_projections", "timestep"]
         output_names = ["out_hidden_states"]
```

Also applies to: 416-447
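A minimal sketch of the suggested refactor (get_output_names is a hypothetical helper for illustration; the real code branches inline in the export function):

```python
SD3_FAMILY = {"sd3-medium", "sd3.5-medium"}

def get_output_names(model_name: str) -> list[str]:
    """Return ONNX output names for SD3-family models."""
    # One set-membership check replaces repeated two-element list comparisons.
    if model_name not in SD3_FAMILY:
        raise ValueError(f"not an SD3-family model: {model_name!r}")
    # Per the review above, sd3.5-medium renames the single output tensor.
    return ["out_hidden_states"] if model_name == "sd3.5-medium" else ["sample"]

print(get_output_names("sd3-medium"))    # → ['sample']
print(get_output_names("sd3.5-medium"))  # → ['out_hidden_states']
```

Centralizing the set means a future SD3-family variant needs only one new entry instead of edits to every conditional.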
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- examples/diffusers/quantization/onnx_utils/export.py (6 hunks)
- examples/diffusers/quantization/quantize.py (9 hunks)
🔇 Additional comments (9)
examples/diffusers/quantization/quantize.py (4)
131-132: Registry ID for SD3.5 Medium looks correct (verified). Mapping to "stabilityai/stable-diffusion-3.5-medium" is valid and compatible with StableDiffusion3Pipeline. (huggingface.co)
334-336: Pipeline routing for SD3.5 via StableDiffusion3Pipeline is correct. Creation paths mirror SD3 Medium and align with HF usage for sd3.5-medium. (huggingface.co)
Also applies to: 365-369
119-120: Filter function and uses_transformer updates are consistent. No issues; SD3.5 is included alongside SD3 Medium.
Also applies to: 237-238
904-910: Sanity script to verify SD3.5 wiring across the repo. Run to confirm all branches handle "sd3.5-medium".
examples/diffusers/quantization/onnx_utils/export.py (5)
76-82: sd3.5 dynamic axes added correctly. The output name uses "out_hidden_states", which matches the export branch below.
Please confirm your exported ONNX graph’s output is named "out_hidden_states" before renaming to "out.0" downstream (if applicable).
300-302: update_dynamic_axes handles the sd3.5 rename to out.0. Consistent with using "out_hidden_states" during export; good.
325-328: Grouping sd3 and sd3.5 in dummy input generation is correct. Both use SD3Transformer2DModel and identical input tensors.
355-357: get_io_shapes updated for sd3.5. I/O minShapes reference hidden_states as expected; OK.
Also applies to: 367-368
422-425: Export IO names for sd3.5 align with the dynamic axes. Inputs mirror sd3; the output is renamed to "out_hidden_states"; matches the HF MMDiT-X changes.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #444   +/-   ##
=======================================
  Coverage   73.37%   73.37%
=======================================
  Files         180      180
  Lines       17937    17937
=======================================
+ Hits        13161    13162       +1
+ Misses       4776     4775       -1
```

☔ View full report in Codecov by Sentry.
Signed-off-by: vipandya <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/diffusers/quantization/quantize.py (1)
19-19: Remove redundant import alias. The import statement `import time as time` is redundant; simply use `import time`.
Apply this diff:

```diff
-import time as time
+import time
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- examples/diffusers/quantization/onnx_utils/export.py (5 hunks)
- examples/diffusers/quantization/quantize.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/diffusers/quantization/quantize.py (1)
- examples/diffusers/quantization/utils.py (1): filter_func_default (37-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (11)
examples/diffusers/quantization/onnx_utils/export.py (5)
76-82: LGTM! Consistent dynamic axes configuration for sd3.5-medium. The dynamic axes configuration correctly differentiates sd3.5-medium from sd3-medium by using "out_hidden_states" as the output key, which aligns with the subsequent usage in update_dynamic_axes (line 301) and modelopt_export_sd (line 422).
300-301: LGTM! Correct output mapping for sd3.5-medium. The mapping of "out.0" to "out_hidden_states" is consistent with the dynamic axes definition at lines 76-82 and the export configuration at lines 420-422.
325-328: LGTM! Appropriate reuse of SD3 dummy input generation. Treating sd3.5-medium the same as sd3-medium for dummy input generation is appropriate since they share similar model architectures and input requirements.
355-356: LGTM! Consistent I/O shape handling. The output name selection ("out_hidden_states") and shape configuration correctly align with the dynamic axes definition and treat sd3.5-medium appropriately alongside sd3-medium.
Also applies to: 364-365
420-422: LGTM! Complete and consistent export configuration. The input/output names for sd3.5-medium are correctly specified and align with the dynamic axes definition (lines 76-82) and the update_dynamic_axes logic (line 301), ensuring end-to-end consistency.
examples/diffusers/quantization/quantize.py (6)
63-63: LGTM! SD35_MEDIUM model type added correctly. The new model type follows the existing naming convention and integrates properly with the model type enum.
119-119: LGTM! Appropriate filter function mapping. Using filter_func_default for SD35_MEDIUM is consistent with SD3_MEDIUM and appropriate for the model architecture.
237-237: LGTM! Correct transformer classification. SD35_MEDIUM correctly uses a transformer backbone like SD3_MEDIUM, so including it in the uses_transformer property is appropriate.
334-335: LGTM! Consistent pipeline creation for SD35_MEDIUM. Both create_pipeline_from (static method) and create_pipeline (instance method) correctly treat SD35_MEDIUM the same as SD3_MEDIUM by using StableDiffusion3Pipeline, ensuring consistency across the codebase.
Also applies to: 365-368
872-872: LGTM! Timing instrumentation and parameter fix. The timing measurement is correctly implemented, and line 949 fixes a subtle bug by using the instance attribute quant_config.quantize_mha instead of the class attribute QuantizationConfig.quantize_mha, ensuring the actual configuration value is passed to the export function.
Also applies to: 949-953
131-131: No issues found. The model ID is correct and publicly accessible. The model "stabilityai/stable-diffusion-3.5-medium" is a Stability AI MMDiT-X text-to-image generative model available on Hugging Face under the Stability Community License, which permits free use for research, non-commercial, and commercial use by organizations with less than $1M in annual revenue.
What does this PR do?
Type of change: Diffusers' Example Update
Overview:
Add the SD3.5-medium quantization config to the quantization and export files of the diffusers example
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit