Conversation

@gcunhase
Contributor

@gcunhase gcunhase commented Oct 17, 2025

What does this PR do?

Type of change: Bug fix

Overview: QDQ node placement was breaking Conv-Act-Pool fusions. Fixed by adding this pattern to the list of fusible partitions.

Usage

$ python -m modelopt.onnx.quantization --onnx_path=$MODEL_NAME.onnx
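
For reference, the same flow can be driven from Python. A minimal sketch, assuming the quantize() entry point in modelopt/onnx/quantization/quantize.py accepts these keyword arguments (names other than onnx_path are assumptions, not confirmed by this PR):

# Hypothetical Python-API equivalent of the CLI call above;
# parameter names beyond onnx_path are assumptions.
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="model.onnx",          # model containing Conv-Act-Pool patterns
    quantize_mode="int8",            # assumed to match the CLI default
    output_path="model.quant.onnx",  # assumed output location
)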

Testing

Added unit test.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Summary by CodeRabbit

  • New Features

    • Fusion now recognizes Conv → Activation → MaxPool sequences for improved quantization handling.
    • Broader operator coverage to better handle copy/reshape-like ops during optimization.
  • Refactor

    • Centralized operator classification for more consistent fusion and traversal behavior.
  • Tests

    • Added parameterized tests validating Conv-Activation-Pool quantization (with and without reshape).

@gcunhase gcunhase requested a review from a team as a code owner October 17, 2025 15:06
@gcunhase gcunhase requested a review from ajrasane October 17, 2025 15:06
@coderabbitai
Contributor

coderabbitai bot commented Oct 17, 2025

Walkthrough

The PR adds a get_copy_ops() helper, extends fusion/backbone matching to treat copy ops and MaxPool as fusible in quantization graph utilities and partitioning, updates usages to pass wild_card_types=get_copy_ops(), and adds a Conv→BN→ReLU→(optional Reshape)→MaxPool→Conv test model and unit test (duplicate model builder present).

Changes

Cohort / File(s) Summary
Copy Operators Helper
modelopt/onnx/op_types.py
Add get_copy_ops() returning extended copy-op list; update is_copy_op() to delegate to get_copy_ops() (no public signature change).
Graph utils: fusion/backbone updates
modelopt/onnx/quantization/graph_utils.py
Import get_copy_ops; use wild_card_types=get_copy_ops() in get_fusible_backbone() and in calls to has_path_type(); remove explicit empty list usage; remove pre-initialized nodes_to_exclude in find_nodes_from_matmul_to_exclude.
Partitioning: fusibility expansion
modelopt/onnx/quantization/partitioning.py
Expand fusible criteria to include copy ops and MaxPool in backbone checks and consumer traversal; update _build_fusible_partition() and _is_fusible_mul() logic to accept copy ops and MaxPool as part of fusible backbones and consumers.
Test model builder (duplicate)
tests/_test_utils/onnx_quantization/lib_test_models.py
Add build_conv_act_pool_model(include_reshape_node=False) constructing Conv→BatchNorm→ReLU→optional Reshape→MaxPool→Conv; function added twice (duplicate definitions present).
Unit tests: int8 QDQ rules
tests/unit/onnx/test_qdq_rules_int8.py
Add parameterized test test_conv_act_pool_int8(tmp_path, include_reshape_node) verifying Conv quantization and MaxPool not quantized, using the new builder and save/quantize flow.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Runner
    participant Builder as build_conv_act_pool_model()
    participant Save as save_onnx()
    participant Quant as Quantizer
    participant Partition as partitioning._build_fusible_partition()
    participant Graph as graph_utils.get_fusible_backbone()
    participant Ops as op_types.get_copy_ops()

    Test->>Builder: build model (include_reshape_node?)
    Builder-->>Test: ONNX model
    Test->>Save: persist model
    Test->>Quant: quantize to int8
    Quant->>Partition: analyze fusible partitions
    Partition->>Graph: request fusible backbone
    Graph->>Ops: get_copy_ops() (includes copy ops, MaxPool)
    Ops-->>Graph: copy op list
    Graph-->>Partition: matched backbone paths (Conv-Act(-Pool))
    Partition-->>Quant: fusion decisions
    Quant-->>Test: quantized model

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hopping through nodes both new and old,

I tuck MaxPool where fusions unfold.
Copy ops gathered, patterns re-sown,
Two builders echo — one needs a clone. 🥕

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 58.33%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The PR title "Add support for Conv-Act-Pool fusion" directly and accurately summarizes the main objective of the changeset. All modifications across the multiple files in this pull request are consistently focused on enabling Conv-Act-Pool pattern fusion in the ONNX quantization pipeline: graph_utils.py and partitioning.py are updated to support the pattern, op_types.py introduces supporting infrastructure through get_copy_ops(), and the test files validate the new functionality. The title is clear, specific, and concise: a teammate reviewing the repository history would immediately understand the purpose of this change.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3db4dc8 and 7eae896.

📒 Files selected for processing (5)
  • modelopt/onnx/op_types.py (2 hunks)
  • modelopt/onnx/quantization/graph_utils.py (3 hunks)
  • modelopt/onnx/quantization/partitioning.py (2 hunks)
  • tests/_test_utils/onnx_quantization/lib_test_models.py (1 hunks)
  • tests/unit/onnx/test_qdq_rules_int8.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/onnx/quantization/graph_utils.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/unit/onnx/test_qdq_rules_int8.py (2)
tests/_test_utils/onnx_quantization/lib_test_models.py (1)
  • build_conv_act_pool_model (560-706)
tests/unit/onnx/test_quantize_int8.py (1)
  • _assert_nodes_are_quantized (32-39)
modelopt/onnx/quantization/partitioning.py (1)
modelopt/onnx/op_types.py (2)
  • is_copy_op (121-123)
  • is_pointwise_or_elementwise_op (131-136)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (7)
modelopt/onnx/op_types.py (2)

99-118: LGTM!

The new get_copy_ops() helper centralizes the list of copy operators, making it reusable across the quantization modules. The function is well-documented and aligns with the PR's objective to support Conv-Act-Pool fusion.


121-123: LGTM!

The refactoring to delegate membership checks to get_copy_ops() eliminates duplication and maintains consistency with the centralized list.
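
For context, the centralization pattern being approved looks roughly like this (the op list is taken from the diff quoted later in this thread; treat it as a sketch of the two functions, not their exact source):

# Sketch of the centralized helper and its delegating membership check.
def get_copy_ops():
    """Returns list of copy operators."""
    return [
        "Flatten", "Transpose", "Concat", "Split", "Squeeze",
        "Expand", "ReverseSequence", "Reshape", "Tile", "Gather",
        "Slice", "GatherElements", "GatherND", "ScatterElements",
        "ScatterND", "OneHot",
    ]

def is_copy_op(op_type: str):
    """Membership now delegates to the single source of truth."""
    return op_type in get_copy_ops()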

tests/unit/onnx/test_qdq_rules_int8.py (1)

21-31: LGTM!

The new imports (pytest, build_conv_act_pool_model, save_onnx) are appropriate for the added test and properly utilized.

modelopt/onnx/quantization/partitioning.py (3)

46-50: LGTM!

The updated docstring accurately documents the expanded fusibility criteria, including copy ops (condition 1) and Conv-Act-Pool fusion via MaxPool (condition 3).


133-141: LGTM!

The logic correctly treats copy ops as fusible consumers when they satisfy the _is_cask_fusible criterion, paralleling the existing pointwise/elementwise op handling.


142-145: LGTM!

Adding MaxPool to the fusible backbone check enables Conv-Act-Pool fusion, directly supporting the PR's objective to prevent QDQ placement from breaking these patterns.
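
The shape of the expanded check is roughly the following (a sketch under assumed surrounding names; only is_copy_op and is_pointwise_or_elementwise_op are confirmed imports from modelopt/onnx/op_types.py):

from modelopt.onnx.op_types import is_copy_op, is_pointwise_or_elementwise_op

def _is_fusible_consumer(op_type: str, has_fusible_backbone: bool) -> bool:
    # Copy ops and MaxPool are fusible only when a Conv/ConvTranspose
    # backbone was found for them (conditions 1 and 3 in the docstring).
    if is_copy_op(op_type) or op_type == "MaxPool":
        return has_fusible_backbone
    # Pointwise/elementwise ops were already accepted before this PR.
    return is_pointwise_or_elementwise_op(op_type)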

tests/_test_utils/onnx_quantization/lib_test_models.py (1)

560-706: No duplicate function definition found.

Verification confirms only a single definition of build_conv_act_pool_model exists at line 560. The original review comment's claim about duplicate definitions is unfounded—there is no duplication issue in the code.

Likely an incorrect or invalid review comment.



@codecov

codecov bot commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.42%. Comparing base (8745a3c) to head (3a1e3fb).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #448      +/-   ##
==========================================
+ Coverage   73.40%   73.42%   +0.01%     
==========================================
  Files         180      180              
  Lines       18077    18078       +1     
==========================================
+ Hits        13270    13273       +3     
+ Misses       4807     4805       -2     

☔ View full report in Codecov by Sentry.

@gcunhase gcunhase changed the title from "[5271050] Add support for Conv-Act-Pool fusion" to "[5271050, 5274346] Add support for Conv-Act-Pool fusion" on Oct 17, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
modelopt/onnx/op_types.py (2)

99-118: Clarify “copy” semantics; consider narrowing set and caching.

Including Gather*/Scatter*/OneHot here blurs “copy/layout” with indexing, write, and generation ops. This may over-traverse/fuse across nodes that change values. If intentional, document it; otherwise, consider restricting to pure layout/data-movement ops (e.g., Flatten/Transpose/Concat/Split/Squeeze/Reshape/Tile/Expand/Slice). Also avoid rebuilding the list on every call.

Suggested refactor (keeps API, improves perf):

+from functools import lru_cache
+
-def copy_ops():
-    """Returns list of copy operators."""
-    return [
+@lru_cache(None)
+def copy_ops():
+    """Op types treated as copy/layout for traversal and CASK fusibility.
+    Note: if indexing/scatter/generation ops are intentionally included, keep them and update this docstring accordingly."""
+    return [
         "Flatten",
         "Transpose",
         "Concat",
         "Split",
         "Squeeze",
         "Expand",
         "ReverseSequence",
         "Reshape",
         "Tile",
         "Gather",
         "Slice",
         "GatherElements",
         "GatherND",
         "ScatterElements",
         "ScatterND",
         "OneHot",
     ]

Would you like this set narrowed to pure layout ops, or is the broader set required for current fusibility rules?


121-124: Minor: use a set for membership.

Membership on a list is O(n). If we keep the function, consider a module-level frozenset for checks.

-def is_copy_op(op_type: str):
-    """Returns whether the given op is a copy operator or not."""
-    return op_type in copy_ops()
+_COPY_OPS = frozenset(copy_ops())
+
+def is_copy_op(op_type: str):
+    """Returns whether the given op is a copy operator or not."""
+    return op_type in _COPY_OPS
modelopt/onnx/quantization/partitioning.py (1)

47-51: Doc nit and scope clarification.

  • Grammar: “Add nodes to the partition…”.
  • Confirm intent: bullet 3 explicitly mentions MaxPool; if AveragePool is also supported (now or later), call it out or keep it MaxPool-only by design.
-    Add a nodes to the partition if any of these holds:
+    Add nodes to the partition if any of these holds:
     3. The node is MaxPool following a Conv-Act pattern (Conv-Act-Pool fusion)
+       # Note: AveragePool is intentionally excluded.

Is AveragePool intentionally excluded from Conv-Act-Pool fusion?

tests/unit/onnx/test_quantize_int8.py (1)

33-45: Simplify helper: no need to return True or wrap with assert.

The helper already asserts per input. Returning True and then asserting the return is redundant.

 def _assert_nodes_quantization(nodes, should_be_quantized=True):
     for node in nodes:
         for inp_idx, inp in enumerate(node.inputs):
             if isinstance(inp, gs.Variable):
                 if should_be_quantized:
                     assert node.i(inp_idx).op == "DequantizeLinear", (
                         f"Input '{inp.name}' of node '{node.name}' is not quantized but should be!"
                     )
                 else:
                     assert node.i(inp_idx).op != "DequantizeLinear", (
                         f"Input '{inp.name}' of node '{node.name}' is quantized but should not be!"
                     )
-    return True

And drop outer asserts at call sites (lines 68-69, 89-90, 119-120, 123):

-assert _assert_nodes_quantization(mm_nodes)
+_assert_nodes_quantization(mm_nodes)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be0470c and b442a28.

📒 Files selected for processing (5)
  • modelopt/onnx/op_types.py (2 hunks)
  • modelopt/onnx/quantization/graph_utils.py (3 hunks)
  • modelopt/onnx/quantization/partitioning.py (2 hunks)
  • tests/_test_utils/onnx_quantization/lib_test_models.py (1 hunks)
  • tests/unit/onnx/test_quantize_int8.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • modelopt/onnx/quantization/graph_utils.py
  • tests/_test_utils/onnx_quantization/lib_test_models.py
🧰 Additional context used
🧬 Code graph analysis (2)
modelopt/onnx/quantization/partitioning.py (1)
modelopt/onnx/op_types.py (2)
  • is_copy_op (121-123)
  • is_pointwise_or_elementwise_op (131-136)
tests/unit/onnx/test_quantize_int8.py (1)
tests/_test_utils/onnx_quantization/lib_test_models.py (1)
  • build_conv_act_pool_model (559-705)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: wait-checks / wait
  • GitHub Check: linux
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (2)
tests/unit/onnx/test_quantize_int8.py (1)

100-123: Good coverage for Conv-Act-Pool (+optional Reshape).

Parametrizing include_reshape_node validates copy-op traversal and MaxPool staying in the quantized domain. LGTM.

modelopt/onnx/quantization/partitioning.py (1)

135-146: Verification confirms fusion rule extension is sound.

The script output confirms all three verification points:

  1. get_fusible_backbone recognizes MaxPool chains via the fusible_linear_path_types including ["MaxPool", "Relu", "BatchNormalization", "BiasAdd", conv_type]
  2. copy_ops() is used as wildcards in path traversal with has_path_type(..., wild_card_types=copy_ops())
  3. The MaxPool fusion in partitioning.py (lines 143-144) is properly gated by get_fusible_backbone

No issues found. Code changes align with backbone discovery logic and wildcard handling.
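
Concretely, the verified backbone check amounts to a call like the sketch below (the has_path_type signature matches the one quoted later in this thread; is_forward=False for a producer-side walk is an assumption):

from modelopt.onnx.op_types import copy_ops  # helper name at this commit; later renamed get_copy_ops
from modelopt.onnx.quantization.graph_utils import has_path_type

def matches_conv_act_pool(node, graph) -> bool:
    # Walk backward from `node`, allowing copy/layout ops anywhere in between.
    path_type = ["MaxPool", "Relu", "BatchNormalization", "BiasAdd", "Conv"]
    return has_path_type(
        node,
        graph,
        path_type,
        is_forward=False,            # assumed: backbone search walks producers
        wild_card_types=copy_ops(),  # copy ops are transparent to the match
        path_nodes=[],               # pass explicitly to avoid the mutable default
    )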

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/onnx/quantization/graph_utils.py (2)

185-207: Two issues: missing guard in _get_backbone() and incomplete Conv‑Act‑Pool patterns.

  • Runtime risk: IndexError when a tensor is a graph input (len(inputs)==0).
  • Pattern gap: Conv→Act→Pool without BN (common) won’t match; current list requires BN before MaxPool.

Apply this diff:

@@
 def get_fusible_backbone(node: Node, graph: Graph) -> Node | None:
@@
     def _get_backbone(root: Node):
         if root.op in ["Conv", "ConvTranspose"]:
             return root
-
-        for tensor in root.inputs:
-            if not isinstance(tensor, Constant):
-                parent_node = tensor.inputs[0]
-                bb = _get_backbone(parent_node)
-                if bb:
-                    return bb
+        for tensor in root.inputs:
+            if isinstance(tensor, Constant):
+                continue
+            # Guard against graph inputs (no producers)
+            if len(tensor.inputs) == 0:
+                continue
+            parent_node = tensor.inputs[0]
+            bb = _get_backbone(parent_node)
+            if bb:
+                return bb
@@
     for conv_type in ["Conv", "ConvTranspose"]:
         fusible_linear_path_types += [
             ["BiasAdd", "ConstMul", conv_type],
             ["Relu", "BiasAdd", "ConstMul", conv_type],
             ["BatchNormalization", "BiasAdd", conv_type],
             ["Relu", "BatchNormalization", "BiasAdd", conv_type],
-            ["MaxPool", "Relu", "BatchNormalization", "BiasAdd", conv_type],
+            # Conv-Act-Pool (with BN)
+            ["MaxPool", "Relu", "BatchNormalization", "BiasAdd", conv_type],
+            # Conv-Act-Pool (no BN)
+            ["MaxPool", "Relu", "BiasAdd", "ConstMul", conv_type],
         ]

Additionally, consider supporting other TRT-fusible activations (e.g., LeakyRelu/Clip) if applicable.


1060-1111: Robustness of MatMul shape inference: include graph outputs and use stricter infer options.

  • value_info_map only scans value_info; MatMul outputs that are graph outputs won’t be found → RuntimeError.
  • Prefer stricter inference (check types, data propagation) to reduce “unknown” shapes.

Apply this diff:

@@
-    model.graph.ClearField("value_info")
-    model = infer_shapes(model)
-    value_info_map = {vi.name: vi for vi in model.graph.value_info}
+    model.graph.ClearField("value_info")
+    # Enable stricter inference to populate more shapes
+    model = infer_shapes(model, check_type=True, data_prop=True)
+    value_info_map = {vi.name: vi for vi in model.graph.value_info}
+    # Also consider graph outputs (MatMul outputs may be final graph outputs)
+    value_info_map.update({vi.name: vi for vi in model.graph.output})
@@
-        value_info = value_info_map.get(output_name)
-        if not value_info:
-            raise RuntimeError(f"Shape inference did not find shape for {output_name}.")
+        value_info = value_info_map.get(output_name)
+        if not value_info:
+            # Skip if shape couldn't be inferred; fallback will be inference-based path upstream
+            logger.debug(f"Shape inference missing for {output_name}; skipping MatMul '{matmul_node.name}'.")
+            continue
♻️ Duplicate comments (1)
modelopt/onnx/quantization/graph_utils.py (1)

1090-1090: Minor: pass explicit kwargs to infer_shapes for consistency.

Already covered in the diff above by adding check_type=True, data_prop=True.

🧹 Nitpick comments (1)
modelopt/onnx/quantization/graph_utils.py (1)

206-207: Mutable default arg in has_path_type() can leak state across calls.

path_nodes defaulting to [] is mutated (append) and reused. Make defaults None and init inside.

Apply this diff:

@@
 def has_path_type(
     node: Node,
     graph: Graph,
     path_type: list[str],
     is_forward: bool,
-    wild_card_types: list[str] = [],
-    path_nodes: list[Node] = [],
+    wild_card_types: list[str] | None = None,
+    path_nodes: list[Node] | None = None,
 ) -> bool:
@@
     optional_path_types = ["BiasAdd", "ConstMul"]
+    if wild_card_types is None:
+        wild_card_types = []
+    if path_nodes is None:
+        path_nodes = []
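
For readers unfamiliar with the pitfall, here is a standalone demonstration of the state leakage (plain Python, independent of this codebase):

# The default list is created once, at function definition time,
# so every call that omits `acc` shares the same object.
def collect(item, acc=[]):
    acc.append(item)
    return acc

print(collect("a"))  # ['a']
print(collect("b"))  # ['a', 'b']  <- leaked from the previous call

def collect_fixed(item, acc=None):
    if acc is None:  # a fresh list per call
        acc = []
    acc.append(item)
    return acc

print(collect_fixed("a"))  # ['a']
print(collect_fixed("b"))  # ['b']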
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b442a28 and 52771de.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/graph_utils.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/onnx/quantization/graph_utils.py (2)
modelopt/onnx/op_types.py (3)
  • copy_ops (99-118)
  • is_copy_op (121-123)
  • is_linear_op (126-128)
modelopt/onnx/utils.py (4)
  • find_lowest_common_ancestor (572-613)
  • get_child_nodes (625-628)
  • get_parent_nodes (616-622)
  • infer_shapes (723-736)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (3)
modelopt/onnx/quantization/graph_utils.py (3)

32-32: Centralizing wildcard ops via copy_ops() looks good.

Helps keep path matching consistent with op_types.


173-176: Docstring update aligns with intent to include MaxPool.

Matches PR goal of Conv-Act-Pool fusion mention.


34-41: No undefined variable in find_lowest_common_ancestor.

The review incorrectly identifies the bug. The actual code at line 603 of modelopt/onnx/utils.py reads:

lowest_common_ancestor = common_ancestors.pop()

Not lowest_common_ancestor = common_ancestor as claimed. The variable common_ancestors (plural) is a properly defined set from line 601, and .pop() is a valid set method. There is no undefined variable, and no build paths would be broken.

Likely an incorrect or invalid review comment.

@gcunhase gcunhase force-pushed the dev/gcunhasergio/add_conv_act_pool_fusion_5271050 branch from 99766cd to 858c3da on October 17, 2025 18:41
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/onnx/quantization/graph_utils.py (1)

173-205: Fix mutable default arguments in has_path_type function signature (line 84-90).

The function has mutable default parameters wild_card_types: list[str] = [] (line 89) and path_nodes: list[Node] = [] (line 90). Since path_nodes is modified via append() at line 150, these lists persist across function calls, causing state leakage. When callers omit path_nodes (e.g., line 206), the same default list accumulates nodes from previous invocations.

Apply the suggested refactor to the function signature:

def has_path_type(
    node: Node,
    graph: Graph,
    path_type: list[str],
    is_forward: bool,
    wild_card_types: list[str] | None = None,
    path_nodes: list[Node] | None = None,
) -> bool:
    if wild_card_types is None:
        wild_card_types = []
    if path_nodes is None:
        path_nodes = []
    # ... rest of function
♻️ Duplicate comments (1)
modelopt/onnx/quantization/graph_utils.py (1)

1060-1110: Shape-inference path drops dict calibration_shapes (re-raising prior feedback).

Only string specs are parsed; dict inputs are ignored, defaulting dims to 1 and misclassifying MatMuls. Please accept dicts directly.

Apply this diff:

@@ def _exclude_matmuls_by_shape_inference(
-    input_shapes = (
-        parse_shapes_spec(calibration_shapes)
-        if (calibration_shapes and isinstance(calibration_shapes, str))
-        else {}
-    )
+    input_shapes = {}
+    if calibration_shapes:
+        if isinstance(calibration_shapes, str):
+            input_shapes = parse_shapes_spec(calibration_shapes)
+        elif isinstance(calibration_shapes, dict):
+            input_shapes = calibration_shapes

Additionally, consider using both value_info and outputs in the lookup to be resilient to where ONNX stores inferred shapes:

-    value_info_map = {vi.name: vi for vi in model.graph.value_info}
+    value_info_map = {vi.name: vi for vi in model.graph.value_info}
+    value_info_map.update({vi.name: vi for vi in model.graph.output})
🧹 Nitpick comments (2)
modelopt/onnx/op_types.py (1)

99-119: Broaden copy-ops: include Unsqueeze and Identity.

These are commonly treated as copy/no-op in traversals; adding them improves wildcard path robustness for fusion.

Apply this diff:

 def get_copy_ops():
     """Returns list of copy operators."""
     return [
         "Flatten",
         "Transpose",
         "Concat",
         "Split",
         "Squeeze",
+        "Unsqueeze",
         "Expand",
         "ReverseSequence",
         "Reshape",
         "Tile",
         "Gather",
         "Slice",
         "GatherElements",
         "GatherND",
         "ScatterElements",
         "ScatterND",
         "OneHot",
+        "Identity",
     ]

Also applies to: 121-124

tests/unit/onnx/test_quantize_int8.py (1)

33-45: Harden producer lookup to avoid graph-input edge cases.

node.i(inp_idx) can fail when an input is a graph input (no producer). Guard with inp.inputs.

Apply this diff:

 def _assert_nodes_quantization(nodes, should_be_quantized=True):
     for node in nodes:
         for inp_idx, inp in enumerate(node.inputs):
             if isinstance(inp, gs.Variable):
-                if should_be_quantized:
-                    assert node.i(inp_idx).op == "DequantizeLinear", (
+                producer = node.i(inp_idx) if inp.inputs else None
+                if should_be_quantized:
+                    assert producer is not None and producer.op == "DequantizeLinear", (
                         f"Input '{inp.name}' of node '{node.name}' is not quantized but should be!"
                     )
                 else:
-                    assert node.i(inp_idx).op != "DequantizeLinear", (
+                    assert producer is None or producer.op != "DequantizeLinear", (
                         f"Input '{inp.name}' of node '{node.name}' is quantized but should not be!"
                     )
     return True
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99766cd and 858c3da.

📒 Files selected for processing (5)
  • modelopt/onnx/op_types.py (2 hunks)
  • modelopt/onnx/quantization/graph_utils.py (6 hunks)
  • modelopt/onnx/quantization/partitioning.py (2 hunks)
  • tests/_test_utils/onnx_quantization/lib_test_models.py (1 hunks)
  • tests/unit/onnx/test_quantize_int8.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/onnx/quantization/partitioning.py
🧰 Additional context used
🧬 Code graph analysis (3)
modelopt/onnx/quantization/graph_utils.py (2)
modelopt/onnx/op_types.py (3)
  • get_copy_ops (99-118)
  • is_copy_op (121-123)
  • is_linear_op (126-128)
modelopt/onnx/utils.py (3)
  • get_child_nodes (625-628)
  • get_parent_nodes (616-622)
  • infer_shapes (723-736)
tests/_test_utils/onnx_quantization/lib_test_models.py (1)
modelopt/onnx/utils.py (1)
  • check_model (557-569)
tests/unit/onnx/test_quantize_int8.py (1)
tests/_test_utils/onnx_quantization/lib_test_models.py (1)
  • build_conv_act_pool_model (560-706)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (3)
modelopt/onnx/quantization/graph_utils.py (1)

206-207: Good: wildcarding through copy-ops enables Conv→Act→(copy)→Pool patterns.

Using wild_card_types=get_copy_ops() here is the right call to allow optional reshape/transpose between backbone ops.

tests/unit/onnx/test_quantize_int8.py (1)

100-123: Nice coverage: Conv is quantized; MaxPool remains unquantized across optional Reshape.

This directly validates the Conv‑Act‑(copy)‑Pool fusion behavior the PR targets.
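
The approved test likely has roughly this shape (a sketch only: the builder, save_onnx, and _assert_nodes_quantization appear elsewhere in this thread, but the import paths, quantize call, and output location are assumptions):

# Sketch of the parameterized test; import paths and the quantized-model
# path are assumptions, not confirmed by this thread.
import onnx
import onnx_graphsurgeon as gs
import pytest

from modelopt.onnx.quantization import quantize
from _test_utils.onnx_quantization.lib_test_models import (  # assumed path
    build_conv_act_pool_model,
    save_onnx,
)

@pytest.mark.parametrize("include_reshape_node", [False, True])
def test_conv_act_pool_int8(tmp_path, include_reshape_node):
    model = build_conv_act_pool_model(include_reshape_node=include_reshape_node)
    onnx_path = str(tmp_path / "model.onnx")
    save_onnx(model, onnx_path)
    quantize(onnx_path)  # assumed to write model.quant.onnx beside the input
    qgraph = gs.import_onnx(onnx.load(str(tmp_path / "model.quant.onnx")))
    convs = [n for n in qgraph.nodes if n.op == "Conv"]
    pools = [n for n in qgraph.nodes if n.op == "MaxPool"]
    _assert_nodes_quantization(convs, should_be_quantized=True)   # Conv inputs carry DQ
    _assert_nodes_quantization(pools, should_be_quantized=False)  # MaxPool stays unquantized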

tests/_test_utils/onnx_quantization/lib_test_models.py (1)

560-706: No issues found. The optional refactor suggestion is valid and safe to apply.

The verification confirms:

  • Single definition and call site (no shadowing)
  • Test function contains no hardcoded shape assertions or dimension-dependent logic
  • Proposed tensor shapes are mathematically consistent with the MaxPool stride=2 operation
  • Refactor reduces model size and inference cost without affecting test correctness

The suggestion to reduce from (32, 64, 256, 256) to (1, 64, 16, 16) input shapes is sound and will improve CI performance without breaking the quantization tests.

@gcunhase gcunhase changed the title from "[5271050, 5274346] Add support for Conv-Act-Pool fusion" to "[5271050, 5274346][ONNX] Add support for Conv-Act-Pool fusion" on Oct 17, 2025
@gcunhase gcunhase force-pushed the dev/gcunhasergio/add_conv_act_pool_fusion_5271050 branch from 60dc169 to 3db4dc8 on October 17, 2025 20:01
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
modelopt/onnx/quantization/graph_utils.py (1)

1060-1076: Dict calibration_shapes are silently ignored.

The existing issue flagged in past reviews remains: when calibration_shapes is a dict, it's not parsed (only string specs are handled at lines 1072-1076), causing dimensions to default to 1 and potentially leading to false positive GEMV exclusions.

Consider applying the fix suggested in the past review:

-    input_shapes = (
-        parse_shapes_spec(calibration_shapes)
-        if (calibration_shapes and isinstance(calibration_shapes, str))
-        else {}
-    )
+    input_shapes = {}
+    if calibration_shapes:
+        if isinstance(calibration_shapes, str):
+            input_shapes = parse_shapes_spec(calibration_shapes)
+        elif isinstance(calibration_shapes, dict):
+            input_shapes = calibration_shapes
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 858c3da and 3db4dc8.

📒 Files selected for processing (5)
  • modelopt/onnx/op_types.py (2 hunks)
  • modelopt/onnx/quantization/graph_utils.py (6 hunks)
  • modelopt/onnx/quantization/partitioning.py (2 hunks)
  • tests/_test_utils/onnx_quantization/lib_test_models.py (1 hunks)
  • tests/unit/onnx/test_qdq_rules_int8.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
modelopt/onnx/quantization/partitioning.py (1)
modelopt/onnx/op_types.py (2)
  • is_copy_op (121-123)
  • is_pointwise_or_elementwise_op (131-136)
tests/_test_utils/onnx_quantization/lib_test_models.py (1)
modelopt/onnx/utils.py (1)
  • check_model (557-569)
modelopt/onnx/quantization/graph_utils.py (2)
modelopt/onnx/op_types.py (1)
  • get_copy_ops (99-118)
modelopt/onnx/utils.py (1)
  • infer_shapes (723-736)
tests/unit/onnx/test_qdq_rules_int8.py (3)
tests/_test_utils/onnx_quantization/lib_test_models.py (1)
  • build_conv_act_pool_model (560-706)
modelopt/onnx/quantization/int8.py (1)
  • quantize (113-291)
modelopt/onnx/quantization/quantize.py (1)
  • quantize (209-564)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (5)
modelopt/onnx/op_types.py (1)

99-123: LGTM! Clean refactoring that improves maintainability.

The extraction of copy operators into a dedicated get_copy_ops() helper is a solid refactoring that centralizes the definition and enables reuse across the codebase (e.g., in graph_utils.py and partitioning.py).

modelopt/onnx/quantization/partitioning.py (1)

133-146: LGTM! Conv-Act-Pool fusion support correctly implemented.

The changes appropriately extend the fusible criteria to include:

  1. Copy ops as fusible consumers (lines 135-137)
  2. MaxPool as a fusible backbone operation (line 143)

This enables the Conv-Act-Pool pattern to be correctly fused during quantization, which aligns with the PR objectives.

modelopt/onnx/quantization/graph_utils.py (2)

32-32: LGTM! Import changes support the Conv-Act-Pool fusion feature.

The addition of get_copy_ops import and infer_shapes utility, along with their usage at line 206, correctly enables MaxPool and copy operators to be treated as wildcards in fusible backbone matching.

Also applies to: 38-38, 206-206


1060-1060: Function rename is consistent.

The rename from _exclude_matmuls_by_symbolic_inference to _exclude_matmuls_by_shape_inference is applied consistently at the definition (line 1060) and call site (line 968), and aligns with the refactored shape inference approach.

Also applies to: 968-968

tests/_test_utils/onnx_quantization/lib_test_models.py (1)

560-706: LGTM! Well-structured test model builder.

The build_conv_act_pool_model function correctly constructs a Conv-Act-Pool graph with:

  • Proper node sequencing (Conv → BatchNormalization → Relu → optional Reshape → MaxPool → Conv)
  • Correctly sized initializers matching the declared dimensions
  • Conditional Reshape node and its initializer based on the include_reshape_node parameter
  • Shape inference and validation before returning

This provides solid test coverage for the Conv-Act-Pool fusion feature.
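
An abridged version of such a builder using onnx.helper could look like the sketch below (channel counts, tensor names, and attributes are illustrative, not the actual test values):

import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

def build_conv_act_pool_sketch(include_reshape_node: bool = False) -> onnx.ModelProto:
    """Conv -> BatchNormalization -> Relu -> (Reshape) -> MaxPool -> Conv."""
    c = 8
    inp = helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, c, 16, 16])
    out = helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, c, 8, 8])

    def rand(*shape):
        return np.random.rand(*shape).astype(np.float32)

    inits = [
        numpy_helper.from_array(rand(c, c, 3, 3), "w1"),
        numpy_helper.from_array(rand(c), "bn_scale"),
        numpy_helper.from_array(rand(c), "bn_bias"),
        numpy_helper.from_array(rand(c), "bn_mean"),
        numpy_helper.from_array(rand(c), "bn_var"),
        numpy_helper.from_array(rand(c, c, 1, 1), "w2"),
    ]
    nodes = [
        helper.make_node("Conv", ["input", "w1"], ["conv1_out"], pads=[1, 1, 1, 1]),
        helper.make_node("BatchNormalization",
                         ["conv1_out", "bn_scale", "bn_bias", "bn_mean", "bn_var"],
                         ["bn_out"]),
        helper.make_node("Relu", ["bn_out"], ["relu_out"]),
    ]
    pool_in = "relu_out"
    if include_reshape_node:
        # Identity reshape: keeps the tensor shape, exercises the copy-op wildcard.
        inits.append(numpy_helper.from_array(np.array([1, c, 16, 16], dtype=np.int64), "shape"))
        nodes.append(helper.make_node("Reshape", ["relu_out", "shape"], ["reshape_out"]))
        pool_in = "reshape_out"
    nodes += [
        helper.make_node("MaxPool", [pool_in], ["pool_out"], kernel_shape=[2, 2], strides=[2, 2]),
        helper.make_node("Conv", ["pool_out", "w2"], ["output"]),
    ]
    graph = helper.make_graph(nodes, "conv_act_pool", [inp], [out], initializer=inits)
    return helper.make_model(graph)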

@gcunhase gcunhase force-pushed the dev/gcunhasergio/add_conv_act_pool_fusion_5271050 branch from 3db4dc8 to 7eae896 on October 21, 2025 13:57
@gcunhase gcunhase force-pushed the dev/gcunhasergio/add_conv_act_pool_fusion_5271050 branch from 7eae896 to fd59bdf on October 24, 2025 20:38