
Conversation

@vbaddi (Contributor) commented Nov 9, 2025

This PR introduces support for exporting model submodules as ONNX functions, enabling more efficient model compilation and execution on hardware.

Changes

  • Added new environment variable QEFF_USE_ONNX_FUNCTIONS to control ONNX function export behavior
  • Integrated ONNX function export capability into the inference pipeline

Enable ONNX Functions Export

Set the environment variable before running inference:

export QEFF_USE_ONNX_FUNCTIONS=true

Export and Execute with ONNX Functions

python -m QEfficient.cloud.infer \
  --model-name gpt2 \
  --num-cores 16 \
  --device-group "[0]" \
  --prompt "My name is" \
  --num-layers 2

Backward Compatibility

This feature is opt-in and requires explicitly setting the environment variable. Existing workflows remain unaffected when the flag is not set.

- Auto-detect decoder layers for export_modules_as_functions based on model type
- Add CustomOpTransform to dynamically register and include custom ops (CustomRMSNorm, CtxGather, CtxScatter)
- Fix invalid INT32_MAX indices in ONNX runtime by replacing with 0
- Support ONNX functions export via QEFF_USE_ONNX_FUNCTIONS env var (see the sketch after this list)
- Handle rope_scaling None values gracefully for Gemma3
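
As a rough illustration of what the flag toggles under the hood, here is a sketch against the public torch.onnx.export API, not the exact code in this PR; model, example_args, and decoder_layer_classes are assumed to be prepared by the export pipeline:

import os
import torch

# Sketch: when the flag is set, the decoder layer classes (auto-detected per
# model type) are passed to torch.onnx.export so each layer becomes an ONNX
# local function. `model`, `example_args`, and `decoder_layer_classes` are
# assumed to be provided by the surrounding export code.
use_functions = os.environ.get("QEFF_USE_ONNX_FUNCTIONS", "false").lower() == "true"

torch.onnx.export(
    model,
    example_args,
    "model.onnx",
    opset_version=17,  # export_modules_as_functions requires opset >= 15
    export_modules_as_functions=decoder_layer_classes if use_functions else False,
)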

Signed-off-by: vbaddi <[email protected]>
Signed-off-by: Vinayak Baddi <[email protected]>
@vbaddi vbaddi marked this pull request as draft November 9, 2025 11:45
@vbaddi vbaddi changed the title from "Feat: Add ONNX Sub Functions Export Feature" to "WIP: Feat: Add ONNX Sub Functions Export Feature" Nov 9, 2025
"""
transformed = False
onnx_slim_transform = True # kwargs.get("enable_onnx_slim_transform", False)
temp_onnx_path = kwargs.get("temp_onnx_path", None)
Contributor:

Can we make it a mandatory argument? Also, onnx_base_dir is unused here.

:param temp_onnx_path: Path to save the slimmed ONNX model.
"""
transformed = False
onnx_slim_transform = True # kwargs.get("enable_onnx_slim_transform", False)
Contributor:

If OnnxSlimTransform is called, do you need another onnx_slim_transform = True flag and the check on line 130? The expectation should be to apply the OnnxSlimTransform, right?

Contributor:

We can remove it from here. There is a flag called "enable_onnx_slim_transform" that lets users decide whether to enable ONNX Slim. We can add a condition in modeling_auto so that this transform is included in the _onnx_transform list only when the flag is enabled.
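
A sketch of the placement being suggested here, assuming an enable_onnx_slim_transform kwarg reaches modeling_auto; the import paths and helper name follow this discussion and may differ from the final code:

# Assumed import paths; the transform classes come from this repo / this PR.
from QEfficient.base.onnx_transforms import FP16ClipTransform, SplitTensorsTransform, OnnxSlimTransform

def _build_onnx_transforms(enable_onnx_slim_transform: bool = False):
    # Instead of hard-coding onnx_slim_transform = True inside the transform,
    # append OnnxSlimTransform only when the caller opts in.
    transforms = [FP16ClipTransform, SplitTensorsTransform]
    if enable_onnx_slim_transform:
        transforms.append(OnnxSlimTransform)
    return transforms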

Contributor:

I tested this change with GPT-OSS and it fails in the OnnxSlim transform. Discussed with VB that this doesn't help us much. Let's not add an extra package dependency if it has limited use; let's remove onnxslim.

@inisis:

Can you provide the error log? I think there should be a ~5% performance gain with onnxslim.

Contributor Author:

@inisis thanks. You mean a ~5% gain in performance with onnxslim?

@inisis:

> Not sure about GPT-OSS, but for Qwen 2.5 VL we observed that onnxslim removes identical nodes, which leads to the creation of dummy nodes.

@abhishek-singh591 Hi, what are those dummy nodes? They should not be created by onnxslim. Removing identical nodes is generally known as CSE; it reduces extra computation, and I think it's very useful.

Contributor:

The nodes left behind after removing identical nodes can be considered dummy/orphan nodes. These are nodes that were originally connected as outputs of the identical nodes but, after CSE, no longer have valid connections. Ideally, CSE should rewire the inputs and outputs properly so that no orphan nodes remain, right?

@inisis:

Yes, onnxslim will remove those dummy nodes automatically.

Contributor:

> Yes, onnxslim will remove those dummy nodes automatically.

Actually, it's not doing that, and we also don't want to delete those nodes. Before removing identical nodes through CSE, it should connect the input of the identity node directly to its output, ensuring the graph remains valid.

@inisis:

> Actually, it's not doing that, and we also don't want to delete those nodes. Before removing identical nodes through CSE, it should connect the input of the identity node directly to its output, ensuring the graph remains valid.

Really? Can you provide an example? Many thanks.

inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64).float().to(device) / self.dim))

-if hasattr(config, "rope_scaling") and "factor" in config.rope_scaling:
+if hasattr(config, "rope_scaling") and config.rope_scaling is not None and "factor" in config.rope_scaling:
Contributor:

Is this change part of ONNX Sub Functions?

Contributor Author:

No, but it corrects the modeling representation.

example_inputs["past_key_values"][i].append(torch.zeros(pkv_cache[0][0].shape, dtype=torch.float32))
dynamic_axes[f"past_{kv}.{i}"] = pkv_dynamic_axes
output_names.append(f"past_{kv}.{i}_RetainedState")
output_names.append(f"past_{kv}.{i}_InternalRetainedState")
Contributor:

Why are we renaming it? If we rename _RetainedState to _InternalRetainedState, wouldn't the changes also need to be added to text_generation_inference and the other places where we skip these buffers? Even if we are not enabling subfunctions, this would impact regular execution.

ONNX_EXPORT_EXAMPLE_FBS = 4
ONNX_EXPORT_EXAMPLE_NLK = 2 # Number of Logits to Keep
-ONNX_EXPORT_OPSET = 13
+ONNX_EXPORT_OPSET = 17
Contributor:

Some tests on opset 17 are still ongoing. @quic-hemagnih, are we good to merge the opset 17 changes?

Contributor:

Sure, but export_modules_as_functions has a hard constraint that the opset must be >= 15.
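
As a quick sanity check after export, the opset and the exported local functions can be inspected with the onnx package; the file name here is illustrative:

import onnx

# Sketch: a model exported with export_modules_as_functions carries its
# submodules as ONNX local functions and must declare an opset >= 15.
m = onnx.load("model.onnx")  # path is illustrative
print([(op.domain, op.version) for op in m.opset_import])
print([f.name for f in m.functions])  # one entry per exported module function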


# Apply patches
# TODO: Find a better way to do this, this is temp. fix.
apply_torch_patches()
Contributor:

If we are not enabling subfunctions, do we need to do the monkey patching?

Contributor:

If we are not enabling subfunctions, then the monkey patching is not required, but applying it doesn't harm execution; we have checked generation without subfunctions while the patches are in place. That said, we can put a condition around it too.
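
A minimal sketch of the condition the reviewers mention, assuming the existing QEFF_USE_ONNX_FUNCTIONS flag and the apply_torch_patches() helper added in this PR; the exact placement is illustrative:

import os

# Sketch: apply the torch export patches only when the opt-in flag is set,
# leaving the default export path untouched.
if os.environ.get("QEFF_USE_ONNX_FUNCTIONS", "false").lower() in ("1", "true"):
    apply_torch_patches()  # helper introduced by this PR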

- _onnx_transforms = [FP16ClipTransform, SplitTensorsTransform]
+ _onnx_transforms = [
+     FP16ClipTransform,
+     CustomOpTransform,
Contributor:

Do we need to apply the CustomOpTransform again after export?

@vbaddi vbaddi added the enhancement (New feature or request) label Nov 10, 2025
@ochougul (Contributor):

It might be more intuitive and flexible to have a dedicated flag in the export configuration, something like use_subfunctions or export_submodules_as_functions. Using an environment variable makes it harder to switch between the two approaches dynamically, especially during development or testing. A flag would offer clearer intent and better usability. We can add this flag to the export API in the auto classes.

@vbaddi (Contributor Author) commented Nov 11, 2025

> It might be more intuitive and flexible to have a dedicated flag in the export configuration, something like use_subfunctions or export_submodules_as_functions. [...]

Hmm, I guess we have already discussed this: passing it as part of .export() doesn't make sense, since there are cache-module changes required. We can do it as part of .from_pretrained().
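
A hypothetical sketch of how such a flag could be surfaced on the from_pretrained path instead of .export(); the use_onnx_functions kwarg is purely illustrative and not an agreed API:

# Hypothetical API sketch; the PR as written relies on the QEFF_USE_ONNX_FUNCTIONS env var.
from QEfficient import QEFFAutoModelForCausalLM

# Consuming the flag at load time would let the cache-module changes needed
# for function export be applied before .export() is ever called.
model = QEFFAutoModelForCausalLM.from_pretrained("gpt2", use_onnx_functions=True)  # illustrative kwarg
onnx_path = model.export()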

if hasattr(onnx_utils, "_get_module_attributes"):
    onnx_utils._get_module_attributes = _get_module_attributes

print("Applied torch ONNX export patches for export_modules_as_functions compatibility")
Contributor:

Use the logger here instead of print.
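
For example, a sketch of the swap; the exact logger import path in this repo is assumed:

# Assumed import path for the project logger; adjust to the repo's actual module.
from QEfficient.utils.logging_utils import logger

logger.info("Applied torch ONNX export patches for export_modules_as_functions compatibility")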

