Conversation

Gasoonjia (Contributor)

This PR introduces export support for the CUDA delegate using the AOTI library. It also creates a CI test for verification.

pytorch-bot commented Sep 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14438

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 6 Cancelled Jobs, 2 Unrelated Failures

As of commit 679b0e0 with merge base a548635:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label on Sep 19, 2025
Comment on lines +40 to +41
print("STDOUT:")
print(result.stdout)
Contributor

Please use `logging`; it provides different levels of logging.
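
A minimal sketch of the suggestion (the subprocess call here is a stand-in for the actual test command, not the PR's code):

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Hypothetical command standing in for the export step under test.
result = subprocess.run(["echo", "hello"], capture_output=True, text=True)
logger.info("STDOUT: %s", result.stdout)   # routine output at INFO
logger.debug("STDERR: %s", result.stderr)  # verbose detail only at DEBUG
```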



@final
class CudaPartitioner(Partitioner):
Contributor

So this is basically just skeleton code? I think we could skip having an initial partitioner implementation entirely and only allow the other to_backend API for now? Is that how it works?

Contributor Author

We should keep it; we need this partitioner to skip ET operator decomposition for all ops.
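
For context, a rough sketch (not the PR's actual code) of how a Partitioner can report ops that must not be decomposed, assuming ExecuTorch's ops_to_not_decompose interface:

```python
from typing import Callable, List, Optional, Tuple

import torch
from executorch.exir.backend.partitioner import Partitioner
from torch.export.exported_program import ExportedProgram


class CudaPartitionerSketch(Partitioner):
    """Illustrative only: claim every ATen op so nothing is decomposed."""

    def partition(self, exported_program: ExportedProgram):
        # Tag the whole graph for the CUDA backend (details elided here).
        raise NotImplementedError

    def ops_to_not_decompose(
        self, ep: ExportedProgram
    ) -> Tuple[List[torch._ops.OpOverload], Optional[Callable]]:
        # Report every call_function op as "do not decompose" so AOTI
        # later sees the original ATen ops intact.
        ops = [
            node.target
            for node in ep.graph.nodes
            if node.op == "call_function"
            and isinstance(node.target, torch._ops.OpOverload)
        ]
        return ops, None
```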

fake_edge_program = copy.deepcopy(edge_program)
partitioner_result = partitioner_instance(fake_edge_program)
tagged_exported_program = partitioner_result.tagged_exported_program
tagged_exported_program.example_inputs = edge_program.example_inputs
JacobSzwejbka (Contributor) commented Sep 19, 2025

Are we serializing the example inputs?

Contributor Author

By serializing, do you mean exir.save, or to the .pte?

if os.path.isfile(file):
    os.remove(file)
    print(f"Removed file: {file}")
except Exception as e:
Contributor

These should be fatal exceptions, right? The test should fail?
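
A sketch of the suggested change: drop the try/except so any cleanup failure propagates and fails the test (the helper name is illustrative):

```python
import os


def remove_file(file: str) -> None:
    # No try/except: an OSError here propagates up and fails the test
    # run instead of being logged and silently ignored.
    if os.path.isfile(file):
        os.remove(file)
        print(f"Removed file: {file}")
```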

*kernel_metadata.json
*kernel.cpp
*wrapper_metadata.json
*wrapper.cpp
Contributor

Does wrapper.cpp stick around? AOTI compilation doesn't clean it up after generating the .so and .cubin?

Contributor Author

Right now it will generate lots of files, including the .cubin, wrapper.cpp, etc., and they will be cleaned up automatically.



# fallback operators that exist in the ET namespace
supported_fallback_kernels: Dict[str, Any] = {}
Contributor

@larryliu0820 I feel like I keep hearing conflicting information. Is AOTI falling back to ET, or is it a graph break? A graph break sounds more natural in the ET ecosystem to me.

Contributor Author

Here we will leverage the AOTI fallback and not break the graph when handling missing operators, which is more natural for AOTI and convenient for us.

Contributor

> Graph break sounds more natural in the ET ecosystem to me

Yes, I'm thinking we want to have some CUDA kernels and let it graph-break. We can also reuse those kernels in the AOTI fallback.


output_path = os.path.join(os.getcwd(), "aoti.so")

options: dict[str, typing.Any] = {
Contributor

Can you add some documentation on what these options are or where they are defined?

debug_compile and embed_kernel_binary are the two non-obvious ones to me.
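
For reference, a hedged annotation of these options based on their names and torch._inductor config conventions (exact semantics should be confirmed against the PyTorch source):

```python
import typing

options: dict[str, typing.Any] = {
    "aot_inductor.output_path": "aoti.so",     # where the compiled .so is written
    "aot_inductor.debug_compile": True,        # keep debug artifacts for inspection
    "aot_inductor.embed_kernel_binary": True,  # embed the .cubin kernels inside the .so
    "aot_inductor.force_mmap_weights": False,  # don't require mmap-ed weights
    "max_autotune": True,  # benchmark candidate kernels and keep the fastest
}
```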

"aot_inductor.output_path": output_path,
"aot_inductor.debug_compile": True,
"aot_inductor.force_mmap_weights": False,
"max_autotune": True,
Contributor

How are we autotuning? We don't know what GPU we are running on?

Contributor Author

Maybe @yushangdi can answer this question better. My understanding is that it will autotune for the GPU we have during the AOTI compile; it can get that info automatically.

Contributor

yeah that's correct.
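
For context, the compile-time GPU can be identified with standard torch.cuda APIs like the ones below; whether Inductor's autotuner uses exactly these calls is an assumption.

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # e.g. "NVIDIA A100-SXM4-80GB" with compute capability (8, 0)
    print(props.name, torch.cuda.get_device_capability(0))
```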

with open(so_path, "rb") as f:
    so_data = f.read()

named_data_store.add_named_data("so_blob", so_data, 1, "aoti_cuda_blob")
Contributor

where do you put the cubin?

Contributor Author

No cubin is explicitly needed; everything is already in the .so.

Contributor

yeah "aot_inductor.embed_kernel_binary": True, puts the kernel in .so

owning_program, submodule, call_module_node, tag, is_submodule
)

in_spec = pytree.tree_flatten((tuple(subgraph_signature.user_inputs), {}))[1]
Contributor Author

TODO: add a check examining the input signature of the first partition against the original edge program.

@@ -0,0 +1,116 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Contributor

Let's add some unit tests under backends/cuda/test

Comment on lines +725 to +727
submodule_example_inputs = (
    owning_program.example_inputs if is_first_partition else None
)
Contributor

This is not enough; we need to make sure the signature of the first partition is the same as the signature of the original exported program.
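
A hypothetical sketch of the requested check, comparing user-input signatures via ExportedProgram.graph_signature (the helper name is illustrative):

```python
from torch.export.exported_program import ExportedProgram


def assert_same_user_inputs(
    original: ExportedProgram, first_partition: ExportedProgram
) -> None:
    orig = original.graph_signature.user_inputs
    part = first_partition.graph_signature.user_inputs
    assert len(orig) == len(part), (
        f"first partition takes {len(part)} user inputs, "
        f"original program takes {len(orig)}"
    )
```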

Base automatically changed from install-cuda-pt to main September 20, 2025 06:22