[ET-VK] 4/n Split dispatches between multiple command buffers. Removing flush function call from submit_current_cmd_and_wait. #12525


Merged
merged 19 commits into main from gh/trivedivivek/124/head on Jul 17, 2025

Conversation

trivedivivek (Contributor) commented Jul 15, 2025

SS-JIA and others added 9 commits July 14, 2025 08:33
## Changes

* Update the `gen_vulkan_spv.py` script to account for changes to included files when deciding whether to perform a re-compilation


## Additional Changes

Also applied some misc fixes:

* Allow the Python preprocessor to handle unescaped `\` characters used to extend macros across multiple lines. Fixed this by replacing them with escaped `\\` before running the preprocessor.
* Fixed an issue where a build would unexpectedly pass when re-running a failing build multiple times; the caching mechanism would reuse an old cached build artifact since no changes were detected. Fixed this by removing any existing cached artifacts on an unsuccessful build.

## Context

The `gen_vulkan_spv.py` script, which handles GLSL shader codegen and GLSL -> SPIR-V compilation for the ExecuTorch Vulkan backend, has a caching mechanism that recompiles only modified files, for developer efficiency. However, this mechanism currently doesn't consider whether included files have changed. Therefore, if an include file is modified without also modifying the source file that uses it, the changes to the include file will not be picked up.

Differential Revision: [D78275585](https://our.internmc.facebook.com/intern/diff/D78275585/)

[ghstack-poisoned]
## Changes

* Introduce `run_prepack()` API which combines the functionality of `encode_prepack()` and `prepack()`, but submits prepacking shaders incrementally rather than all at once.
* Introduce graph config options to control command buffer submission behaviour during prepacking.

Note that the current default values for the prepack submission thresholds were determined through experimentation. I will leave determining optimal values for specific devices as a later exercise. The goal of this diff is simply to introduce this mechanism to fix the Llama model loading crash on the Samsung S24 (described below). A rough sketch of the mechanism is shown below.
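To illustrate the idea, here is a minimal sketch of what incremental prepack submission could look like. This is not the actual ExecuTorch implementation; `staging_nbytes()`, `submit_current_cmd()`, and the thresholding logic are assumed names used for illustration only.

```cpp
#include <cstddef>
#include <vector>

// NOTE: PrepackNode and Context are stand-ins for the corresponding
// ExecuTorch Vulkan types; the thresholding logic below is illustrative.
void run_prepack_sketch(
    std::vector<PrepackNode>& nodes,
    Context& context,
    const size_t submit_threshold_nbytes) {
  size_t staged_nbytes = 0;
  for (PrepackNode& node : nodes) {
    node.encode(&context);                  // record the packing dispatch
    staged_nbytes += node.staging_nbytes(); // assumed accessor
    if (staged_nbytes >= submit_threshold_nbytes) {
      // Submit what has been encoded so far, so the GPU can start
      // working while the CPU keeps encoding the remaining nodes.
      context.submit_current_cmd();
      staged_nbytes = 0;
    }
  }
  // Submit whatever remains and wait for all prepacking to finish.
  context.submit_current_cmd_and_wait();
}
```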

## Context

Currently, ET-VK will encode all prepacking shaders, and then perform prepacking by submitting only one command buffer.

However, this approach has some drawbacks:

* CPU/GPU parallelism is decreased, since the command buffer is submitted only after all commands have been encoded.
* There can be performance issues at the Vulkan API level when processing a single "large" command buffer.

Splitting prepacking across multiple command buffers improves performance by avoiding both of the aforementioned issues.

## Llama 3.2 1B crash on Samsung S24

I have also noticed that when running large models (e.g. Llama 3.2 1B) on the Samsung S24 with ET-VK, the device's display crashes (the screen goes black and becomes unresponsive), and sometimes the device shuts down entirely.

Fortunately, this change also fixes this behaviour, in addition to providing a significant performance boost to model load time for Llama models (from 9s to 3s).

## Performance Impact

* Improves model load time, especially on larger models.

## Future Work

* Deprecate the `encode_prepack()` + `prepack()` pattern in favor of the `run_prepack()` pattern

Differential Revision: [D78275586](https://our.internmc.facebook.com/intern/diff/D78275586/)

[ghstack-poisoned]
…maphore support in command buffer.

execute stage
## Context

The following diffs aim to improve the performance of the ExecuTorch Vulkan backend by adding a mechanism to issue multiple command buffers in the prepack and execute functions, so that GPU work is issued while the CPU is still encoding new work.

## This Diff

### Summary

This diff is the first in a series of diffs that aim to split dispatches between multiple command buffers and add semaphore support in the command buffer.

The changes in this diff include (a sketch of the resulting class shape follows the list):

*   Adding a `VkSemaphore` parameter to the `CommandBuffer` constructor in `vk_api/Command.h` and `vk_api/Command.cpp`, to support signaling when the command buffer has completed execution.
*   Updating the `CommandBuffer` construction in `api/Context.cpp` to pass the new `VkSemaphore` parameter.
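As a hedged sketch of the shape of this change (member names and the exact signature are assumptions based on the description above, not the actual code):

```cpp
#include <vulkan/vulkan.h>

// Sketch: CommandBuffer carries a semaphore that the submission path
// can signal once the buffer has finished executing on the GPU.
class CommandBuffer {
 public:
  CommandBuffer(
      VkCommandBuffer handle,
      VkSemaphore signal_semaphore) // new parameter added by this diff
      : handle_(handle), signal_semaphore_(signal_semaphore) {}

  // Exposes the semaphore so the submission path can signal it.
  VkSemaphore get_signal_semaphore() const {
    return signal_semaphore_;
  }

 private:
  VkCommandBuffer handle_;
  VkSemaphore signal_semaphore_; // VK_NULL_HANDLE when signaling is unused
};
```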

Differential Revision: [D78282194](https://our.internmc.facebook.com/intern/diff/D78282194/)

[ghstack-poisoned]
…maphore support to Adapter::submit_cmd.

This diff adds semaphore support to `Adapter::submit_cmd` by modifying its function signature to include `wait_semaphore` and `signal_semaphore` parameters. The updated function now takes into account the new semaphore parameters and correctly configures the pipeline stages using `VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT`.
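As a rough illustration of the updated submission logic (the surrounding function signature and the `VK_NULL_HANDLE` guards are assumptions; only the Vulkan structures and calls themselves are standard API):

```cpp
#include <vulkan/vulkan.h>

// Sketch of a submit path that optionally waits on one semaphore and
// signals another; names are illustrative, not the actual ET-VK code.
void submit_cmd_sketch(
    VkQueue queue,
    VkCommandBuffer cmd,
    VkFence fence,
    VkSemaphore wait_semaphore,
    VkSemaphore signal_semaphore) {
  // A compute-only workload waits at the compute shader stage.
  const VkPipelineStageFlags wait_stage =
      VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;

  VkSubmitInfo submit_info{};
  submit_info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
  if (wait_semaphore != VK_NULL_HANDLE) {
    submit_info.waitSemaphoreCount = 1;
    submit_info.pWaitSemaphores = &wait_semaphore;
    submit_info.pWaitDstStageMask = &wait_stage;
  }
  if (signal_semaphore != VK_NULL_HANDLE) {
    submit_info.signalSemaphoreCount = 1;
    submit_info.pSignalSemaphores = &signal_semaphore;
  }
  submit_info.commandBufferCount = 1;
  submit_info.pCommandBuffers = &cmd;

  vkQueueSubmit(queue, 1, &submit_info, fence);
}
```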

Differential Revision: [D78360041](https://our.internmc.facebook.com/intern/diff/D78360041/)

[ghstack-poisoned]
…previous semaphore in context.

This diff is the third part of a series of diffs aiming to split dispatches between multiple command buffers. In this diff, we are tracking the previous semaphore in the context.

A new member variable `prev_semaphore_` was added to the `Context` class. This variable is used to store the semaphore of the previously submitted command buffer.
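To illustrate how chaining submissions with a tracked semaphore might work (a sketch under assumed names; `create_semaphore` and the `submit_cmd` signature are hypothetical helpers, not the actual implementation):

```cpp
#include <vulkan/vulkan.h>

// Sketch: each submission waits on the semaphore signaled by the
// previous one, serializing command buffers on the GPU timeline.
class Context {
 public:
  void submit_current_cmd(VkFence fence = VK_NULL_HANDLE) {
    // Semaphore the current buffer will signal on completion.
    VkSemaphore signal_semaphore = create_semaphore(); // assumed helper

    // Wait on the previous submission (if any), signal the new one.
    submit_cmd(queue_, cmd_, fence, prev_semaphore_, signal_semaphore);

    // The next submission will wait on this one.
    prev_semaphore_ = signal_semaphore;
  }

 private:
  VkSemaphore create_semaphore(); // assumed helper
  void submit_cmd(
      VkQueue, VkCommandBuffer, VkFence, VkSemaphore, VkSemaphore);

  VkQueue queue_{};
  VkCommandBuffer cmd_{};
  VkSemaphore prev_semaphore_{VK_NULL_HANDLE};
};
```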

Differential Revision: [D78360037](https://our.internmc.facebook.com/intern/diff/D78360037/)

[ghstack-poisoned]
…ng flush function call from submit_current_cmd_and_wait.

This diff removes the `flush` function call from `submit_current_cmd_and_wait` and instead adds it to the `encode_prepack` method.

This change allows finer control over when `flush` is called, which is needed in follow-up diffs.
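In effect, the change looks roughly like the following before/after sketch (the function bodies here are assumptions inferred from the description, not the exact code):

```cpp
// Before this diff: the synchronous submit helper also flushed the context.
void Context::submit_current_cmd_and_wait() {
  submit_current_cmd_and_block(); // assumed internal submit-and-wait step
  flush();                        // removed by this diff
}

// After this diff: flushing is the caller's responsibility; encode_prepack
// flushes once its prepack submissions are done.
void ComputeGraph::encode_prepack() {
  // ... encode and submit prepacking dispatches ...
  context()->flush(); // moved here from submit_current_cmd_and_wait
}
```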

Differential Revision: [D78360036](https://our.internmc.facebook.com/intern/diff/D78360036/)

[ghstack-poisoned]
pytorch-bot bot commented Jul 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12525

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure

As of commit fce21fe with merge base 0012ffa:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Jul 15, 2025
trivedivivek added a commit that referenced this pull request Jul 15, 2025
…ng flush function call from submit_current_cmd_and_wait.

This diff removes the `flush` function call from `submit_current_cmd_and_wait` and instead adds it to the `encode_prepack` method.

This change allows finer control over when `flush` is called, which is needed in follow-up diffs.

Differential Revision: [D78360036](https://our.internmc.facebook.com/intern/diff/D78360036/)

ghstack-source-id: 296431501
Pull Request resolved: #12525
@facebook-github-bot
This pull request was exported from Phabricator. Differential Revision: D78360036

trivedivivek added a commit that referenced this pull request Jul 17, 2025
…ng flush function call from submit_current_cmd_and_wait.

Pull Request resolved: #12525

This diff removes the `flush` function call from `submit_current_cmd_and_wait` and instead adds it to the `encode_prepack` method.

This change allows finer control over when `flush` is called, which is needed in follow-up diffs.
ghstack-source-id: 296667692
@exported-using-ghexport

Differential Revision: [D78360036](https://our.internmc.facebook.com/intern/diff/D78360036/)

trivedivivek added a commit that referenced this pull request Jul 17, 2025
…ng flush function call from submit_current_cmd_and_wait.

Pull Request resolved: #12525

This diff removes the `flush` function call from `submit_current_cmd_and_wait` and instead adds it to the `encode_prepack` method.

This change allows finer control over when `flush` is called, which is needed in follow-up diffs.
ghstack-source-id: 296712817
@exported-using-ghexport

Differential Revision: [D78360036](https://our.internmc.facebook.com/intern/diff/D78360036/)

SS-JIA added a commit that referenced this pull request Jul 17, 2025
…maphore support in command buffer. (#12519)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #12525
* #12523
* #12522
* __->__ #12519

Differential Revision: [D78282194](https://our.internmc.facebook.com/intern/diff/D78282194/)

Co-authored-by: Stephen Jia <[email protected]>
SS-JIA added a commit that referenced this pull request Jul 17, 2025
…maphore support to Adapter::submit_cmd. (#12522)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #12525
* #12523
* __->__ #12522
* #12519

Differential Revision: [D78360041](https://our.internmc.facebook.com/intern/diff/D78360041/)

Co-authored-by: Stephen Jia <[email protected]>
SS-JIA added a commit that referenced this pull request Jul 17, 2025
…previous semaphore in context. (#12523)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #12525
* __->__ #12523
* #12522
* #12519

Differential Revision: [D78360037](https://our.internmc.facebook.com/intern/diff/D78360037/)

Co-authored-by: Stephen Jia <[email protected]>
@SS-JIA changed the base branch from gh/trivedivivek/124/base to main July 17, 2025 05:36
@SS-JIA merged commit 0e433e8 into main Jul 17, 2025
99 of 103 checks passed
@SS-JIA deleted the gh/trivedivivek/124/head branch July 17, 2025 05:36
Labels
CLA Signed · fb-exported · release notes: vulkan (Changes to the Vulkan backend delegate)