[ET-VK] 4/n Split dispatches between multiple command buffers. Removing flush function call from submit_current_cmd_and_wait. #12525
Conversation
## Changes

* Update the `gen_vulkan_spv.py` script to account for changes to included files when deciding whether to perform a re-compilation.

## Additional Changes

Also applied some misc fixes:

* Allow the Python preprocessor to handle unescaped `\` used to extend macros across multiple lines. Fixed this by escaping these characters before running the preprocessor.
* Fixed an issue where the build would unexpectedly pass when trying to recompile a failing build multiple times. The caching mechanism would use an old cached build artifact since no changes were detected. Fixed this by removing any existing cached artifacts on an unsuccessful build.

## Context

The `gen_vulkan_spv.py` script, which handles GLSL shader codegen and GLSL -> SPIR-V compilation for the ExecuTorch Vulkan backend, has a caching mechanism that recompiles only modified files, for the sake of developer efficiency. However, this mechanism currently doesn't consider whether included files have changed. Therefore, if an include file is modified without also modifying the source file that uses it, the changes to the include file will not be captured.

Differential Revision: [D78275585](https://our.internmc.facebook.com/intern/diff/D78275585/)

[ghstack-poisoned]
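The include-aware staleness check described above can be sketched as follows. This is a minimal illustration, not the actual `gen_vulkan_spv.py` code; the helper names and the mtime-based comparison are assumptions:

```python
import os
import re

# Matches GLSL-style quoted includes, e.g. #include "common.glslh"
INCLUDE_RE = re.compile(r'#include\s+"([^"]+)"')

def referenced_includes(src_path, include_dir):
    """Collect the include files referenced by a shader source file."""
    with open(src_path) as f:
        text = f.read()
    return [os.path.join(include_dir, name) for name in INCLUDE_RE.findall(text)]

def needs_recompile(src_path, artifact_path, include_dir):
    """Recompile if the cached artifact is missing, or is older than the
    source file OR any file it includes -- the second condition is the
    case the fix above addresses."""
    if not os.path.exists(artifact_path):
        return True
    artifact_mtime = os.path.getmtime(artifact_path)
    inputs = [src_path] + referenced_includes(src_path, include_dir)
    return any(os.path.getmtime(p) > artifact_mtime for p in inputs)
```

A real implementation would also need to follow transitive includes and handle missing include files; this sketch only shows the one-level check.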
## Changes

* Introduce a `run_prepack()` API which combines the functionality of `encode_prepack()` and `prepack()`, but submits prepacking shaders incrementally rather than all at once.
* Introduce graph config options to control command buffer submission behaviour during prepacking.

Note that the current default values for the prepack submission thresholds were determined through experimentation. I will leave determining optimal values for specific devices as a later exercise. The goal of this diff is simply to introduce this mechanism to fix the Llama model loading crash on Samsung S24 (described below).

## Context

Currently, ET-VK will encode all prepacking shaders, and then perform prepacking by submitting a single command buffer. However, this approach has some drawbacks:

* CPU/GPU parallelism is decreased, since the command buffer is submitted only after all commands have been encoded.
* There can be performance issues at the Vulkan API level when processing a single "large" command buffer.

By splitting prepacking across multiple command buffers, performance can be improved by avoiding both of the aforementioned issues.

## Llama 3.2 1B crash on Samsung S24

I have also noticed that when running large models (e.g. Llama 3.2 1B) on the Samsung S24 with ET-VK, the device's display will crash (causing the screen to go black and become unresponsive), and sometimes the device will shut down entirely. Fortunately, this change also fixes that behaviour, in addition to providing a significant boost to model load time for Llama models (from 9s to 3s).

## Performance Impact

* Improves model load time, especially for larger models.

## Future Work

* Deprecate the `encode_prepack()` + `prepack()` pattern in favor of the `run_prepack()` pattern.

Differential Revision: [D78275586](https://our.internmc.facebook.com/intern/diff/D78275586/)

[ghstack-poisoned]
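The incremental-submission pattern behind `run_prepack()` can be illustrated with a small sketch. The class and method names here are hypothetical stand-ins; the real implementation encodes Vulkan dispatches and submits command buffers, while this sketch only models the threshold logic:

```python
class IncrementalPrepacker:
    """Sketch of the run_prepack() idea: encode prepacking work piece by
    piece and submit a batch (standing in for a command buffer) whenever
    a threshold is crossed, instead of one large submission at the end."""

    def __init__(self, submit_threshold=64):
        self.submit_threshold = submit_threshold  # dispatches per command buffer
        self.pending = []             # dispatches encoded but not yet submitted
        self.submitted_batches = []   # stand-in for submitted command buffers

    def encode(self, dispatch):
        """Encode one dispatch; submit early once the threshold is reached."""
        self.pending.append(dispatch)
        if len(self.pending) >= self.submit_threshold:
            self.flush_submit()

    def flush_submit(self):
        """Submit whatever is pending as one batch."""
        if self.pending:
            self.submitted_batches.append(list(self.pending))
            self.pending.clear()

    def run_prepack(self, dispatches):
        for d in dispatches:
            self.encode(d)
        self.flush_submit()  # submit the remainder
        return self.submitted_batches
```

With a threshold of 4 and 10 dispatches, this produces three submissions of sizes 4, 4, and 2, so the GPU can start on the first batch while the CPU encodes the rest.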
…maphore support in command buffer.

## Context

The following diffs aim to improve the performance of the ExecuTorch Vulkan backend by adding a mechanism to issue multiple command buffers in the prepack and execute functions, so that GPU work is issued while the CPU is still encoding new work.

## This Diff

### Summary

This diff is the first in a series of diffs that aim to split dispatches between multiple command buffers and add semaphore support to the command buffer. The changes in this diff include:

* Adding a `VkSemaphore` parameter to the `CommandBuffer` constructor in `vk_api/Command.h` and `vk_api/Command.cpp` to support signaling when the command buffer has completed execution.
* Updating the `CommandBuffer` object in `api/Context.cpp` to pass the `VkSemaphore` parameter.

Differential Revision: [D78282194](https://our.internmc.facebook.com/intern/diff/D78282194/)

[ghstack-poisoned]
…maphore support to Adapter::submit_cmd. This diff adds semaphore support to `Adapter::submit_cmd` by modifying its function signature to include `wait_semaphore` and `signal_semaphore` parameters. The updated function now takes into account the new semaphore parameters and correctly configures the pipeline stages using `VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT`. Differential Revision: [D78360041](https://our.internmc.facebook.com/intern/diff/D78360041/) [ghstack-poisoned]
…previous semaphore in context. This diff is the third part of a series of diffs aiming to split dispatches between multiple command buffers. In this diff, we are tracking the previous semaphore in the context. A new member variable `prev_semaphore_` was added to the `Context` class. This variable is used to store the semaphore of the previously submitted command buffer. Differential Revision: [D78360037](https://our.internmc.facebook.com/intern/diff/D78360037/) [ghstack-poisoned]
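The semaphore chaining that `prev_semaphore_` enables can be sketched abstractly. Plain Python integers stand in for `VkSemaphore` handles here; this is an illustration of the ordering scheme, not the real `Context` class:

```python
import itertools

class Context:
    """Sketch of chained submissions: each command buffer waits on the
    semaphore signaled by the previous one, so GPU work stays ordered
    even when it is split across multiple command buffers."""

    _sem_ids = itertools.count()  # stand-in for fresh VkSemaphore handles

    def __init__(self):
        self.prev_semaphore_ = None  # semaphore of the last submitted buffer
        self.submissions = []

    def submit_cmd(self, name):
        signal_semaphore = next(self._sem_ids)
        self.submissions.append(
            {"cmd": name,
             "wait": self.prev_semaphore_,   # None for the first submission
             "signal": signal_semaphore})
        self.prev_semaphore_ = signal_semaphore
```

Each submission waits on the previous submission's signal semaphore, forming a chain: buffer N+1 cannot begin its compute stage until buffer N has signaled completion of its own.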
…ng flush function call from submit_current_cmd_and_wait.

This diff removes the `flush` function call from `submit_current_cmd_and_wait` and instead adds it to the `encode_prepack` method. This change will allow better control over flushing, which is needed in follow-up diffs.

Differential Revision: [D78360036](https://our.internmc.facebook.com/intern/diff/D78360036/)

[ghstack-poisoned]
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12525

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure as of commit fce21fe with merge base 0012ffa.

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ng flush function call from submit_current_cmd_and_wait.

ghstack-source-id: 296431501
Pull Request resolved: #12525
This pull request was exported from Phabricator. Differential Revision: D78360036
…ng flush function call from submit_current_cmd_and_wait.

Pull Request resolved: #12525
ghstack-source-id: 296667692
@exported-using-ghexport
…ng flush function call from submit_current_cmd_and_wait.

ghstack-source-id: 296712817
@exported-using-ghexport
…maphore support in command buffer. (#12519)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #12525
* #12523
* #12522
* __->__ #12519

Differential Revision: [D78282194](https://our.internmc.facebook.com/intern/diff/D78282194/)

Co-authored-by: Stephen Jia <[email protected]>
…maphore support to Adapter::submit_cmd. (#12522)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #12525
* #12523
* __->__ #12522
* #12519

Differential Revision: [D78360041](https://our.internmc.facebook.com/intern/diff/D78360041/)

Co-authored-by: Stephen Jia <[email protected]>
…previous semaphore in context. (#12523)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #12525
* __->__ #12523
* #12522
* #12519

Differential Revision: [D78360037](https://our.internmc.facebook.com/intern/diff/D78360037/)

Co-authored-by: Stephen Jia <[email protected]>
Stack from ghstack (oldest at bottom):

This diff removes the `flush` function call from `submit_current_cmd_and_wait` and instead adds it to the `encode_prepack` method. This change will allow better control over flushing, which is needed in follow-up diffs.

Differential Revision: D78360036