Improve CUDA/HIP local argument handling #2298
Merged
3d64f76
to
977a240
Compare
99775d4
to
e9ecf06
Compare
e9ecf06
to
6c67530
Compare
6c67530
to
a13c0d1
Compare
f5a75b9
to
bed340f
Compare
hdelan
reviewed
Nov 18, 2024
aarongreig
approved these changes
Nov 18, 2024
CTS LGTM
test/conformance/exp_command_buffer/update/local_memory_update.cpp
Outdated
Bensuo
reviewed
Nov 18, 2024
23c24a4
to
0500f4c
Compare
0500f4c
to
3e32dba
Compare
hdelan
reviewed
Nov 19, 2024
hdelan
reviewed
Nov 19, 2024
frasercrmck
reviewed
Nov 19, 2024
3e32dba
to
582f358
Compare
hdelan
approved these changes
Nov 20, 2024
LGTM. Would be good to take the `urKernelSetExecInfo` conversation forward with @oneapi-src/unified-runtime-cuda-write
frasercrmck
approved these changes
Nov 20, 2024
After setting kernel arguments during update, we need to reset the amount of local memory used.
Iterate on the previous solution so that the local argument offsets at the following indices are updated when an earlier local argument is updated.
Co-authored-by: Ben Tracy <[email protected]> Co-authored-by: aarongreig <[email protected]>
582f358
to
e578228
Compare
EwanC
added a commit
to reble/llvm
that referenced
this pull request
Dec 2, 2024
Tests UR PR oneapi-src/unified-runtime#2298 with additional SYCL-Graph local memory argument E2E tests. PR also sets the `pNext` and `stype` members of `ur_exp_command_buffer_update_kernel_launch_desc_t`, which were missing when calling into UR.
sarnex
pushed a commit
to intel/llvm
that referenced
this pull request
Dec 3, 2024
Tests UR PR oneapi-src/unified-runtime#2298 with additional SYCL-Graph local memory argument E2E tests. PR also sets the `pNext` and `stype` members of `ur_exp_command_buffer_update_kernel_launch_desc_t`, which were missing when calling into UR. --------- Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
Labels
command-buffer
Command Buffer feature addition/changes/specification
conformance
Conformance test suite issues.
cuda
CUDA adapter specific issues
hip
HIP adapter specific issues
loader
Loader related feature/bug
ready to merge
Added to PR's which are ready to merge
sanitizer
Sanitizer layer issues/changes/specification
specification
Changes or additions to the specification
v0.11.x
Include in the v0.11.x release
The current implementation of CUDA/HIP local memory argument handling does not update the offsets of any following local memory arguments when a preceding local memory argument is set. Instead, the expectation is that `clearLocalSize()` is called after a kernel command has been appended/enqueued, which clears the vector of local memory used by each argument. The local memory arguments must then be set again with `urKernelSetArgLocal`.

This implementation causes problems for command-buffer kernel command update, where a user can pass a subset of local arguments to update, without a guarantee of passing all the local arguments in each update.

In this patch the local argument handling of CUDA/HIP is refactored so that when a local argument is set, any local arguments which follow are updated to account for changes in size & padding. This removes the need for a `clearLocalSize()` method, and for a user to call `urKernelSetArgLocal` before each kernel enqueue/append.

DPC++ PR intel/llvm#16025
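
The offset bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the adapter code from this PR: `LocalArgTable`, `LocalArg`, `setLocalArg`, `offsetOf`, `totalSize` and the single fixed alignment are hypothetical names and simplifications chosen for the example. For brevity the sketch recomputes every offset on each call, although only entries at or after the updated index can actually change.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical illustration only: these names are not the adapter's.
struct LocalArg {
  std::size_t size = 0;   // bytes of local memory requested for this argument
  std::size_t offset = 0; // aligned byte offset into the local memory block
};

class LocalArgTable {
public:
  // Assumption: one fixed alignment for all local arguments.
  explicit LocalArgTable(std::size_t alignment) : alignment_(alignment) {}

  // Set or update the local argument at `index`, then refresh the offsets
  // so that arguments at later indices account for the new size and padding.
  void setLocalArg(std::uint32_t index, std::size_t size) {
    args_[index].size = size;
    recomputeOffsets();
  }

  std::size_t offsetOf(std::uint32_t index) const {
    return args_.at(index).offset;
  }

  // Total local memory needed, including padding between arguments.
  std::size_t totalSize() const { return total_; }

private:
  static std::size_t alignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
  }

  void recomputeOffsets() {
    std::size_t next = 0;
    // std::map iterates in ascending key order, i.e. by argument index.
    for (auto &entry : args_) {
      LocalArg &arg = entry.second;
      arg.offset = alignUp(next, alignment_);
      next = arg.offset + arg.size;
    }
    total_ = next;
  }

  std::size_t alignment_;
  std::map<std::uint32_t, LocalArg> args_;
  std::size_t total_ = 0;
};
```

With an alignment of 8, setting argument 1 to 10 bytes and argument 3 to 4 bytes places argument 3 at offset 16. Updating only argument 1 to 20 bytes then moves argument 3 to offset 24 without the caller setting it again, which is the property a partial command-buffer update relies on.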