Improve CUDA/HIP local argument handling #2298

Merged
merged 6 commits into from
Dec 3, 2024

Conversation

EwanC
Contributor

@EwanC EwanC commented Nov 8, 2024

The current implementation of CUDA/HIP local memory argument handling does not update the offsets of any following local memory arguments when a preceding local memory argument is set. Instead, the expectation is that clearLocalSize() is called after a kernel command has been appended/enqueued, which clears the vector of local memory used by each argument. The local memory arguments must then be set again with urKernelSetArgLocal.

This implementation causes problems for command-buffer kernel command update, where a user can pass a subset of local arguments to update, without a guarantee of passing all the local arguments in each update.

In this patch the local argument handling of CUDA/HIP is refactored so that when a local argument is set, any local arguments which follow are updated to account for changes in size and padding. This removes the need for a clearLocalSize() method, and for a user to call urKernelSetArgLocal before each kernel enqueue/append.
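As a rough illustration of the idea, and not the adapter code from this patch, the offset recalculation could be sketched as below. The `LocalArgLayout` type, its member names, the 128-byte maximum alignment, and the rule of aligning each argument to `min(MaxAlignment, Size)` are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical helper for illustration only; not the adapter's real type.
struct LocalArgLayout {
  // Assumed upper bound on alignment, chosen for this example.
  static constexpr std::size_t MaxAlignment = 128;

  struct Entry {
    std::size_t Size = 0;   // unpadded size requested by the user
    std::size_t Offset = 0; // aligned start offset within local memory
  };

  // Keyed by kernel argument index; std::map keeps the indices ordered.
  std::map<std::size_t, Entry> Args;

  // Record a local argument, then recompute the offset of that argument
  // and of every local argument at a higher index.
  void setLocalArg(std::size_t Index, std::size_t Size) {
    assert(Size > 0 && "local argument size must be non-zero");
    Args[Index].Size = Size;
    relayoutFrom(Index);
  }

  std::size_t offsetOf(std::size_t Index) const {
    return Args.at(Index).Offset;
  }

  // Total local memory needed, including padding between arguments.
  std::size_t totalSize() const {
    if (Args.empty())
      return 0;
    const Entry &Last = Args.rbegin()->second;
    return Last.Offset + Last.Size;
  }

private:
  static std::size_t alignUp(std::size_t Value, std::size_t Alignment) {
    const std::size_t Rem = Value % Alignment;
    return Rem == 0 ? Value : Value + (Alignment - Rem);
  }

  void relayoutFrom(std::size_t Index) {
    auto It = Args.find(Index);
    // Start from the end of the preceding local argument, if there is one.
    std::size_t End = 0;
    if (It != Args.begin()) {
      const Entry &Prev = std::prev(It)->second;
      End = Prev.Offset + Prev.Size;
    }
    for (; It != Args.end(); ++It) {
      Entry &E = It->second;
      // Assumption: small arguments are aligned to their own size.
      const std::size_t Alignment = std::min(MaxAlignment, E.Size);
      E.Offset = alignUp(End, Alignment);
      End = E.Offset + E.Size;
    }
  }
};
```

With local arguments of 16, 64 and 4 bytes at indices 0, 1 and 2, this sketch places them at offsets 0, 64 and 128. Calling `setLocalArg(0, 100)` alone then moves indices 1 and 2 to offsets 128 and 192 without either being set again, which is the behaviour a partial command-buffer update relies on.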

DPC++ PR intel/llvm#16025

@github-actions github-actions bot added cuda CUDA adapter specific issues command-buffer Command Buffer feature addition/changes/specification labels Nov 8, 2024
@EwanC EwanC force-pushed the ewan/cuda_update_local_size branch from 3d64f76 to 977a240 Compare November 8, 2024 14:59
@github-actions github-actions bot added the conformance Conformance test suite issues. label Nov 8, 2024
@EwanC EwanC force-pushed the ewan/cuda_update_local_size branch from 99775d4 to e9ecf06 Compare November 14, 2024 09:43
@EwanC EwanC force-pushed the ewan/cuda_update_local_size branch from e9ecf06 to 6c67530 Compare November 14, 2024 14:17
@github-actions github-actions bot added the hip HIP adapter specific issues label Nov 14, 2024
@EwanC EwanC changed the title WIP: Cuda Fix for command-buffer local argument upate Improve CUDA/HIP local argument handling Nov 14, 2024
@EwanC EwanC force-pushed the ewan/cuda_update_local_size branch from 6c67530 to a13c0d1 Compare November 14, 2024 15:01
@EwanC EwanC marked this pull request as ready for review November 15, 2024 09:20
@EwanC EwanC requested review from a team as code owners November 15, 2024 09:20
@EwanC EwanC requested a review from frasercrmck November 15, 2024 09:20
@EwanC EwanC force-pushed the ewan/cuda_update_local_size branch 2 times, most recently from f5a75b9 to bed340f Compare November 18, 2024 11:47
Contributor

@aarongreig aarongreig left a comment

CTS LGTM

@github-actions github-actions bot added loader Loader related feature/bug sanitizer Sanitizer layer issues/changes/specification labels Nov 19, 2024
@EwanC EwanC force-pushed the ewan/cuda_update_local_size branch 2 times, most recently from 23c24a4 to 0500f4c Compare November 19, 2024 14:32
@EwanC EwanC force-pushed the ewan/cuda_update_local_size branch from 0500f4c to 3e32dba Compare November 19, 2024 15:20
@github-actions github-actions bot added the specification Changes or additions to the specification label Nov 19, 2024
@EwanC EwanC force-pushed the ewan/cuda_update_local_size branch from 3e32dba to 582f358 Compare November 20, 2024 12:46
Contributor

@hdelan hdelan left a comment

LGTM. Would be good to take the urKernelSetExecInfo conversation forward with @oneapi-src/unified-runtime-cuda-write

@EwanC EwanC added ready to merge Added to PR's which are ready to merge v0.11.x Include in the v0.11.x release labels Nov 21, 2024
EwanC and others added 6 commits December 2, 2024 13:22
After setting kernel arguments during update, we
need to reset the amount of local memory used.
Iterate on previous solution so that the local argument
offsets at following indices are updated when an earlier
local argument is updated
Co-authored-by: Ben Tracy <[email protected]>
Co-authored-by: aarongreig <[email protected]>
@EwanC EwanC force-pushed the ewan/cuda_update_local_size branch from 582f358 to e578228 Compare December 2, 2024 13:23
EwanC added a commit to reble/llvm that referenced this pull request Dec 2, 2024
Tests UR PR oneapi-src/unified-runtime#2298
with additional SYCL-Graph local memory argument E2E tests.

PR also sets the `pNext` and `stype` members of
`ur_exp_command_buffer_update_kernel_launch_desc_t` which were missing when
calling into UR.
@kbenzie kbenzie merged commit 2bea25d into oneapi-src:main Dec 3, 2024
73 checks passed
sarnex pushed a commit to intel/llvm that referenced this pull request Dec 3, 2024
Tests UR PR oneapi-src/unified-runtime#2298 with
additional SYCL-Graph local memory argument E2E tests.

PR also sets the `pNext` and `stype` members of
`ur_exp_command_buffer_update_kernel_launch_desc_t` which were missing
when calling into UR.

---------

Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
Labels
command-buffer Command Buffer feature addition/changes/specification conformance Conformance test suite issues. cuda CUDA adapter specific issues hip HIP adapter specific issues loader Loader related feature/bug ready to merge Added to PR's which are ready to merge sanitizer Sanitizer layer issues/changes/specification specification Changes or additions to the specification v0.11.x Include in the v0.11.x release
6 participants