Improve CUDA/HIP local argument handling #2298

EwanC · 2024-11-08T10:30:19Z

The current implementation of CUDA/HIP local memory argument handling does not update the offsets of any following local memory arguments when a preceding local memory argument is set. Instead, the expectation is that clearLocalSize() is called after a kernel command has been appended/enqueued, which clears the vector of local memory used by each argument. The local memory arguments must then be set again with urKernelSetArgLocal.

This implementation causes problems for command-buffer kernel command update, where a user can pass a subset of local arguments to update, without a guarantee of passing all the local arguments in each update.
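To make the failure mode concrete, here is a deliberately simplified sketch of the previous scheme. The `OldLocalArgs` type and its members are hypothetical and are not copied from the adapters, and padding is left out to keep it short. It only illustrates how a stored offset can go stale when a subset of local arguments is updated.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of the previous scheme (not adapter code).
struct OldLocalArgs {
  std::size_t LocalSize = 0;        // running total of local memory in bytes
  std::vector<std::size_t> Offsets; // offset stored for each argument index

  // The offset is taken from the running total at the time of the call,
  // so it depends on which arguments were set before this one.
  void setLocalArg(std::size_t Index, std::size_t Size) {
    if (Index >= Offsets.size())
      Offsets.resize(Index + 1, 0);
    Offsets[Index] = LocalSize; // padding/alignment omitted in this sketch
    LocalSize += Size;
  }

  // Called after an enqueue/append; later calls start again from zero.
  // Offsets that are not set again keep their old values.
  void clearLocalSize() { LocalSize = 0; }
};
```

In this model, setting argument 0 to 16 bytes and argument 1 to 32 bytes, calling `clearLocalSize()`, and then updating only argument 0 to 64 bytes leaves argument 1 at its old offset of 16, which now falls inside the region occupied by argument 0.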

In this patch, CUDA/HIP local argument handling is refactored so that when a local argument is set, any local arguments that follow are updated to account for changes in size and padding. This removes the need for a clearLocalSize() method, and for the user to call urKernelSetArgLocal before each kernel enqueue/append.
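The refactored behaviour can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `source/adapters/cuda/kernel.hpp`: the `LocalArgLayout` type, its members, and the alignment rule (align to the argument size, capped at 128 bytes) are all assumptions made for the example.

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical model of a kernel's local memory arguments (not adapter code).
struct LocalArgLayout {
  struct Entry {
    std::size_t Size;   // bytes requested for this argument
    std::size_t Offset; // byte offset into the local memory block
  };

  // Keyed by argument index; std::map keeps entries in index order, so
  // the arguments that "follow" a given index are easy to visit.
  std::map<unsigned, Entry> Args;

  // Assumed alignment rule: align to the argument size, capped at 128.
  static std::size_t alignUp(std::size_t Offset, std::size_t Size) {
    constexpr std::size_t MaxAlign = 128;
    const std::size_t Align = Size < MaxAlign ? Size : MaxAlign;
    if (Align == 0)
      return Offset;
    const std::size_t Rem = Offset % Align;
    return Rem == 0 ? Offset : Offset + (Align - Rem);
  }

  // Set (or update) one local argument, then recompute the offset of that
  // argument and of every local argument with a higher index.
  void setLocalArg(unsigned Index, std::size_t Size) {
    Args[Index].Size = Size;
    auto It = Args.find(Index);

    // Start from the end of the preceding local argument, if there is one.
    std::size_t Next = 0;
    if (It != Args.begin()) {
      const Entry &Prev = std::prev(It)->second;
      Next = Prev.Offset + Prev.Size;
    }
    for (; It != Args.end(); ++It) {
      It->second.Offset = alignUp(Next, It->second.Size);
      Next = It->second.Offset + It->second.Size;
    }
  }

  // Total local memory needed: the end of the last local argument.
  std::size_t totalLocalSize() const {
    if (Args.empty())
      return 0;
    const Entry &Last = Args.rbegin()->second;
    return Last.Offset + Last.Size;
  }
};
```

With this layout, updating only argument 0 is enough: the offsets of the later arguments and the total size are recomputed in the same call, so nothing needs to be cleared or set again between enqueues.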

DPC++ PR intel/llvm#16025

source/adapters/cuda/kernel.hpp

aarongreig

CTS LGTM

test/conformance/exp_command_buffer/update/local_memory_update.cpp

source/adapters/cuda/kernel.hpp

source/adapters/hip/kernel.hpp

scripts/core/CUDA.rst

source/adapters/cuda/kernel.hpp

hdelan

LGTM. Would be good to take the urKernelSetExecInfo conversation forward with @oneapi-src/unified-runtime-cuda-write

After setting kernel arguments during update, we need to reset the amount of local memory used.

Iterate on the previous solution so that the local argument offsets at following indices are updated when an earlier local argument is updated.

Co-authored-by: Ben Tracy <[email protected]> Co-authored-by: aarongreig <[email protected]>

Tests UR PR oneapi-src/unified-runtime#2298 with additional SYCL-Graph local memory argument E2E tests. The PR also sets the `pNext` and `stype` members of `ur_exp_command_buffer_update_kernel_launch_desc_t`, which were missing when calling into UR.

Co-authored-by: Kenneth Benzie (Benie) <[email protected]>

github-actions bot added the cuda (CUDA adapter specific issues) and command-buffer (Command Buffer feature addition/changes/specification) labels Nov 8, 2024

EwanC mentioned this pull request Nov 8, 2024

[SYCL][Graph] Fix CUDA/HIP local mem argument update bug intel/llvm#16025

Merged

EwanC force-pushed the ewan/cuda_update_local_size branch from 3d64f76 to 977a240 Compare November 8, 2024 14:59

github-actions bot added the conformance (Conformance test suite issues) label Nov 8, 2024

EwanC force-pushed the ewan/cuda_update_local_size branch from 99775d4 to e9ecf06 Compare November 14, 2024 09:43

EwanC force-pushed the ewan/cuda_update_local_size branch from e9ecf06 to 6c67530 Compare November 14, 2024 14:17

github-actions bot added the hip (HIP adapter specific issues) label Nov 14, 2024

EwanC changed the title ~~WIP: Cuda Fix for command-buffer local argument upate~~ Improve CUDA/HIP local argument handling Nov 14, 2024

EwanC force-pushed the ewan/cuda_update_local_size branch from 6c67530 to a13c0d1 Compare November 14, 2024 15:01

EwanC marked this pull request as ready for review November 15, 2024 09:20

EwanC requested review from a team as code owners November 15, 2024 09:20

EwanC requested a review from frasercrmck November 15, 2024 09:20

EwanC force-pushed the ewan/cuda_update_local_size branch 2 times, most recently from f5a75b9 to bed340f Compare November 18, 2024 11:47

hdelan reviewed Nov 18, 2024

source/adapters/cuda/kernel.hpp (resolved)

aarongreig approved these changes Nov 18, 2024

Bensuo reviewed Nov 18, 2024

source/adapters/cuda/kernel.hpp (outdated, resolved)

source/adapters/hip/kernel.hpp (outdated, resolved)

github-actions bot added the loader (Loader related feature/bug) and sanitizer (Sanitizer layer issues/changes/specification) labels Nov 19, 2024

EwanC force-pushed the ewan/cuda_update_local_size branch 2 times, most recently from 23c24a4 to 0500f4c Compare November 19, 2024 14:32

EwanC force-pushed the ewan/cuda_update_local_size branch from 0500f4c to 3e32dba Compare November 19, 2024 15:20

github-actions bot added the specification (Changes or additions to the specification) label Nov 19, 2024

hdelan reviewed Nov 19, 2024

scripts/core/CUDA.rst (outdated, resolved)

hdelan reviewed Nov 19, 2024

scripts/core/CUDA.rst (outdated, resolved)

frasercrmck reviewed Nov 19, 2024

source/adapters/cuda/kernel.hpp (outdated, resolved)

EwanC force-pushed the ewan/cuda_update_local_size branch from 3e32dba to 582f358 Compare November 20, 2024 12:46

hdelan approved these changes Nov 20, 2024

frasercrmck approved these changes Nov 20, 2024

EwanC added the ready to merge (Added to PR's which are ready to merge) and v0.11.x (Include in the v0.11.x release) labels Nov 21, 2024

EwanC and others added 6 commits December 2, 2024 13:22

[CUDA][HIP] Fix for command-buffer local argument update

79f3ccf

After setting kernel arguments during update, we need to reset the amount of local memory used.

Improve solution

31dc790

Iterate on the previous solution so that the local argument offsets at following indices are updated when an earlier local argument is updated.

Add extra more basic CTS test

0b5bc82

FIx comment typos

a5d2bda

Co-authored-by: Ben Tracy <[email protected]> Co-authored-by: aarongreig <[email protected]>

Add non command-buffer test

7d126f3

Document hip extra arg behavior

e578228

EwanC force-pushed the ewan/cuda_update_local_size branch from 582f358 to e578228 Compare December 2, 2024 13:23

kbenzie merged commit 2bea25d into oneapi-src:main Dec 3, 2024
73 checks passed

Naghasan mentioned this pull request Dec 3, 2024

Add new launch property to support work_group_scratch_memory #2403

Merged
