CUDA: HIP: maintain_cuda_graph use of cudaGraphKernelNodeGetParams is incorrect. #12152

IMbackK · 2025-03-02T21:15:13Z

According to cuda documentation the memory of the cudaKernelNodeParams struct as returned by cudaGraphKernelNodeGetParams is owned by the associated node. In the case here

llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu

Line 2546 in 14dec0c

    
           cudaError_t stat = cudaGraphKernelNodeGetParams(cuda_ctx->cuda_graph->nodes[i], &cuda_ctx->cuda_graph->params[i]); // Get params using runtime

the nodes in question where created using stream capture, ie they are owned by the cuda/hip runtime.

we later modify this struct by replacing one of its pointer members with the address inside a block of memory we own:

llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu

Line 2565 in 14dec0c

cuda_ctx->cuda_graph->params[i].kernelParams[1] = updated_kernel_arg_ptr;

We have thus modified the node by simply replacing a pointer to a member with memory owned by the runtime with a pointer to memory we own. There is no way this results in well defined behavior and indeed the cuda documentation prohibits this action, see the link to the documentation above:

This memory remains valid until the node is destroyed or its parameters are modified, and should not be modified directly.

Presumably this happens to work on cuda right now either because the runtime happens to allocate the pointer we are updating as part of a larger block separately malloced and the pointer happens to not be the first address in the allocated block, or because the runtime simply is leaking this memory.

On hip, the runtime mallocs() memory all the kernel pointers separately and then frees() the memory when the node is destroyed. This of course causses an invalid free() when the runtime encounters the pointer we changed to point to our memory.

We could avoid this by not changing the pointer to memory we own, but to instead simply update the value it holds:

diff --git a/ggml/src/ggml-cuda/ggml-cuda.cu b/ggml/src/ggml-cuda/ggml-cuda.cu
index d23686d1..1825124d 100644
--- a/ggml/src/ggml-cuda/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda/ggml-cuda.cu
@@ -2562,7 +2562,7 @@ static void maintain_cuda_graph(ggml_backend_cuda_context * cuda_ctx, std::vecto
         for (size_t i = 0; i < cuda_ctx->cuda_graph->num_nodes; i++) {
             if(count(ggml_cuda_cpy_fn_ptrs.begin(), ggml_cuda_cpy_fn_ptrs.end(), cuda_ctx->cuda_graph->params[i].func) > 0) {
                 char ** updated_kernel_arg_ptr = cuda_ctx->cuda_graph->updated_kernel_arg.at(k++);
-                cuda_ctx->cuda_graph->params[i].kernelParams[1] = updated_kernel_arg_ptr;
+                *(void**)cuda_ctx->cuda_graph->params[i].kernelParams[1] = *(void**)updated_kernel_arg_ptr;
                 CUDA_CHECK(cudaGraphKernelNodeSetParams(cuda_ctx->cuda_graph->nodes[i], &cuda_ctx->cuda_graph->params[i]));
             }
         }

I have verfied by looking at the hip runtime code and discussion with an amd engeneer that this is fine to do for hip, but this still violates the provision to not modify the parameters given by the cuda documentation and i have no idea if this is safe to do there. The only way to not violate the constraints given by the documentation would be assemble a cudaKernelNodeParams struct by hand from scratch in this possition:

llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu

Line 2564 in 14dec0c

    
           char ** updated_kernel_arg_ptr = cuda_ctx->cuda_graph->updated_kernel_arg.at(k++);

The text was updated successfully, but these errors were encountered:

IMbackK · 2025-03-02T21:17:08Z

pageing @aendk from nv, @slaren and @JohannesGaessler

slaren · 2025-03-02T21:40:09Z

Also tagging @agray3 here

agray3 · 2025-03-03T10:08:44Z

I agree that the above change is better, and confirmed it also works for CUDA. I'll check with our CUDA graphs team to get their comments on it. Note that we can bypass this altogether with #9017 which still needs some work as per comment #9017 (comment) - if there is a desire for this I can resurrect it.

IMbackK · 2025-03-03T10:58:32Z

Maybe you can submit feedback to them that this interface seams poorly thought out. Either:

cudaGraphKernelNodeGetParams should return a const struct
a helper function to make a deep copy of the params should be provided
a helper function to free the copy should be provided

or:

cudaGraphKernelNodeGetParams should return a deep copy
a helper function to free it should be provided

Otherwise this 'here is a non-const pointer to our internal memory structure but please dont modify anything' behavior of cudaGraphKernelNodeGetParams seams like a unnescary footgun.

agray3 · 2025-03-05T09:50:24Z

The CUDA graphs team have confirmed that these solutions which directly modify the parameters are not strictly compliant with the docs. It is understood why they work at the moment, and there is no specific reason to believe anything will change, but of course there can be no guarantees in the future. Therefore I think it would make sense to move away from this via indirection (with the updated parameters copied to the GPU rather than inserted in the graph) as mentioned above. I'll work on this.

IMbackK · 2025-03-05T16:05:36Z

ok i will open a pr with the patch above as a stop gap. Please do consider my feedback on the gaph interface.

agray3 · 2025-03-05T16:19:25Z

ok i will open a pr with the patch above as a stop gap. Please do consider my feedback on the gaph interface.

Sounds good. Yes, I have already also forwarded your feedback and the team are considering it.

…ointers Previously there was complexity in the CUDA graphs implementation due frequently changing parameters to copy kernels associated with K and V cache pointers. This patch simplifies by using indirection to avoid such parameters frequently changing, avoiding the need for frequent graph updates. Fixes ggml-org#12152

zsogitbe · 2025-03-14T10:24:34Z

May I ask a related question? My code occasionally hangs during the execution of update_cuda_graph_executable and repeatedly logs the following message using GGML_LOG_DEBUG: "CUDA graph update failed"

When this occurs, the model does not return and enters an endless loop, continuously logging this message without stopping.

Could you please advise on how to break out of this loop and handle the error more gracefully, such as throwing an error or exiting the loop? I believe this might be a bug, and I am looking for a workaround until it is resolved.

Thank you for your assistance!

zsogitbe · 2025-03-14T11:03:37Z

One more peace of information:

If I switch OFF cuda graph in the compilation of llama.cpp, then I get occasionally this with the same endless loop:

debug,14/03/2025 11:59:49 ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
debug,14/03/2025 11:59:49 debug,14/03/2025 11:59:49 ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)

This and the formerly reported "CUDA graph update failed" are most probably pointing to the same BUG.

…ointers Previously there was complexity in the CUDA graphs implementation due frequently changing parameters to copy kernels associated with K and V cache pointers. This patch simplifies by using indirection to avoid such parameters frequently changing, avoiding the need for frequent graph updates. Fixes ggml-org#12152

IMbackK mentioned this issue Mar 5, 2025

HIP/CUDA: set the paramerter value in maintain_cuda_graph instead of replaceing it. #12209

Merged

Nexesenex mentioned this issue Mar 21, 2025

CUDA: Simplify and improve CUDA graphs through use of indirect copy p… Nexesenex/croco.cpp#333

Merged

slaren closed this as completed in 3f9da22 Apr 3, 2025

zsogitbe mentioned this issue Apr 3, 2025

Dramatic increase of the C++ dlls size SciSharp/LLamaSharp#1125

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CUDA: HIP: maintain_cuda_graph use of cudaGraphKernelNodeGetParams is incorrect. #12152

CUDA: HIP: maintain_cuda_graph use of cudaGraphKernelNodeGetParams is incorrect. #12152

IMbackK commented Mar 2, 2025 •

edited

Loading

IMbackK commented Mar 2, 2025

slaren commented Mar 2, 2025

agray3 commented Mar 3, 2025

IMbackK commented Mar 3, 2025 •

edited

Loading

agray3 commented Mar 5, 2025

IMbackK commented Mar 5, 2025

agray3 commented Mar 5, 2025

zsogitbe commented Mar 14, 2025

zsogitbe commented Mar 14, 2025

CUDA: HIP: maintain_cuda_graph use of cudaGraphKernelNodeGetParams is incorrect. #12152

CUDA: HIP: maintain_cuda_graph use of cudaGraphKernelNodeGetParams is incorrect. #12152

Comments

IMbackK commented Mar 2, 2025 • edited Loading

IMbackK commented Mar 2, 2025

slaren commented Mar 2, 2025

agray3 commented Mar 3, 2025

IMbackK commented Mar 3, 2025 • edited Loading

agray3 commented Mar 5, 2025

IMbackK commented Mar 5, 2025

agray3 commented Mar 5, 2025

zsogitbe commented Mar 14, 2025

zsogitbe commented Mar 14, 2025

IMbackK commented Mar 2, 2025 •

edited

Loading

IMbackK commented Mar 3, 2025 •

edited

Loading