Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capturing Stream Safety #1240

Closed
FreddieWitherden opened this issue Apr 12, 2022 · 9 comments
Closed

Capturing Stream Safety #1240

FreddieWitherden opened this issue Apr 12, 2022 · 9 comments
Assignees

Comments

@FreddieWitherden
Copy link

Is the current version of rocBLAS safe to use in the context of a stream which is capturing? For example:

hipStream_t s1, s2, s3;
hipStreamCreate(&s1); 
hipStreamCreate(&s2);

hipGraph_t g;
hipStreamBeginCapture(s1, 0);
rocblas_set_stream(handle, s1);
rocblas_gemm(...);
hipStreamEndCapture(s1, &g);

hipExecGraph_t eg;
hipGraphInstantiate(&eg, g, 0, 0, 0);
hipGraphLaunch(eg, s2);

The issue surrounds if any kernels in rocBLAS feel like using scratch space. For this to be safe rocBLAS needs to detect if a stream is capturing and, if so, allocate fresh storage (which is never reused or deallocated). This is because a graph can be launched in the context of any stream(s).

@daineAMD
Copy link
Contributor

Hi @FreddieWitherden

Thanks for bringing up this issue, we are looking into it now and will get back to you as soon as we can with some answers.

Thanks again,
Daine

@amcamd
Copy link
Contributor

amcamd commented Apr 27, 2022

Hello @FreddieWitherden,

rocBLAS functions are not safe to use with HIP Graph functions. We will work towards making them Graph safe in future releases of rocBLAS.

@FreddieWitherden
Copy link
Author

Hello @FreddieWitherden,

rocBLAS functions are not safe to use with HIP Graph functions. We will work towards making them Graph safe in future releases of rocBLAS.

Thank you for this. My understanding is that the means of making them graph safe is whenever a function (such as SGEMM) is called which wants to use temporary storage the code should first call hipStreamIsCapturing and, if it returns true, allocate some fresh scratch space (which is only freed when the BLAS context itself is torn down).

@amcamd
Copy link
Contributor

amcamd commented Apr 27, 2022

AFAIK it requires creating a pool of memory associated with the graph. Nodes in the graph asynchronously allocate from the pool, after the allocation is successful kernels are launched asynchronously, after the kernels have completed memory allocated by the node from the pool is asynchronously freed. There needs to be sufficient memory in the pool to allow progress. The order of the asynchronous operations is controlled by the graph.

@FreddieWitherden
Copy link
Author

The approach outlined above is somewhat simpler and takes advantage of the fact that only one instance of a captured graph can be meaningfully run at once. Thus, when one detects a stream is capturing it is sufficient to simply allocate up fresh temporary storage (which is only ever used for that particular kernel invocation and never reused). Although a little bit wasteful it avoids any specific interaction with the graph, the need for the graph to be able to allocate/deallocate memory (which I do not think is currently possible in HIP), and any overhead associated with this.

I believe this is the approach taken by cuBLAS to ensure graph safety.

@sohaibnd
Copy link

sohaibnd commented Jan 14, 2025

Hi @FreddieWitherden, HIP Graph support was added as a beta feature for rocBLAS Level 1, Level 2, and Level 3 (pointer mode host) functions in ROCm 5.5.0 so you should be able to use those rocBLAS functions with hipStreamBeginCapture now (see the docs for more information). Let me know if you have any follow-up questions otherwise I can close this issue.

@sohaibnd
Copy link

@FreddieWitherden Closing this issue as resolved.

@FreddieWitherden
Copy link
Author

Hi @FreddieWitherden, HIP Graph support was added as a beta feature for rocBLAS Level 1, Level 2, and Level 3 (pointer mode host) functions in ROCm 5.5.0 so you should be able to use those rocBLAS functions with hipStreamBeginCapture now (see the docs for more information). Let me know if you have any follow-up questions otherwise I can close this issue.

Unsure if it is related to the capturing but we still observe issues with our code when using the graph API; see:

PyFR/PyFR#312

The code itself is almost function-for-functional identical to what we do on CUDA (and that has been in production for 2-3 years with no reported issues) whereas for HIP when we transitions to graphs we get invalid results on even our simple test cases.

@rkamd
Copy link
Contributor

rkamd commented Feb 6, 2025

@FreddieWitherden,
In order for the stream capture to work as expected, internal workspace memory needs to use stream order allocation (hipMallocAsync() and hipFreeAsync()). Could you re-run the failing code for HIP by setting this env variable ROCBLAS_STREAM_ORDER_ALLOC and let me know the results.

More information here : https://rocm.docs.amd.com/projects/rocBLAS/en/latest/reference/memory-alloc.html#stream-order-alloc
and
https://rocm.docs.amd.com/projects/rocBLAS/en/latest/reference/beta-features.html#graph-support-for-rocblas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants