-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for hipBLAS #92
Comments
Since @gsitaraman mentioned here the potential performance benefit of using rocBLAS, I would still hesitate to switch to hipBLAS, unless you believe it's absolutely necessary. |
I agree that you should keep rocBLAS as long as there are performance benefits to it. However, maybe you could still add support for hipBLAS alongside? It would allow me to make progress with the HIP integration testing in CP2K. |
This should be possible. @AdhocMan, do you think this would be a simple change, or would it require adding a lot of boilerplate code to the gpu-backend? |
hipBLAS is just an additional layer on top of rocBLAS and you can compile rocBLAS with cuda. So it should work already, although I haven't tested it. |
Would we still have to add hipBLAS as a dependency of COSMA in that case? |
No, if rocBLAS works correctly with cuda. There would just be few changes to the CMake config required. |
As written here it seems the cmake for the CUDA backend of rocBLAS is broken at the moment:
|
I also spent an hour yesterday trying to build rocBLAS for Cuda - without success so far. And even if we get it to work eventually, it will probably be rather brittle because it pulls in many additional dependencies. In comparison hipBLAS is very lightweight. |
That's unfortunate, it would have made things easier. I think, adding hipBLAS alongside rocBLAS does not make much sense, since adding a code path, that's only used for testing somewhat defeats it's purpose. So I'd suggest two options:
|
In both hipBLAS and rocBLAS, there is a heavy overhead in the first DGEMM call. If rocBLAS is used, one could call rocblas_initialize() before any DGEMM call to avoid this overhead in the timing region. We can try to do this in COSMA and shift to a hipBLAS implementation for improving portability. I will work with Marko later this week to get this done. |
I already have a version of Tiled-MM with hipblas instead of rocblas: https://github.com/AdhocMan/Tiled-MM/tree/hipblas |
It would be nice to have support for hipBLAS as this allows to test the hip code path on a Nvidia device.
In return the support for rocBLAS could be dropped.
The text was updated successfully, but these errors were encountered: