[Issue]: Discrepancy between hipGetDeviceProperties and rocminfo #140

Open
Rombur opened this issue Feb 19, 2025 · 7 comments

Comments

@Rombur

Rombur commented Feb 19, 2025

Problem Description

AMD's documentation says that the maximum x-, y-, or z-dimension of a grid is 2^{32}-1. rocminfo reports the same values. hipGetDeviceProperties used to return them as well, but that was changed in 31b362b. Which values should I use?

Operating System

NA

CPU

NA

GPU

MI 250

ROCm Version

ROCm 6.3

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@Rombur
Author

Rombur commented Feb 20, 2025

From what I can tell, the change was made to accommodate hipModuleLaunchKernel. The GridDim size can be larger if you use hipLaunchKernel. It seems wrong to return a smaller GridDim size just because of the requirement of one function, especially when hipModuleLaunchKernel is only used for some Fortran compatibility stuff.

Rombur changed the title from "[Issue]: Discrepancy between hipGetDeviceProperties and rocm-smi" to "[Issue]: Discrepancy between hipGetDeviceProperties and rocminfo" on Feb 21, 2025
@dalg24

dalg24 commented Feb 21, 2025

Could we please get some guidance from AMD? This is impacting Kokkos.
cc @crtrott @lucbv

@Rombur
Author

Rombur commented Feb 21, 2025

I just realized that the values actually never matched. The documentation says that the maximum is 2^{32}-1, but hipGetDeviceProperties returns std::numeric_limits<int32_t>::max() for GridDim.x, which is 2^{31}-1. It should probably be uint32_t instead of int32_t.
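
To make the arithmetic concrete, here is a minimal sketch in plain C++ (no HIP required) comparing the two limits. The constant names are only illustrative and do not come from HIP or rocminfo.

```cpp
#include <cstdint>
#include <limits>

// Illustrative names (not part of HIP): the two candidate limits,
// widened to 64 bits so they can be compared without overflow.
constexpr std::int64_t kInt32Max = std::numeric_limits<std::int32_t>::max();
constexpr std::int64_t kUint32Max = std::numeric_limits<std::uint32_t>::max();

// The signed limit is 2^31 - 1, the unsigned limit is 2^32 - 1.
static_assert(kInt32Max == (std::int64_t{1} << 31) - 1, "int32_t max is 2^31 - 1");
static_assert(kUint32Max == (std::int64_t{1} << 32) - 1, "uint32_t max is 2^32 - 1");

// The documented value is one more than twice the reported value.
static_assert(kUint32Max == 2 * kInt32Max + 1, "limits differ by about a factor of two");
```

So a grid x-dimension between 2^{31} and 2^{32}-1 would be allowed by the documented limit but rejected by the value hipGetDeviceProperties reports.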

@darren-amd

Hi @Rombur,

I had a chat with the internal team and the recommendation is to use the values from hipGetDeviceProperties, namely: 2^{31} - 1, 2^{16} - 1, and 2^{16} - 1 for the x, y, and z dimensions. This is in line with CUDA (see "Technical Specifications per Compute Capability"). I am working on getting the documentation updated to reflect this change. Thanks!
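
As a rough illustration of how the recommended limits could be applied before a launch, here is a minimal sketch in plain C++. The limits are hard-coded from the values above; in a real HIP program they would more likely be read at runtime from the `maxGridSize` field of `hipDeviceProp_t` after calling hipGetDeviceProperties. The helper `grid_fits` and the constant names are hypothetical, not part of HIP.

```cpp
#include <cstdint>

// Recommended per-dimension grid limits from this thread (assumed fixed here;
// a HIP program would normally query hipDeviceProp_t::maxGridSize instead).
constexpr std::int64_t kMaxGridX = (std::int64_t{1} << 31) - 1;  // 2^31 - 1
constexpr std::int64_t kMaxGridY = (std::int64_t{1} << 16) - 1;  // 2^16 - 1
constexpr std::int64_t kMaxGridZ = (std::int64_t{1} << 16) - 1;  // 2^16 - 1

// Hypothetical helper: true if every requested grid dimension is at least 1
// and does not exceed the corresponding limit.
constexpr bool grid_fits(std::int64_t x, std::int64_t y, std::int64_t z) {
  return x >= 1 && x <= kMaxGridX &&
         y >= 1 && y <= kMaxGridY &&
         z >= 1 && z <= kMaxGridZ;
}
```

For example, `grid_fits(2147483647, 65535, 65535)` holds, while `grid_fits(2147483648, 1, 1)` and `grid_fits(1, 65536, 1)` do not. The arguments are 64-bit so that out-of-range requests can be represented and rejected rather than silently wrapping.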

@Rombur
Author

Rombur commented Feb 26, 2025

@darren-amd Thanks for the update. Will rocminfo also be updated, so that we don't get conflicting information?

@darren-amd

darren-amd commented Mar 3, 2025

Hi @Rombur,

Yes, I have an internal PR to get rocminfo updated as well.

@Rombur
Author

Rombur commented Mar 3, 2025

Great. Thank you @darren-amd
