[SYCL][L0][CUDA][HIP] Fix PI_KERNEL_GROUP_INFO_GLOBAL_WORK_SIZE queries #8769
I have a question: is the max global work size independent of the global work size set in the host program for a kernel?
/verify with intel/llvm-test-suite#1694
@abagusetty, FYI: the "verify with" command does not validate on CUDA/HIP platforms.
Thanks, I stumbled upon that too and looked at the wording in the spec, which made me think it could be the max global limits.
The global work sizes returned by the query will be the same for any kernel, right?
Yes, since the descriptor is a kernel_device_specific one: any kernel (from a custom device type, or a built-in kernel) would return the device-specific global work sizes, which in turn should be the same for all kernels, IMO.
…m device-types appropriately
sycl/plugins/cuda/pi_cuda.hpp
Outdated
@@ -42,6 +42,11 @@
#include <unordered_map>
#include <vector>

// Helper for one-liner validation
#define PI_ASSERT(condition, error) \
It's a bit misleading, as it does not assert on the condition. Maybe consider renaming it?
Renamed PI_ASSERT to PI_ERR_CHECK.
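To illustrate why the name matters, here is a minimal sketch of a check-and-return macro of the kind the review comment describes. The macro name `ERR_CHECK`, the `Status` enum, and the `queryInfo` function are hypothetical stand-ins, not the plugin's actual definitions; the real macro body is not shown in the hunk above.

```cpp
#include <cstddef>

// Hypothetical error codes standing in for the plugin's result type.
enum class Status { Success, InvalidValue };

// Hypothetical check-and-return macro: when the condition fails it returns
// `error` from the enclosing function. Unlike assert(), it never aborts.
#define ERR_CHECK(condition, error)                                            \
  do {                                                                         \
    if (!(condition))                                                          \
      return (error);                                                          \
  } while (0)

// Example query that validates each argument on a single line.
Status queryInfo(const std::size_t *source, std::size_t *dest) {
  ERR_CHECK(source != nullptr, Status::InvalidValue);
  ERR_CHECK(dest != nullptr, Status::InvalidValue);
  *dest = *source;
  return Status::Success;
}
```

Because a failed check becomes an ordinary error return that the caller can handle, a name ending in "CHECK" describes the behavior more accurately than one ending in "ASSERT".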
Gentle ping @smaslov-intel @jchlanda
+1 on L0 changes.
Sorry for the delay. I think these changes look good. I am a little curious what built-in kernels they would apply to, but I assume CUDA, HIP and L0 guarantee full possible work-sizes either way.
Thanks for the feedback on the built-ins. I stumbled over that a bit too; I just convinced myself that they see the complete device limits.
intel#8769 Signed-off-by: Jaime Arteaga <[email protected]>
Address kernel query global_work_size for L0, CUDA, HIP from PI_KERNEL_GROUP_INFO_GLOBAL_WORK_SIZE
Fixes #8766
For instance (for the X-dimension):
L0: maxGroupSizeX * maxGroupCountX
CUDA: CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X * CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X
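The per-dimension product above can be sketched as follows. This is only an illustration under stated assumptions: `DeviceLimits` and `maxGlobalWorkSize` are hypothetical names, and the limits are passed in directly rather than queried from a driver, so the snippet does not depend on the CUDA, HIP, or Level Zero headers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical container for per-dimension device limits (X, Y, Z).
// In a real plugin these values would come from device queries.
struct DeviceLimits {
  std::array<std::uint64_t, 3> maxGroupSize;  // e.g. max block dim on CUDA
  std::array<std::uint64_t, 3> maxGroupCount; // e.g. max grid dim on CUDA
};

// Max global work size per dimension = max group size * max group count.
// 64-bit arithmetic is used because the product can exceed 32 bits.
std::array<std::uint64_t, 3> maxGlobalWorkSize(const DeviceLimits &limits) {
  std::array<std::uint64_t, 3> result{};
  for (std::size_t dim = 0; dim < 3; ++dim)
    result[dim] = limits.maxGroupSize[dim] * limits.maxGroupCount[dim];
  return result;
}
```

As an example, with illustrative limits of 1024 x 1024 x 64 for the group size and 2147483647 x 65535 x 65535 for the group count, the X-dimension product is 1024 * 2147483647, which does not fit in a 32-bit integer. That is why the sketch widens to `std::uint64_t` before multiplying.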