-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL] Optimize checkValueRange #18296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Conversation
checkValueRange is used to determine if an nd_range is compatible with -fsycl-queries-fit-in-int, and is run as part of every kernel launch. The previous implementation checked the size of each component of the global range, local range, offset, and global range + offset, and also checked the linearized version of each of these values. The new implementation simplifies these checks, based on the following logic: - The linear global range size must be >= every component of the global range. If the linear global range fits in int, we don't need to check anything else. - Each value in the global range must be >= the value in the local range. If the global range fits in int, we don't need to check the local range. - There is no need to check offset-related values if the offset is zero. The new implementation also makes use of __builtin_mul_overflow where available. This shifts the burden of maintaining fast code for these checks to the compiler, and allows us to benefit from aggressive optimizations. The new implementation could be optimized further if there was a quick way to check whether an nd_range has an offset. Signed-off-by: John Pennycook <[email protected]>
The new implementation generates the same error message regardless of what caused the overflow, so the test had to be updated with the new message. I removed one of the tests because it is invalid: SYCL forbids launching a global size of {1, 1} with a local size of anything other than {1, 1}. Signed-off-by: John Pennycook <[email protected]>
A note to reviewers: I had to remove a test, but the test was invalid. In SYCL, the global range must be divisible by the local range, and it makes no sense to enqueue a global range of {1, 1} with a local range that is outside the range of an int. With my proposed changes, the test as written still throws an exception, it's just a different exception. Depending on the device, it will fail either because: the local range is too large for the device (e.g., PVC is limited to {1024, 1024, 1024}); or because the global range isn't divisible by the local range. |
The failing test looks unrelated to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Minor NITs
Co-authored-by: Udit Kumar Agarwal <[email protected]>
Signed-off-by: John Pennycook <[email protected]>
checkValueRange is used to determine if an nd_range is compatible with -fsycl-queries-fit-in-int, and is run as part of every kernel launch.
The previous implementation checked the size of each component of the global range, local range, offset, and global range + offset, and also checked the linearized version of each of these values.
The new implementation simplifies these checks, based on the following logic:
The linear global range size must be >= every component of the global range.
If the linear global range fits in int, we don't need to check anything else.
Each value in the global range must be >= the value in the local range.
If the global range fits in int, we don't need to check the local range.
There is no need to check offset-related values if the offset is zero.
The new implementation also makes use of __builtin_mul_overflow where available. This shifts the burden of maintaining fast code for these checks to the compiler, and allows us to benefit from aggressive optimizations.
The new implementation could be optimized further if there was a quick way to check whether an nd_range has an offset.