[SYCL] Optimize checkValueRange #18296
Merged
uditagarwal97 merged 4 commits into intel:sycl on May 5, 2025
checkValueRange is used to determine if an nd_range is compatible with -fsycl-id-queries-fit-in-int, and is run as part of every kernel launch.

The previous implementation checked the size of each component of the global range, local range, offset, and global range + offset, and also checked the linearized version of each of these values.

The new implementation simplifies these checks, based on the following logic:

- The linear global range size must be >= every component of the global range. If the linear global range fits in int, we don't need to check anything else.
- Each value in the global range must be >= the value in the local range. If the global range fits in int, we don't need to check the local range.
- There is no need to check offset-related values if the offset is zero.

The new implementation also makes use of __builtin_mul_overflow where available. This shifts the burden of maintaining fast code for these checks to the compiler, and allows us to benefit from aggressive optimizations.

The new implementation could be optimized further if there were a quick way to check whether an nd_range has an offset.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
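The reasoning in this description can be sketched in standalone C++. This is a minimal illustration, not the code from this PR: `mulOverflows` and `fitsInInt` are hypothetical names, ranges are modelled as `std::array<std::size_t, Dims>` rather than SYCL types, and the exact offset checks are an assumption. It also assumes every global component is at least 1, which is what makes the linear size an upper bound on each component.

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Hypothetical helper (not part of DPC++): stores A * B in Out and returns
// true if the product does not fit in size_t.
inline bool mulOverflows(std::size_t A, std::size_t B, std::size_t &Out) {
#if defined(__GNUC__) || defined(__clang__)
  return __builtin_mul_overflow(A, B, &Out);
#else
  Out = A * B; // unsigned wrap-around is well defined
  return A != 0 && Out / A != B;
#endif
}

// Sketch of the simplified check. Assumes every global component is >= 1, so
// the linear size bounds each global component (and, for a valid nd_range,
// each local component as well).
template <std::size_t Dims>
bool fitsInInt(const std::array<std::size_t, Dims> &Global,
               const std::array<std::size_t, Dims> &Offset) {
  constexpr std::size_t IntMax =
      static_cast<std::size_t>(std::numeric_limits<int>::max());

  std::size_t Linear = 1;
  bool HasOffset = false;
  for (std::size_t I = 0; I < Dims; ++I) {
    if (mulOverflows(Linear, Global[I], Linear))
      return false;
    HasOffset = HasOffset || Offset[I] != 0;
  }
  if (Linear > IntMax)
    return false;
  if (!HasOffset)
    return true; // zero offset: nothing offset-related to check

  // Offset path (assumed form): the linearized (global + offset) must fit too.
  std::size_t LinearEnd = 1;
  for (std::size_t I = 0; I < Dims; ++I) {
    if (Offset[I] > IntMax)
      return false; // also keeps the addition below from wrapping
    if (mulOverflows(LinearEnd, Global[I] + Offset[I], LinearEnd))
      return false;
  }
  return LinearEnd <= IntMax;
}
```

In this sketch the common case (no offset, linear size fits in int) costs one multiply per dimension plus a single comparison, and every failure path returns the same result, which matches the single error message mentioned in the follow-up commit.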
The new implementation generates the same error message regardless of what
caused the overflow, so the test had to be updated with the new message.
I removed one of the tests because it is invalid: SYCL forbids launching
a global size of {1, 1} with a local size of anything other than {1, 1}.
Signed-off-by: John Pennycook <john.pennycook@intel.com>
A note to reviewers: I had to remove a test, but the test was invalid. In SYCL, the global range must be divisible by the local range, and it makes no sense to enqueue a global range of {1, 1} with a local range that is outside the range of an int. With my proposed changes, the test as written still throws an exception; it's just a different exception. Depending on the device, it will fail either because the local range is too large for the device (e.g., PVC is limited to {1024, 1024, 1024}) or because the global range isn't divisible by the local range.
The failing test looks unrelated to me.
uditagarwal97 approved these changes on May 2, 2025
Co-authored-by: Udit Kumar Agarwal <udit.agarwal@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
uditagarwal97 approved these changes on May 2, 2025