-
Notifications
You must be signed in to change notification settings - Fork 768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Avoid alignment on kernel pointer parameters #11979
base: sycl
Are you sure you want to change the base?
Conversation
When creating the kernel, we assume alignment of kernel pointer parameters based on the alignment of their pointee types. This can lead to incorrect results.
@@ -0,0 +1,52 @@ | |||
// RUN: %clang_cc1 -fsycl-is-device -O0 -internal-isystem %S/Inputs -triple spir64 -emit-llvm -o - %s | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// RUN: %clang_cc1 -fsycl-is-device -O0 -internal-isystem %S/Inputs -triple spir64 -emit-llvm -o - %s | FileCheck %s | |
// RUN: %clang_cc1 -fsycl-is-device -Xclang -disable-llvm-passes -internal-isystem %S/Inputs -triple spir64 -emit-llvm -o - %s | FileCheck %s |
// | ||
// Don't do this for SYCL, as this assumption does not hold. | ||
if (!getLangOpts().SYCLIsDevice && TargetDecl && | ||
TargetDecl->hasAttr<OpenCLKernelAttr>() && ParamType->isPointerType()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@premanandrao, do I get it right that alignment is not guaranteed for USM allocations only?
Is it true for L0 only or OpenCL is impacted as well?
I'm surprised to see the deviation from OpenCL properties. It might hard to justify in upstream. If SYCL compiler doesn't genuine OpenCL kernel, can we continue using OpenCLKernelAttr
or better to have a SYCL specific attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not guaranteed for USM allocations, but since we don't do pointer analysis, we can't deduce the alignment in general.
Good questions about L0 vs OpenCL. I agree with your suggestion that perhaps we use a different SYCL attribute if it applies to OpenCL.
I have added @GarveyJoe and @ajaykumarkannan to the PR; they had identified and requested this change. I would like to have their thoughts on this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_unified_shared_memory.html. USM allocation alignment requirements match OpenCL buffer (i.e. it must be a power of two and must be equal to or smaller than the size of the largest data type supported by any OpenCL device in context), so we can re-use OpenCL kernel logic as-is for pointers to USM allocations. We can argue about whether OpenCL logic is correct, but I don't think it should cause the difference between OpenCL and SYCL.
I don't see similar alignment requirements for Level Zero though. Level Zero spec only requires alignment value to be a power of two. @bashbaug, do you know if Level Zero memory allocation functions have additional alignment guarantees like OpenCL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bader, the wording you're looking at regarding alignment is about the alignment of the pointer returned by the allocation functions such as clSharedMemAllocINTEL. There is no requirement that the kernel argument passed in via clSetKernelArgMemPointerINTEL has that same alignment. The only restriction on the pointer passed to clSetKernelArgMemPointerINTEL is that it is somewhere within an allocation returned by one of the allocation functions:
Otherwise, the pointer value must be NULL or must point into a Unified Shared Memory allocation returned by clHostMemAllocINTEL, clDeviceMemAllocINTEL, or clSharedMemAllocINTEL.
As a result, the following code is legal OpenCL:
// Kernel
kernel void foo(global int *a) {
global char *b = (char *)(a);
*b = ...;
}
// Host
char *p = (char *) clHostMemAllocINTEL(context, NULL, 2, 0, NULL);
p = &(p[1]);
clSetKernelArgMemPointerINTEL(kernel, 0, p);
And certainly in this case the kernel argument will not have alignment any higher than that of a char.
@premanandrao, since this same problem can be exposed in OpenCL, as my example demonstrates, I don't think your code should be SYCL-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, there is a line in the OpenCL C spec saying:
For arguments to a __kernel function declared to be a pointer to a data type, the OpenCL compiler can assume that the pointee is always appropriately aligned as required by the data type.
A similar line also exists in the OpenCL SPIR-V environment spec:
For OpTypePointer arguments to a function, the compiler may assume that the pointer is appropriately aligned as required by the Type that the pointer points to.
The example above still might be OK, but because the int*
kernel argument is not aligned to sizeof(int)
== 4 bytes things could easily go wrong.
Does SYCL (or C++, generally) make similar guarantees?
do you know if Level Zero memory allocation functions have additional alignment guarantees like OpenCL?
The Level Zero memory allocation functions have a similar alignment parameter as the OpenCL allocation functions. The Level Zero spec doesn't seem to explicitly say what the behavior is when passing zero as the alignment, but I'm 99% sure it behaves the same as OpenCL, by choosing an implementation-defined alignment that is big enough for all basic data types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After much digging through the spec, I've concluded that this optimization is actually legal in all C++ programs (and thus in SYCL as well). Without ever saying so explicitly, the language standard goes to great lengths to ensure that any pointer that doesn't have at least the alignment of the type it points to has either an undefined value or produces undefined behaviour even if the pointer is never dereferenced. However, it seems that this is not the approach that clang has taken. This LLVM mailing list discussion started by John McCall in 2016 seems to summarize clang's current position: https://groups.google.com/g/llvm-dev/c/eJRto1ipCYQ. In that thread he proposes that clang maintain a more relaxed position than the C++ standard: that it is only UB to dereference an unaligned pointer. I can't find anything formal in the clang docs to indicate that his proposal was accepted but even in present day clang does not emit the alignment attributes that the stricter definition from the C++ standard would allow. It instead only emits alignment at access sites. At the very least it seems the clang community has tacitly accepted John's proposal.
Now we have to decide if we want to take advantage of the stronger guarantees of the standard or follow clang's looser direction. I suspect we might get push back from the community if we try to upstream code that takes advantage of this guarantee.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SYCL 2020 spec, in section 5.5 (Built-in scalar data types), requires scalar fundamental data types to have the same size and alignment for the host and device. The alignment annotations look correct to me as is.
I believe the stronger guarantee exists to allow for an implementation to diagnose the creation of an invalid pointer as opposed to having to wait until the pointer is dereferenced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following further offline discussion, I now agree with Joe that this is a good change. Without it, code might behave differently in a kernel than in another device function and that just seems weird and unnecessary. I don't think the optimization opportunity is significant.
One of the things that was helpful for me in reaching this conclusion is that alias annotations in LLVM IR are coalesced; when there are multiple relevant annotations (e.g., on a parameter with a pointer type and on a load/store that uses that pointer), code gen can use the more strict one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the optimization opportunity is significant.
One of the things that was helpful for me in reaching this conclusion is that alias annotations in LLVM IR are coalesced; when there are multiple relevant annotations (e.g., on a parameter with a pointer type and on a load/store that uses that pointer), code gen can use the more strict one.
I strongly recommend testing this claim using available means.
@jingwan2, FYI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing is always a good idea :)
The reason I think the optimization opportunity is not significant is because the alias information is (currently) lost as soon as one of these pointers is passed to another function (though subject to inlining considerations I'm sure). At any rate, it would be good to get input from someone with actual optimization experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented
// Don't do this for SYCL, as this assumption does not hold. | ||
if (!getLangOpts().SYCLIsDevice && TargetDecl && | ||
TargetDecl->hasAttr<OpenCLKernelAttr>() && ParamType->isPointerType()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that the OpenCLKernelAttr
attribute is only used for OpenCL and SYCL, perhaps it makes sense to restrict the conditional to matching OpenCL specifically rather than any language extension other than SYCL.
// Don't do this for SYCL, as this assumption does not hold. | |
if (!getLangOpts().SYCLIsDevice && TargetDecl && | |
TargetDecl->hasAttr<OpenCLKernelAttr>() && ParamType->isPointerType()) { | |
if (getLangOpts().OpenCL && TargetDecl && | |
TargetDecl->hasAttr<OpenCLKernelAttr>() && ParamType->isPointerType()) { |
This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days. |
This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days. |
When creating the kernel, we assume alignment of kernel pointer parameters based on the alignment of their pointee types. This can lead to incorrect results.