-
Notifications
You must be signed in to change notification settings - Fork 768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Avoid alignment on kernel pointer parameters #11979
base: sycl
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -2850,8 +2850,10 @@ void CodeGenModule::ConstructAttributeList(StringRef Name, | |||||||||||
// > For arguments to a __kernel function declared to be a pointer to a | ||||||||||||
// > data type, the OpenCL compiler can assume that the pointee is always | ||||||||||||
// > appropriately aligned as required by the data type. | ||||||||||||
if (TargetDecl && TargetDecl->hasAttr<OpenCLKernelAttr>() && | ||||||||||||
ParamType->isPointerType()) { | ||||||||||||
// | ||||||||||||
// Don't do this for SYCL, as this assumption does not hold. | ||||||||||||
if (!getLangOpts().SYCLIsDevice && TargetDecl && | ||||||||||||
TargetDecl->hasAttr<OpenCLKernelAttr>() && ParamType->isPointerType()) { | ||||||||||||
Comment on lines
+2854
to
+2856
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Assuming that the
Suggested change
|
||||||||||||
QualType PTy = ParamType->getPointeeType(); | ||||||||||||
if (!PTy->isIncompleteType() && PTy->isConstantSizeType()) { | ||||||||||||
llvm::Align Alignment = | ||||||||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,52 @@ | ||||||
// RUN: %clang_cc1 -fsycl-is-device -O0 -internal-isystem %S/Inputs -triple spir64 -emit-llvm -o - %s | FileCheck %s | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
// Test that the pointer parameters generated for the kernel do not | ||||||
// have alignment on them. | ||||||
|
||||||
#include "sycl.hpp" | ||||||
|
||||||
using namespace sycl; | ||||||
|
||||||
struct S; | ||||||
|
||||||
void Test() { | ||||||
struct MyIP { | ||||||
char* a; | ||||||
int* b; | ||||||
double* c; | ||||||
|
||||||
void operator()() const { | ||||||
*((int *) a) = 1; // 1 on arg, 4 on site | ||||||
*((double *) b) = 2; // 4 on arg, 8 on site | ||||||
*((char *) c) = 3; // 8 on arg, 1 on site | ||||||
} | ||||||
}; | ||||||
|
||||||
constexpr int kN = 8; | ||||||
auto host_array_A = | ||||||
malloc_shared<char>(kN); | ||||||
|
||||||
auto host_array_B = | ||||||
malloc_shared<int>(kN); | ||||||
|
||||||
auto host_array_C = | ||||||
malloc_shared<double>(kN); | ||||||
|
||||||
for (int i = 0; i < kN; i++) { | ||||||
host_array_A[i] = i; | ||||||
host_array_B[i] = i * 2; | ||||||
} | ||||||
|
||||||
sycl::kernel_single_task<S>(MyIP{host_array_A, host_array_B, host_array_C}); | ||||||
|
||||||
free(host_array_A); | ||||||
free(host_array_B); | ||||||
free(host_array_C); | ||||||
} | ||||||
|
||||||
int main() { | ||||||
Test(); | ||||||
return 0; | ||||||
} | ||||||
|
||||||
// CHECK: define {{.*}} spir_kernel void @_ZTS1S(ptr addrspace(1) noundef %_arg_a, ptr addrspace(1) noundef %_arg_b, ptr addrspace(1) noundef %_arg_c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@premanandrao, do I get it right that alignment is not guaranteed for USM allocations only?
Is it true for L0 only or OpenCL is impacted as well?
I'm surprised to see the deviation from OpenCL properties. It might hard to justify in upstream. If SYCL compiler doesn't genuine OpenCL kernel, can we continue using
OpenCLKernelAttr
or better to have a SYCL specific attribute?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not guaranteed for USM allocations, but since we don't do pointer analysis, we can't deduce the alignment in general.
Good questions about L0 vs OpenCL. I agree with your suggestion that perhaps we use a different SYCL attribute if it applies to OpenCL.
I have added @GarveyJoe and @ajaykumarkannan to the PR; they had identified and requested this change. I would like to have their thoughts on this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_unified_shared_memory.html. USM allocation alignment requirements match OpenCL buffer (i.e. it must be a power of two and must be equal to or smaller than the size of the largest data type supported by any OpenCL device in context), so we can re-use OpenCL kernel logic as-is for pointers to USM allocations. We can argue about whether OpenCL logic is correct, but I don't think it should cause the difference between OpenCL and SYCL.
I don't see similar alignment requirements for Level Zero though. Level Zero spec only requires alignment value to be a power of two. @bashbaug, do you know if Level Zero memory allocation functions have additional alignment guarantees like OpenCL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bader, the wording you're looking at regarding alignment is about the alignment of the pointer returned by the allocation functions such as clSharedMemAllocINTEL. There is no requirement that the kernel argument passed in via clSetKernelArgMemPointerINTEL has that same alignment. The only restriction on the pointer passed to clSetKernelArgMemPointerINTEL is that it is somewhere within an allocation returned by one of the allocation functions:
As a result, the following code is legal OpenCL:
And certainly in this case the kernel argument will not have alignment any higher than that of a char.
@premanandrao, since this same problem can be exposed in OpenCL, as my example demonstrates, I don't think your code should be SYCL-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, there is a line in the OpenCL C spec saying:
A similar line also exists in the OpenCL SPIR-V environment spec:
The example above still might be OK, but because the
int*
kernel argument is not aligned tosizeof(int)
== 4 bytes things could easily go wrong.Does SYCL (or C++, generally) make similar guarantees?
The Level Zero memory allocation functions have a similar alignment parameter as the OpenCL allocation functions. The Level Zero spec doesn't seem to explicitly say what the behavior is when passing zero as the alignment, but I'm 99% sure it behaves the same as OpenCL, by choosing an implementation-defined alignment that is big enough for all basic data types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After much digging through the spec, I've concluded that this optimization is actually legal in all C++ programs (and thus in SYCL as well). Without ever saying so explicitly, the language standard goes to great lengths to ensure that any pointer that doesn't have at least the alignment of the type it points to has either an undefined value or produces undefined behaviour even if the pointer is never dereferenced. However, it seems that this is not the approach that clang has taken. This LLVM mailing list discussion started by John McCall in 2016 seems to summarize clang's current position: https://groups.google.com/g/llvm-dev/c/eJRto1ipCYQ. In that thread he proposes that clang maintain a more relaxed position than the C++ standard: that it is only UB to dereference an unaligned pointer. I can't find anything formal in the clang docs to indicate that his proposal was accepted but even in present day clang does not emit the alignment attributes that the stricter definition from the C++ standard would allow. It instead only emits alignment at access sites. At the very least it seems the clang community has tacitly accepted John's proposal.
Now we have to decide if we want to take advantage of the stronger guarantees of the standard or follow clang's looser direction. I suspect we might get push back from the community if we try to upstream code that takes advantage of this guarantee.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SYCL 2020 spec, in section 5.5 (Built-in scalar data types), requires scalar fundamental data types to have the same size and alignment for the host and device. The alignment annotations look correct to me as is.
I believe the stronger guarantee exists to allow for an implementation to diagnose the creation of an invalid pointer as opposed to having to wait until the pointer is dereferenced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following further offline discussion, I now agree with Joe that this is a good change. Without it, code might behave differently in a kernel than in another device function and that just seems weird and unnecessary. I don't think the optimization opportunity is significant.
One of the things that was helpful for me in reaching this conclusion is that alias annotations in LLVM IR are coalesced; when there are multiple relevant annotations (e.g., on a parameter with a pointer type and on a load/store that uses that pointer), code gen can use the more strict one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly recommend testing this claim using available means.
@jingwan2, FYI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing is always a good idea :)
The reason I think the optimization opportunity is not significant is because the alias information is (currently) lost as soon as one of these pointers is passed to another function (though subject to inlining considerations I'm sure). At any rate, it would be good to get input from someone with actual optimization experience.