Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Avoid alignment on kernel pointer parameters #11979

Open
wants to merge 1 commit into
base: sycl
Choose a base branch
from

Conversation

premanandrao
Copy link
Contributor

When creating the kernel, we assume alignment of kernel pointer parameters based on the alignment of their pointee types. This can lead to incorrect results.

When creating the kernel, we assume alignment of kernel
pointer parameters based on the alignment of their pointee
types.  This can lead to incorrect results.
@premanandrao premanandrao requested a review from a team as a code owner November 22, 2023 16:29
@@ -0,0 +1,52 @@
// RUN: %clang_cc1 -fsycl-is-device -O0 -internal-isystem %S/Inputs -triple spir64 -emit-llvm -o - %s | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// RUN: %clang_cc1 -fsycl-is-device -O0 -internal-isystem %S/Inputs -triple spir64 -emit-llvm -o - %s | FileCheck %s
// RUN: %clang_cc1 -fsycl-is-device -Xclang -disable-llvm-passes -internal-isystem %S/Inputs -triple spir64 -emit-llvm -o - %s | FileCheck %s

Comment on lines +2853 to +2856
//
// Don't do this for SYCL, as this assumption does not hold.
if (!getLangOpts().SYCLIsDevice && TargetDecl &&
TargetDecl->hasAttr<OpenCLKernelAttr>() && ParamType->isPointerType()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@premanandrao, do I get it right that alignment is not guaranteed for USM allocations only?
Is it true for L0 only or OpenCL is impacted as well?

I'm surprised to see the deviation from OpenCL properties. It might hard to justify in upstream. If SYCL compiler doesn't genuine OpenCL kernel, can we continue using OpenCLKernelAttr or better to have a SYCL specific attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not guaranteed for USM allocations, but since we don't do pointer analysis, we can't deduce the alignment in general.

Good questions about L0 vs OpenCL. I agree with your suggestion that perhaps we use a different SYCL attribute if it applies to OpenCL.

I have added @GarveyJoe and @ajaykumarkannan to the PR; they had identified and requested this change. I would like to have their thoughts on this too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_unified_shared_memory.html. USM allocation alignment requirements match OpenCL buffer (i.e. it must be a power of two and must be equal to or smaller than the size of the largest data type supported by any OpenCL device in context), so we can re-use OpenCL kernel logic as-is for pointers to USM allocations. We can argue about whether OpenCL logic is correct, but I don't think it should cause the difference between OpenCL and SYCL.

I don't see similar alignment requirements for Level Zero though. Level Zero spec only requires alignment value to be a power of two. @bashbaug, do you know if Level Zero memory allocation functions have additional alignment guarantees like OpenCL?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bader, the wording you're looking at regarding alignment is about the alignment of the pointer returned by the allocation functions such as clSharedMemAllocINTEL. There is no requirement that the kernel argument passed in via clSetKernelArgMemPointerINTEL has that same alignment. The only restriction on the pointer passed to clSetKernelArgMemPointerINTEL is that it is somewhere within an allocation returned by one of the allocation functions:

Otherwise, the pointer value must be NULL or must point into a Unified Shared Memory allocation returned by clHostMemAllocINTEL, clDeviceMemAllocINTEL, or clSharedMemAllocINTEL.

As a result, the following code is legal OpenCL:

// Kernel
kernel void foo(global int *a) {
  global char *b = (char *)(a);
  *b =  ...;
}

// Host
char *p = (char *) clHostMemAllocINTEL(context, NULL, 2, 0, NULL);
p = &(p[1]);
clSetKernelArgMemPointerINTEL(kernel, 0, p);

And certainly in this case the kernel argument will not have alignment any higher than that of a char.

@premanandrao, since this same problem can be exposed in OpenCL, as my example demonstrates, I don't think your code should be SYCL-specific.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, there is a line in the OpenCL C spec saying:

For arguments to a __kernel function declared to be a pointer to a data type, the OpenCL compiler can assume that the pointee is always appropriately aligned as required by the data type.

A similar line also exists in the OpenCL SPIR-V environment spec:

For OpTypePointer arguments to a function, the compiler may assume that the pointer is appropriately aligned as required by the Type that the pointer points to.

The example above still might be OK, but because the int* kernel argument is not aligned to sizeof(int) == 4 bytes things could easily go wrong.

Does SYCL (or C++, generally) make similar guarantees?

do you know if Level Zero memory allocation functions have additional alignment guarantees like OpenCL?

The Level Zero memory allocation functions have a similar alignment parameter as the OpenCL allocation functions. The Level Zero spec doesn't seem to explicitly say what the behavior is when passing zero as the alignment, but I'm 99% sure it behaves the same as OpenCL, by choosing an implementation-defined alignment that is big enough for all basic data types.

Copy link
Contributor

@GarveyJoe GarveyJoe Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After much digging through the spec, I've concluded that this optimization is actually legal in all C++ programs (and thus in SYCL as well). Without ever saying so explicitly, the language standard goes to great lengths to ensure that any pointer that doesn't have at least the alignment of the type it points to has either an undefined value or produces undefined behaviour even if the pointer is never dereferenced. However, it seems that this is not the approach that clang has taken. This LLVM mailing list discussion started by John McCall in 2016 seems to summarize clang's current position: https://groups.google.com/g/llvm-dev/c/eJRto1ipCYQ. In that thread he proposes that clang maintain a more relaxed position than the C++ standard: that it is only UB to dereference an unaligned pointer. I can't find anything formal in the clang docs to indicate that his proposal was accepted but even in present day clang does not emit the alignment attributes that the stricter definition from the C++ standard would allow. It instead only emits alignment at access sites. At the very least it seems the clang community has tacitly accepted John's proposal.

Now we have to decide if we want to take advantage of the stronger guarantees of the standard or follow clang's looser direction. I suspect we might get push back from the community if we try to upstream code that takes advantage of this guarantee.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SYCL 2020 spec, in section 5.5 (Built-in scalar data types), requires scalar fundamental data types to have the same size and alignment for the host and device. The alignment annotations look correct to me as is.

I believe the stronger guarantee exists to allow for an implementation to diagnose the creation of an invalid pointer as opposed to having to wait until the pointer is dereferenced.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following further offline discussion, I now agree with Joe that this is a good change. Without it, code might behave differently in a kernel than in another device function and that just seems weird and unnecessary. I don't think the optimization opportunity is significant.

One of the things that was helpful for me in reaching this conclusion is that alias annotations in LLVM IR are coalesced; when there are multiple relevant annotations (e.g., on a parameter with a pointer type and on a load/store that uses that pointer), code gen can use the more strict one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the optimization opportunity is significant.

One of the things that was helpful for me in reaching this conclusion is that alias annotations in LLVM IR are coalesced; when there are multiple relevant annotations (e.g., on a parameter with a pointer type and on a load/store that uses that pointer), code gen can use the more strict one.

I strongly recommend testing this claim using available means.

@jingwan2, FYI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing is always a good idea :)

The reason I think the optimization opportunity is not significant is because the alias information is (currently) lost as soon as one of these pointers is passed to another function (though subject to inlining considerations I'm sure). At any rate, it would be good to get input from someone with actual optimization experience.

Copy link
Contributor

@GarveyJoe GarveyJoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented

Comment on lines +2854 to +2856
// Don't do this for SYCL, as this assumption does not hold.
if (!getLangOpts().SYCLIsDevice && TargetDecl &&
TargetDecl->hasAttr<OpenCLKernelAttr>() && ParamType->isPointerType()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that the OpenCLKernelAttr attribute is only used for OpenCL and SYCL, perhaps it makes sense to restrict the conditional to matching OpenCL specifically rather than any language extension other than SYCL.

Suggested change
// Don't do this for SYCL, as this assumption does not hold.
if (!getLangOpts().SYCLIsDevice && TargetDecl &&
TargetDecl->hasAttr<OpenCLKernelAttr>() && ParamType->isPointerType()) {
if (getLangOpts().OpenCL && TargetDecl &&
TargetDecl->hasAttr<OpenCLKernelAttr>() && ParamType->isPointerType()) {

Copy link
Contributor

This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

@github-actions github-actions bot added the Stale label Sep 13, 2024
@tiwaria1 tiwaria1 removed their request for review September 13, 2024 14:58
@github-actions github-actions bot removed the Stale label Sep 14, 2024
Copy link
Contributor

This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

@github-actions github-actions bot added the Stale label Mar 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants