-
Notifications
You must be signed in to change notification settings - Fork 798
Handler-less kernel submit API #19294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Conversation
sycl/include/sycl/queue.hpp
Outdated
@@ -3680,6 +3743,21 @@ class __SYCL_EXPORT queue : public detail::OwnerLessBase<queue> { | |||
const detail::code_location &CodeLoc, | |||
bool IsTopCodeLoc) const; | |||
|
|||
event submit_with_event_impl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about eventless? It is not done yet, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it would be similar, so I've skipped it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the disadvantage of returning optional<event>
and having somewhere (probably, in SubmissionInfo
, as this is mode of submission) a flag, pointing out is it event or eventless mode? I think about bunch of functions that pass arguments by chain and about duplicating them (for event and for eventless) and this is not looks good. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if returning std::optional<event>
is a good idea because of ABI concerns. It might not have a stable ABI across compiler versions or even different standard libraries (libstdc++ vs libc++).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if returning
std::optional<event>
is a good idea because of ABI concerns. It might not have a stable ABI across compiler versions or even different standard libraries (libstdc++ vs libc++).
Yes, good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, sycl::detail::optional
might be considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl::detail::optional might work, good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But still we need to care about the stable layout of the sycl::detail::optional
. I am not sure that we are doing it today.
I think having two versions (that return sycl::event
and return void
) might be a good alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't generally do that and we rely on backward compatibility guarantees of the C++ library we use (GNU libstdc++ on Linux/MSVC on Windows). The only exception is pre-C++11 ABI of GNU libstdc++ that pyTorch used to use (see https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html). I don't see std::optional
listed on that page, so we should be safe to use it.
Another possible caveat is if some STL's implementation of it isn't is_sycl_device_copyable
. I think that might have been a reason why we added sycl::detail::optional
(or maybe it was simply added when we used C++14, `std::optional' is C++17 and above).
Anyway, unless you have a known case when it doesn't work, the current approach in the rest of the project is to use std::optional
, AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR, I would like to see at least one public interface implementation that utilizes this approach, just to ensure it works.
In the latest update, there are two public interfaces: The enqueue functions extension, and queue.parallel_for. Both are enabled only if __DPCPP_ENABLE_UNFINISHED_NO_CGH_SUBMIT is defined. |
expose the new APIs as public under a new define
sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp
Outdated
Show resolved
Hide resolved
template <typename KernelName = sycl::detail::auto_name, typename PropertiesT, | ||
typename KernelType, int Dims> | ||
void submit(const queue &Q, PropertiesT Props, nd_range<Dims> Range, | ||
const KernelType &KernelFunc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not rvalue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the convention in this file is to pass the KernelFunc as lvalue reference. Maybe it would make sense to change it everywhere in a separate PR, for consistency.
sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp
Outdated
Show resolved
Hide resolved
sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp
Outdated
Show resolved
Hide resolved
int &KernelNumArgs() { return MKernelNumArgs; } | ||
const int &KernelNumArgs() const { return MKernelNumArgs; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer explicit getter/setter methods instead of this ugly approach that forces us to return int
by reference. Also caller code that uses this class will look better and less error-prone with explicit getter/setter methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, if we need a setter, then design is wrong. Why would a kernel change its number of arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the ProcessKernelRuntimeInfo
function should be redesigned to return the KernelRuntimeInfo
object instead of accepting one by reference and initializing via setter methods. We can get rid of setters and initialize it using ctor.
But the main question is asked by @aelovikov-intel above if we need KernelRuntimeInfo
at all.
} | ||
} | ||
|
||
Args = extractArgsAndReqsFromLambda(KRInfo.GetKernelFuncPtr(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of the path with a handler, are we extracting args on every submission? Can we cache it somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a given kernel type, the value of arguments can change between the invocations, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proper approach should be to separate handler
-based submit implementation to properly separate lifetime extension (via copy/move) and the actual enqueue that would be handler
-less. Every new handler
-less API should be immediately used by handler
-based submission path by delegating to it.
Just writing a bunch of new code on the side without integrating it into existing submission path is a very bad choice.
int &KernelNumArgs() { return MKernelNumArgs; } | ||
const int &KernelNumArgs() const { return MKernelNumArgs; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, if we need a setter, then design is wrong. Why would a kernel change its number of arguments?
// This class is intended to store the kernel runtime information, | ||
// extracted from the compile time kernel structures. | ||
class __SYCL_EXPORT KernelRuntimeInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this unified with @sergey-semenov 's kernel-name-based cache? They both serve the same purpose of type-erasing kernel information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this info is stored in the handler, and this is a new structure which wraps it for no-handler cases. Are you suggesting, that this should be moved from the handler to the kernel name based cache, and then used in both flows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that this should be moved from the handler to the kernel name based cache, and then used in both flows
That sounds very reasonable.
sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp
Outdated
Show resolved
Hide resolved
#ifdef __DPCPP_ENABLE_UNFINISHED_NO_CGH_SUBMIT | ||
template <typename KernelName, typename PropertiesT, typename KernelType, | ||
int Dims> | ||
void submit_direct_impl(const queue &Q, PropertiesT Props, nd_range<Dims> Range, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having lots of layers of tiny template helpers is bad for compile time, why can't it be inlined?
Ideally, most of interfaces accepting the kernel type as a template param must process compile-time properties immediately and only call interfaces that accept type-erased kernel.
Additionally, less tiny layers makes the code much easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code follows the convention which is there for handler based submissions. We should probably refactor the entire file in a separate PR (after this PR is merged).
@@ -312,6 +312,57 @@ event queue::submit_with_event_impl( | |||
return impl->submit_with_event(CGH, SubmitInfo, CodeLoc, IsTopCodeLoc); | |||
} | |||
|
|||
#ifdef __INTEL_PREVIEW_BREAKING_CHANGES | |||
event queue::submit_direct_with_event_impl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having overloads vs templates is important for public APIs (because of the implicit conversion), but for our implementation details we can just use template to reduce amount of boilerplate code: https://godbolt.org/z/rPW4jx8h7
Headers will only have template declaration, .cpp file will export necessary instantiations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can do this. Let's wait until the ABI is more stable, since we might be able to simplify and avoid the dimensions template here.
bool &KernelHasSpecialCaptures() { return MKernelHasSpecialCaptures; } | ||
const bool &KernelHasSpecialCaptures() const { | ||
return MKernelHasSpecialCaptures; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How important is it to support that scenario (having special captures)? I'd hope that we will be changing the decomposition approach relatively soon and that code path will look very different in how we pass/set kernel arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR it is only used to exclude an unsupported case, so I would leave it for now, until the approach is changed.
#ifdef __DPCPP_ENABLE_UNFINISHED_NO_CGH_SUBMIT | ||
template <typename KernelName = sycl::detail::auto_name, typename PropertiesT, | ||
typename KernelType, int Dims> | ||
event submit_with_event(const queue &Q, PropertiesT Props, nd_range<Dims> Range, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find where this function is used. Could you please clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of the API to be called by the app, when an event is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that's for unimplemented yet part of code. Thanks!
typename TransformUserItemType<Dims, LambdaArgType>::type>; | ||
|
||
KRInfo.HostKernel().reset( | ||
new detail::HostKernel<KernelType, TransformedArgType, Dims>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to allocate it on heap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the scheduler takes a shared_ptr type argument for the HostKernel, and stores it until the kernel is actually submitted. Do you think we should rather pass the object by value to the scheduler?
No description provided.