Handler-less kernel submit API #19294

slawekptak · 2025-07-03T15:23:20Z

No description provided.

sycl/include/sycl/queue.hpp

vinser52 · 2025-07-04T13:30:28Z

sycl/include/sycl/queue.hpp

@@ -3680,6 +3743,21 @@ class __SYCL_EXPORT queue : public detail::OwnerLessBase<queue> {
                               const detail::code_location &CodeLoc,
                               bool IsTopCodeLoc) const;

+  event submit_with_event_impl(


What about eventless? It is not done yet, right?

Yes, I think it would be similar, so I've skipped it for now.

What is the disadvantage of returning optional<event> and keeping a flag somewhere (probably in SubmissionInfo, since this is a mode of submission) that indicates whether the submission is in event or eventless mode? I am thinking about the chain of functions that pass these arguments along: duplicating all of them (one set for event, one for eventless) does not look good. What do you think?

I am not sure if returning std::optional<event> is a good idea because of ABI concerns. It might not have a stable ABI across compiler versions or even different standard libraries (libstdc++ vs libc++).

Yes, good point.

Probably, sycl::detail::optional might be considered.

sycl::detail::optional might work, good idea

But we would still need to guarantee a stable layout for sycl::detail::optional, and I am not sure that we do that today.

I think having two versions (that return sycl::event and return void) might be a good alternative.

We don't generally do that, and we rely on the backward compatibility guarantees of the C++ library we use (GNU libstdc++ on Linux, MSVC on Windows). The only exception is the pre-C++11 ABI of GNU libstdc++ that PyTorch used to use (see https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html). I don't see std::optional listed on that page, so we should be safe to use it.

Another possible caveat is that some STL implementation of it might not be is_sycl_device_copyable. I think that might have been the reason we added sycl::detail::optional (or maybe it was simply added back when we used C++14; std::optional is C++17 and above).

Anyway, unless you have a known case when it doesn't work, the current approach in the rest of the project is to use std::optional, AFAIK.
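
For readers following this thread, the two options discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Event`, `SubmitMode`, `submit_impl`, `submit_with_event` and `submit_without_event` are hypothetical stand-ins, not the names or signatures used in the PR.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for sycl::event; not the real class.
struct Event {
  int Id = 0;
};

// Hypothetical flag describing the submission mode.
enum class SubmitMode { WithEvent, Eventless };

// Option A: one internal implementation for both modes. The optional is
// engaged only when the caller asked for an event.
inline std::optional<Event> submit_impl(SubmitMode Mode, int KernelId) {
  // ... the actual enqueue would happen here ...
  if (Mode == SubmitMode::Eventless)
    return std::nullopt;
  return Event{KernelId};
}

// Option B: two thin entry points, so std::optional never appears in a
// signature that would have to cross the library boundary.
inline Event submit_with_event(int KernelId) {
  return *submit_impl(SubmitMode::WithEvent, KernelId);
}

inline void submit_without_event(int KernelId) {
  (void)submit_impl(SubmitMode::Eventless, KernelId);
}

inline void example_submit_modes() {
  assert(submit_with_event(7).Id == 7);
  assert(!submit_impl(SubmitMode::Eventless, 7).has_value());
  submit_without_event(7);
}
```

In this sketch the duplication is limited to the two wrappers; whether the shared part can sit behind the ABI boundary is exactly the question raised in the comments above.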

vinser52

In this PR, I would like to see at least one public interface implementation that utilizes this approach, just to ensure it works.

sycl/include/sycl/queue.hpp

slawekptak · 2025-07-07T11:54:38Z

In this PR, I would like to see at least one public interface implementation that utilizes this approach, just to ensure it works.

In the latest update, there are two public interfaces: The enqueue functions extension, and queue.parallel_for. Both are enabled only if __DPCPP_ENABLE_UNFINISHED_NO_CGH_SUBMIT is defined.

sycl/include/sycl/queue.hpp

sycl/cmake/modules/AddSYCLUnitTest.cmake

sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp

vinser52 · 2025-08-13T12:28:33Z

sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp

+template <typename KernelName = sycl::detail::auto_name, typename PropertiesT,
+          typename KernelType, int Dims>
+void submit(const queue &Q, PropertiesT Props, nd_range<Dims> Range,
+            const KernelType &KernelFunc,


Why not rvalue?

It seems that the convention in this file is to pass KernelFunc as a const lvalue reference. Maybe it would make sense to change it everywhere in a separate PR, for consistency.
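
To make the difference concrete, here is a minimal sketch that assumes the implementation has to store its own copy of the callable. `CountingKernel`, `store_by_const_ref` and `store_by_forwarding_ref` are hypothetical names used only for illustration; they do not appear in the PR.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical kernel functor that counts how often it is copied or moved.
struct CountingKernel {
  int *Copies;
  int *Moves;
  CountingKernel(int *C, int *M) : Copies(C), Moves(M) {}
  CountingKernel(const CountingKernel &O) : Copies(O.Copies), Moves(O.Moves) {
    ++*Copies;
  }
  CountingKernel(CountingKernel &&O) noexcept
      : Copies(O.Copies), Moves(O.Moves) {
    ++*Moves;
  }
  void operator()() const {}
};

// Convention used in the file: a const lvalue reference. Storing the kernel
// always costs a copy, even when the caller passed a temporary.
template <typename KernelType>
void store_by_const_ref(const KernelType &KernelFunc) {
  KernelType Stored(KernelFunc);
  Stored();
}

// Alternative raised in the review: a forwarding reference. Temporaries are
// moved into storage; lvalues are still copied.
template <typename KernelType>
void store_by_forwarding_ref(KernelType &&KernelFunc) {
  std::decay_t<KernelType> Stored(std::forward<KernelType>(KernelFunc));
  Stored();
}

inline void example_kernel_passing() {
  int Copies = 0, Moves = 0;
  store_by_const_ref(CountingKernel(&Copies, &Moves));
  assert(Copies == 1 && Moves == 0);
  store_by_forwarding_ref(CountingKernel(&Copies, &Moves));
  assert(Copies == 1 && Moves == 1);
}
```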

sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp

vinser52 · 2025-08-13T13:15:48Z

sycl/include/sycl/queue.hpp

+  int &KernelNumArgs() { return MKernelNumArgs; }
+  const int &KernelNumArgs() const { return MKernelNumArgs; }


I prefer explicit getter/setter methods over this ugly approach that forces us to return int by reference. Caller code that uses this class would also look better and be less error-prone with explicit getter/setter methods.

IMO, if we need a setter, then the design is wrong. Why would a kernel change its number of arguments?

I agree, the ProcessKernelRuntimeInfo function should be redesigned to return the KernelRuntimeInfo object instead of accepting one by reference and initializing it via setter methods. We can get rid of the setters and initialize it through its constructor.

But the main question, asked by @aelovikov-intel above, is whether we need KernelRuntimeInfo at all.
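
A minimal sketch of the redesign suggested here, assuming a heavily simplified KernelRuntimeInfo with only two fields. The getter names and the `FakeKernelTraits` helper are hypothetical; the real class in the PR carries more state.

```cpp
#include <cassert>

// Simplified, hypothetical version of KernelRuntimeInfo: every field is set
// once in the constructor and exposed only through const getters.
class KernelRuntimeInfo {
public:
  KernelRuntimeInfo(int NumArgs, bool HasSpecialCaptures)
      : MKernelNumArgs(NumArgs),
        MKernelHasSpecialCaptures(HasSpecialCaptures) {}

  int getKernelNumArgs() const { return MKernelNumArgs; }
  bool hasSpecialCaptures() const { return MKernelHasSpecialCaptures; }

private:
  int MKernelNumArgs;
  bool MKernelHasSpecialCaptures;
};

// Hypothetical compile-time description of a kernel, used only to drive
// the example.
template <int Args, bool SpecialCaptures> struct FakeKernelTraits {
  static constexpr int NumArgs = Args;
  static constexpr bool HasSpecialCaptures = SpecialCaptures;
};

// Returns the object by value instead of filling one passed by reference.
template <typename Traits> KernelRuntimeInfo ProcessKernelRuntimeInfo() {
  return KernelRuntimeInfo(Traits::NumArgs, Traits::HasSpecialCaptures);
}

inline void example_runtime_info() {
  const KernelRuntimeInfo Info =
      ProcessKernelRuntimeInfo<FakeKernelTraits<3, false>>();
  assert(Info.getKernelNumArgs() == 3);
  assert(!Info.hasSpecialCaptures());
}
```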

sycl/include/sycl/queue.hpp

sycl/source/detail/queue_impl.cpp

vinser52 · 2025-08-13T14:19:38Z

sycl/source/detail/queue_impl.cpp

+    }
+  }
+
+  Args = extractArgsAndReqsFromLambda(KRInfo.GetKernelFuncPtr(),


In the case of the path with a handler, are we extracting args on every submission? Can we cache it somehow?

For a given kernel type, the value of arguments can change between the invocations, right?

aelovikov-intel

I think the proper approach is to split the handler-based submit implementation so that lifetime extension (via copy/move) is cleanly separated from the actual enqueue, which would be handler-less. Every new handler-less API should immediately be used by the handler-based submission path, which delegates to it.

Just writing a bunch of new code on the side without integrating it into the existing submission path is a very bad choice.
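
One way to picture the layering proposed in this comment is the sketch below. It is an assumption-laden illustration: `KernelRef`, `make_kernel_ref`, `enqueue_direct` and `submit_via_handler` are hypothetical names, and the "enqueue" simply invokes the callable instead of talking to a backend.

```cpp
#include <cassert>
#include <memory>

// Hypothetical non-owning, type-erased view of a kernel callable.
struct KernelRef {
  void (*Invoke)(const void *Object);
  const void *Object;
};

template <typename KernelType>
KernelRef make_kernel_ref(const KernelType &Kernel) {
  return {[](const void *Object) {
            (*static_cast<const KernelType *>(Object))();
          },
          &Kernel};
}

// Handler-less path: the single place where the enqueue happens. It does
// not own the callable; the caller keeps it alive.
inline void enqueue_direct(KernelRef Kernel) { Kernel.Invoke(Kernel.Object); }

// Handler-based path: only adds lifetime extension (a copy owned by a
// shared_ptr) and then delegates to the handler-less enqueue.
template <typename KernelType>
std::shared_ptr<KernelType> submit_via_handler(const KernelType &Kernel) {
  auto Owned = std::make_shared<KernelType>(Kernel);
  enqueue_direct(make_kernel_ref(*Owned));
  return Owned;
}

inline void example_delegation() {
  int Counter = 0;
  auto Kernel = [&Counter] { ++Counter; };
  enqueue_direct(make_kernel_ref(Kernel)); // handler-less submission
  auto Owned = submit_via_handler(Kernel); // handler-based submission
  assert(Counter == 2);
  assert(Owned != nullptr);
}
```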

aelovikov-intel · 2025-08-13T15:41:17Z

sycl/include/sycl/queue.hpp

+  int &KernelNumArgs() { return MKernelNumArgs; }
+  const int &KernelNumArgs() const { return MKernelNumArgs; }


IMO, if we need a setter, then the design is wrong. Why would a kernel change its number of arguments?

aelovikov-intel · 2025-08-13T15:42:05Z

sycl/include/sycl/queue.hpp

+// This class is intended to store the kernel runtime information,
+// extracted from the compile time kernel structures.
+class __SYCL_EXPORT KernelRuntimeInfo {


Why isn't this unified with @sergey-semenov's kernel-name-based cache? They both serve the same purpose of type-erasing kernel information.

Currently this info is stored in the handler, and this is a new structure that wraps it for no-handler cases. Are you suggesting that this should be moved from the handler to the kernel-name-based cache, and then used in both flows?

that this should be moved from the handler to the kernel name based cache, and then used in both flows

That sounds very reasonable.

sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp

aelovikov-intel · 2025-08-13T19:06:52Z

sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp

+#ifdef __DPCPP_ENABLE_UNFINISHED_NO_CGH_SUBMIT
+template <typename KernelName, typename PropertiesT, typename KernelType,
+          int Dims>
+void submit_direct_impl(const queue &Q, PropertiesT Props, nd_range<Dims> Range,


Having lots of layers of tiny template helpers is bad for compile time; why can't it be inlined?

Ideally, most interfaces that accept the kernel type as a template parameter should process compile-time properties immediately and then call only interfaces that accept a type-erased kernel.

Additionally, fewer tiny layers make the code much easier to read.

This code follows the convention that is already there for handler-based submissions. We should probably refactor the entire file in a separate PR (after this PR is merged).
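
The idea of handling compile-time properties at the outermost template and dropping to type-erased code right away could look roughly like this. The sketch is not taken from the PR: `ErasedLaunch`, `run_erased`, `work_group_size_hint` and `submit_erased` are hypothetical, and the loop stands in for a real enqueue.

```cpp
#include <cassert>

// Hypothetical type-erased launch description; not a real SYCL type.
struct ErasedLaunch {
  void (*Invoke)(const void *Kernel, int Index);
  const void *Kernel;
  int WorkGroupSizeHint;
};

// Non-template backend: compiled once, independent of the kernel type.
inline int run_erased(const ErasedLaunch &Launch, int NumItems) {
  for (int I = 0; I < NumItems; ++I)
    Launch.Invoke(Launch.Kernel, I);
  return Launch.WorkGroupSizeHint;
}

// Hypothetical compile-time property that carries a single value.
template <int Size> struct work_group_size_hint {
  static constexpr int value = Size;
};

// The only template layer: it reads the property, erases the kernel type
// and delegates. No further template helpers sit below it.
template <typename PropertiesT, typename KernelType>
int submit_erased(PropertiesT, const KernelType &KernelFunc, int NumItems) {
  ErasedLaunch Launch{[](const void *Kernel, int Index) {
                        (*static_cast<const KernelType *>(Kernel))(Index);
                      },
                      &KernelFunc, PropertiesT::value};
  return run_erased(Launch, NumItems);
}

inline void example_type_erasure() {
  int Sum = 0;
  auto Kernel = [&Sum](int I) { Sum += I; };
  int Hint = submit_erased(work_group_size_hint<64>{}, Kernel, 4);
  assert(Sum == 6); // 0 + 1 + 2 + 3
  assert(Hint == 64);
}
```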

sycl/include/sycl/queue.hpp

aelovikov-intel · 2025-08-13T19:43:01Z

sycl/source/queue.cpp

@@ -312,6 +312,57 @@ event queue::submit_with_event_impl(
  return impl->submit_with_event(CGH, SubmitInfo, CodeLoc, IsTopCodeLoc);
 }

+#ifdef __INTEL_PREVIEW_BREAKING_CHANGES
+event queue::submit_direct_with_event_impl(


Having overloads vs templates is important for public APIs (because of implicit conversions), but for our implementation details we can just use a template to reduce the amount of boilerplate code: https://godbolt.org/z/rPW4jx8h7

Headers will only have template declaration, .cpp file will export necessary instantiations.

Sure, we can do this. Let's wait until the ABI is more stable, since we might be able to simplify and avoid the dimensions template here.
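
For reference, the "declaration in the header, explicit instantiations in the .cpp file" pattern mentioned above looks roughly like this when collapsed into a single file. `Range` and `linear_size` are hypothetical placeholders, not the functions from the PR or from the linked Godbolt example.

```cpp
#include <cassert>
#include <cstddef>

// ---- would live in the header: types and declaration only ----
template <int Dims> struct Range {
  std::size_t Size[Dims];
};

template <int Dims> std::size_t linear_size(const Range<Dims> &R);

// ---- would live in the .cpp file: definition and instantiations ----
template <int Dims> std::size_t linear_size(const Range<Dims> &R) {
  std::size_t Total = 1;
  for (int I = 0; I < Dims; ++I)
    Total *= R.Size[I];
  return Total;
}

// Explicit instantiation definitions: these are the only symbols the
// library would need to export for this template.
template std::size_t linear_size<1>(const Range<1> &);
template std::size_t linear_size<2>(const Range<2> &);
template std::size_t linear_size<3>(const Range<3> &);

inline void example_explicit_instantiation() {
  Range<2> R{{4, 8}};
  assert(linear_size(R) == 32);
}
```

With this layout, a single template definition replaces three hand-written overloads, while the set of exported symbols stays fixed at the listed instantiations.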

aelovikov-intel · 2025-08-19T15:27:28Z

sycl/include/sycl/queue.hpp

+  bool &KernelHasSpecialCaptures() { return MKernelHasSpecialCaptures; }
+  const bool &KernelHasSpecialCaptures() const {
+    return MKernelHasSpecialCaptures;
+  }


How important is it to support that scenario (having special captures)? I'd hope that we will be changing the decomposition approach relatively soon and that code path will look very different in how we pass/set kernel arguments.

In this PR it is only used to exclude an unsupported case, so I would leave it for now, until the approach is changed.

sycl/include/sycl/khr/free_function_commands.hpp

Alexandr-Konovalov · 2025-08-20T14:33:33Z

sycl/include/sycl/ext/oneapi/experimental/enqueue_functions.hpp

+#ifdef __DPCPP_ENABLE_UNFINISHED_NO_CGH_SUBMIT
+template <typename KernelName = sycl::detail::auto_name, typename PropertiesT,
+          typename KernelType, int Dims>
+event submit_with_event(const queue &Q, PropertiesT Props, nd_range<Dims> Range,


I can't find where this function is used. Could you please clarify?

This is part of the API to be called by the app, when an event is needed.

So that's for a part of the code that is not implemented yet. Thanks!

vinser52 · 2025-08-22T13:42:56Z

sycl/include/sycl/queue.hpp

+        typename TransformUserItemType<Dims, LambdaArgType>::type>;
+
+    KRInfo.HostKernel().reset(
+        new detail::HostKernel<KernelType, TransformedArgType, Dims>(


Why do we need to allocate it on heap?

Currently the scheduler takes a shared_ptr type argument for the HostKernel, and stores it until the kernel is actually submitted. Do you think we should rather pass the object by value to the scheduler?

slawekptak had a problem deploying to WindowsCILock July 3, 2025 15:23 — with GitHub Actions Failure

slawekptak temporarily deployed to WindowsCILock July 3, 2025 15:51 — with GitHub Actions Inactive

slawekptak added 2 commits July 3, 2025 16:17

[SYCL] Handler-less kernel submit API

3223842

Fix formatting

fde19ca

slawekptak had a problem deploying to WindowsCILock July 4, 2025 08:17 — with GitHub Actions Failure

slawekptak temporarily deployed to WindowsCILock July 4, 2025 08:42 — with GitHub Actions Inactive

Fix formatting

13424de

vinser52 reviewed Jul 4, 2025

sycl/include/sycl/queue.hpp (outdated, resolved)

vinser52 reviewed Jul 4, 2025

Alexandr-Konovalov reviewed Jul 4, 2025

sycl/include/sycl/queue.hpp (outdated, resolved)

slawekptak had a problem deploying to WindowsCILock July 7, 2025 11:49 — with GitHub Actions Failure

slawekptak requested review from uditagarwal97 and Pennycook July 7, 2025 11:57

slawekptak temporarily deployed to WindowsCILock July 7, 2025 12:22 — with GitHub Actions Inactive

Change the ExtendedSubmissionInfo to KernelRuntimeInfo, expose the new APIs as public under a new define

fbc789d

vinser52 reviewed Jul 7, 2025

sycl/include/sycl/queue.hpp (outdated, resolved)

vinser52 reviewed Jul 7, 2025

sycl/include/sycl/queue.hpp (resolved)

Pennycook reviewed Jul 8, 2025

sycl/include/sycl/queue.hpp (resolved)

slawekptak added 4 commits July 8, 2025 13:41

Added copy/move constructor and assignment operator

591b3ec

Merge branch 'sycl' into no_handler_lib_entry

d235b7c

Add a no event submit and no handler compile flag

6641601

Merge branch 'sycl' into no_handler_lib_entry

0f41d5a

slawekptak temporarily deployed to WindowsCILock July 14, 2025 12:03 — with GitHub Actions Inactive

slawekptak temporarily deployed to WindowsCILock July 14, 2025 12:33 — with GitHub Actions Inactive

slawekptak temporarily deployed to WindowsCILock August 13, 2025 10:07 — with GitHub Actions Inactive

slawekptak added 2 commits August 13, 2025 10:18

Fix formatting

ac1a5cf

Fixed #ifdef, added comment to a new function.

5865f3a

vinser52 reviewed Aug 13, 2025

aelovikov-intel reviewed Aug 13, 2025

Merge branch 'sycl' into no_handler_lib_entry

072803c

aelovikov-intel reviewed Aug 19, 2025

Alexandr-Konovalov reviewed Aug 20, 2025

sycl/include/sycl/khr/free_function_commands.hpp (outdated, resolved)

Alexandr-Konovalov reviewed Aug 20, 2025

slawekptak added 2 commits August 20, 2025 15:13

Address review comments

27b3110

Updated Linux symbols

9041e94

slawekptak had a problem deploying to WindowsCILock August 21, 2025 08:54 — with GitHub Actions Failure

slawekptak had a problem deploying to WindowsCILock August 21, 2025 10:25 — with GitHub Actions Failure

slawekptak had a problem deploying to WindowsCILock August 21, 2025 11:21 — with GitHub Actions Failure

slawekptak added 2 commits August 21, 2025 11:23

Addressed more review comments

ac2c5bb

Fix formatting

8e155fb

slawekptak had a problem deploying to WindowsCILock August 21, 2025 12:19 — with GitHub Actions Failure

Fix formatting, remove unused properties argument

502f637

slawekptak had a problem deploying to WindowsCILock August 21, 2025 12:52 — with GitHub Actions Failure

Fix ProcessKernelRuntimeInfo call

d708c93

slawekptak temporarily deployed to WindowsCILock August 21, 2025 13:21 — with GitHub Actions Inactive

slawekptak added 2 commits August 21, 2025 13:51

Fix unit test build and ProcessKernelRuntimeInfo calls

e9f6e4e

Fix formatting

057a7a5

slawekptak had a problem deploying to WindowsCILock August 22, 2025 07:14 — with GitHub Actions Failure

slawekptak temporarily deployed to WindowsCILock August 22, 2025 07:41 — with GitHub Actions Inactive

vinser52 reviewed Aug 22, 2025
