-
Notifications
You must be signed in to change notification settings - Fork 797
[SYCL] Implement max_num_work_groups from the launch queries extension #14333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
aead3e3
e172c1e
a840be1
a9e17b4
4f81d0a
b2756c9
28b09f4
fd51cfb
377cf3b
3a8f3bf
b5b3d43
594727e
efcf44f
448b191
54c1b57
eb60b1c
b4b355e
20aa7c5
8fa7b09
e5910fa
a7411c8
7de06c1
e50e837
dc2dde4
b7807f9
4344064
0ef4baa
2fa280b
da8cde2
27b2416
b51f965
ef4cd8b
6483da7
4fc7353
7636f78
ea1e525
1698bb8
529caa5
2e190a2
762c7e1
2169579
c2788f6
6c5485f
cb5e47e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,154 @@ | ||
= sycl_ext_oneapi_group_occupancy_queries | ||
|
||
:source-highlighter: coderay | ||
:coderay-linenums-mode: table | ||
|
||
// This section needs to be after the document title. | ||
:doctype: book | ||
:toc2: | ||
:toc: left | ||
:encoding: utf-8 | ||
:lang: en | ||
:dpcpp: pass:[DPC++] | ||
|
||
// Set the default source code type in this document to C++, | ||
// for syntax highlighting purposes. This is needed because | ||
// docbook uses c++ and html5 uses cpp. | ||
:language: {basebackend@docbook:c++:cpp} | ||
|
||
|
||
== Notice | ||
|
||
[%hardbreaks] | ||
Copyright (C) 2024 Intel Corporation. All rights reserved. | ||
|
||
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks | ||
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by | ||
permission by Khronos. | ||
|
||
|
||
== Contact | ||
|
||
To report problems with this extension, please open a new issue at: | ||
|
||
https://github.com/intel/llvm/issues | ||
|
||
|
||
== Dependencies | ||
|
||
This extension is written against the SYCL 2020 revision 5 specification. All | ||
references below to the "core SYCL specification" or to section numbers in the | ||
SYCL specification refer to that revision. | ||
|
||
This extension also depends on the following other SYCL extensions: | ||
|
||
* link:../proposed/sycl_ext_oneapi_launch_queries.asciidoc[ | ||
sycl_ext_oneapi_launch_queries] | ||
|
||
|
||
== Status | ||
|
||
This is an experimental extension specification, intended to provide early | ||
access to features and gather community feedback. Interfaces defined in this | ||
specification are implemented in {dpcpp}, but they are not finalized and may | ||
change incompatibly in future versions of {dpcpp} without prior notice. | ||
*Shipping software products should not rely on APIs defined in this | ||
specification.* | ||
|
||
|
||
== Overview | ||
|
||
This extension is based on the kernel-queue-specific specific querying mechanism | ||
GeorgeWeb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
introduced by the sycl_ext_oneapi_launch_queries extension. | ||
|
||
The purpose of queries the to be added is to aid occupancy based calculations | ||
for kernel launches based on hardware occupancy per compute unit granularity. | ||
The queries take in account the kernel resources and user-specified constraints, | ||
such as, but not limited to, local (work-group) size and dynamic work-group | ||
local memory (in bytes). The motivation behind is to aid the tuning of kernels, | ||
by being able to design the algorithm's implementation to maintain the highest | ||
possible occupancy in a portable way. | ||
|
||
List of currently planned queries. | ||
* max_num_work_group_occupancy_per_cu | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't render well in HTML. Asciidoc requires a blank line before the bullet. Are you planning to add more queries to this extension soon? This list of currently planned queries seems odd. I'd suggest removing it unless you have some specific plan to add more things. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was planning on at least one more, being Also, this is not much of a list with one addition at this point, so I am removing it. |
||
|
||
[source,c++] | ||
---- | ||
sycl::queue q{}; | ||
auto bundle = sycl::get_kernel_bundle(q.get_context()); | ||
auto kernel = bundle.get_kernel<class KernelName>(); | ||
|
||
auto wgSizeRange = sycl::range{32, 1, 1}; | ||
size_t localMemorySize = 32; | ||
|
||
namespace syclex = sycl::ext::oneapi::experimental; | ||
uint32_t maxWGsPerCU = kernel.ext_oneapi_get_info< | ||
syclex::info::kernel_queue_specific::max_num_work_group_occupancy_per_cu>( | ||
GeorgeWeb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
q, wgSizeRange, localMemorySize); | ||
---- | ||
|
||
NOTE: SYCL 2020 requires lambdas to be named in order to locate the associated | ||
`sycl::kernel` object used to query information descriptors. Reducing the | ||
verbosity of the queries shown above is left to a future extension. | ||
|
||
|
||
== Specification | ||
|
||
=== Feature test macro | ||
|
||
This extension provides a feature-test macro as described in the core SYCL | ||
specification. An implementation supporting this extension must predefine the | ||
macro `SYCL_EXT_ONEAPI_GROUP_OCCUPANCY_QUERIES` to one of the values defined in | ||
the table below. Applications can test for the existence of this macro to | ||
determine if the implementation supports this feature, or applications can test | ||
the macro's value to determine which of the extension's features the | ||
implementation supports. | ||
|
||
[%header,cols="1,5"] | ||
|=== | ||
|Value | ||
|Description | ||
|
||
|1 | ||
|The APIs of this experimental extension are not versioned, so the | ||
feature-test macro always has this value. | ||
|=== | ||
|
||
|
||
=== Occupancy queries | ||
|
||
[source, c++] | ||
---- | ||
namespace ext::oneapi::experimental::info::kernel { | ||
|
||
struct max_num_work_group_occupancy_per_cu; | ||
|
||
} | ||
---- | ||
|
||
[%header,cols="1,5,5,5"] | ||
|=== | ||
|Kernel Descriptor | ||
|Argument Types | ||
|Return Type | ||
|Description | ||
|
||
|`max_num_work_group_occupancy_per_cu` | ||
|`sycl::queue`, `sycl::range`, `size_t` | ||
|`uint32_t` | ||
GeorgeWeb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|Returns the maximum number of actively executing work-groups per compute unit | ||
granularity, when the kernel is submitted to the specified queue with the | ||
specified work-group size and the specified amount of dynamic work-group local | ||
memory (in bytes). The actively executing work-groups are those that occupy | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might be good to be a little more detailed about what counts as dynamic work-group local memory. I assume this is the sum of the sizes of all local accessors, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Dynamically allocated (SYCL) local memory, for which the size is know only at runtime and can change between kernel submissions, so yes, you are right. I will elaborate in a little more detail to make it clear. |
||
the fundamental hardware unit responsible for the execution of work-groups in | ||
parallel. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the idea that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is a recommendation for the maximum number of work-groups (or Cuda blocks) of specified block size etc. that will theoretically execute concurrently on the compute unit (Cuda SM) to achieve maximum occupancy. I think I like the naming you suggest and it does sound to me like a recommendation. What do you think would be a good name based on my description? (I am sold based on the fact the original I came up with sounds a little weird.) Thank you, @gmlueck ! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I renamed it to |
||
|
||
|=== | ||
|
||
== Implementation notes | ||
|
||
The implementation needs to define `sycl::kernel::ext_onapi_get_info` with the | ||
extra `sycl::range` and `size_t` parameters in addition to the `sycl::queue`. | ||
|
||
The Cuda, Hip and Level Zero backend adapters have the required infrastructure | ||
required to implement the extension. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
__SYCL_PARAM_TRAITS_SPEC(ext::oneapi::experimental, kernel_queue_specific, max_num_work_group_occupancy_per_cu, uint32_t,) | ||
__SYCL_PARAM_TRAITS_SPEC(ext::oneapi::experimental, kernel_queue_specific, max_num_work_group_sync, size_t,) |
Uh oh!
There was an error while loading. Please reload this page.