|
| 1 | += sycl_ext_oneapi_group_occupancy_queries |
| 2 | + |
| 3 | +:source-highlighter: coderay |
| 4 | +:coderay-linenums-mode: table |
| 5 | + |
| 6 | +// This section needs to be after the document title. |
| 7 | +:doctype: book |
| 8 | +:toc2: |
| 9 | +:toc: left |
| 10 | +:encoding: utf-8 |
| 11 | +:lang: en |
| 12 | +:dpcpp: pass:[DPC++] |
| 13 | + |
| 14 | +// Set the default source code type in this document to C++, |
| 15 | +// for syntax highlighting purposes. This is needed because |
| 16 | +// docbook uses c++ and html5 uses cpp. |
| 17 | +:language: {basebackend@docbook:c++:cpp} |
| 18 | + |
| 19 | + |
| 20 | +== Notice |
| 21 | + |
| 22 | +[%hardbreaks] |
| 23 | +Copyright (C) 2024 Intel Corporation. All rights reserved. |
| 24 | + |
| 25 | +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks |
| 26 | +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by |
| 27 | +permission by Khronos. |
| 28 | + |
| 29 | + |
| 30 | +== Contact |
| 31 | + |
| 32 | +To report problems with this extension, please open a new issue at: |
| 33 | + |
| 34 | +https://github.com/intel/llvm/issues |
| 35 | + |
| 36 | + |
| 37 | +== Dependencies |
| 38 | + |
| 39 | +This extension is written against the SYCL 2020 revision 5 specification. All |
| 40 | +references below to the "core SYCL specification" or to section numbers in the |
| 41 | +SYCL specification refer to that revision. |
| 42 | + |
| 43 | +This extension also depends on the following other SYCL extensions: |
| 44 | + |
| 45 | +* link:../proposed/sycl_ext_oneapi_launch_queries.asciidoc[ |
| 46 | + sycl_ext_oneapi_launch_queries] |
| 47 | + |
| 48 | + |
| 49 | +== Status |
| 50 | + |
| 51 | +This is an experimental extension specification, intended to provide early |
| 52 | +access to features and gather community feedback. Interfaces defined in this |
| 53 | +specification are implemented in {dpcpp}, but they are not finalized and may |
| 54 | +change incompatibly in future versions of {dpcpp} without prior notice. |
| 55 | +*Shipping software products should not rely on APIs defined in this |
| 56 | +specification.* |
| 57 | + |
| 58 | + |
| 59 | +== Overview |
| 60 | + |
| 61 | +This extension builds on the kernel-queue-specific querying mechanism |
| 62 | +introduced by the sycl_ext_oneapi_launch_queries extension. |
| 63 | + |
| 64 | +The purpose of the queries to be added is to aid occupancy-based calculations |
| 65 | +for kernel launches, based on hardware occupancy at compute-unit granularity. |
| 66 | +The queries take into account the kernel resources and user-specified |
| 67 | +constraints, such as, but not limited to, local (work-group) size and dynamic |
| 68 | +work-group local memory (in bytes). The motivation is to aid kernel tuning by |
| 69 | +letting developers design an algorithm's implementation to maintain the highest |
| 70 | +possible occupancy in a portable way. |
| 71 | + |
| 72 | +List of currently planned queries: |
| 73 | +* max_num_work_group_occupancy_per_cu |
| 74 | + |
| 75 | +[source,c++] |
| 76 | +---- |
| 77 | +sycl::queue q{}; |
| 78 | +auto bundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(q.get_context()); |
| 79 | +auto kernel = bundle.get_kernel<class KernelName>(); |
| 80 | + |
| 81 | +auto wgSizeRange = sycl::range{32, 1, 1}; |
| 82 | +size_t localMemorySize = 32; |
| 83 | + |
| 84 | +namespace syclex = sycl::ext::oneapi::experimental; |
| 85 | +uint32_t maxWGsPerCU = kernel.ext_oneapi_get_info< |
| 86 | + syclex::info::kernel_queue_specific::max_num_work_group_occupancy_per_cu>( |
| 87 | + q, wgSizeRange, localMemorySize); |
| 88 | +---- |
| 89 | + |
| 90 | +NOTE: SYCL 2020 requires lambdas to be named in order to locate the associated |
| 91 | +`sycl::kernel` object used to query information descriptors. Reducing the |
| 92 | +verbosity of the queries shown above is left to a future extension. |
| 93 | + |
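As an illustration of the tuning use case described in this section, the following is a minimal sketch, not part of the extension, of how an application might compare candidate work-group sizes. The helper name `pick_work_group_size` and the `query` callback are hypothetical; the callback stands in for a call to the `max_num_work_group_occupancy_per_cu` query so that the selection logic can be shown without a SYCL device.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper (an assumption, not defined by this extension).
// Picks the candidate work-group size that keeps the most work-items
// resident on one compute unit. `query` stands in for the
// max_num_work_group_occupancy_per_cu query and returns the number of
// work-groups per compute unit for a given work-group size.
inline std::size_t pick_work_group_size(
    const std::vector<std::size_t>& candidates,
    const std::function<std::uint32_t(std::size_t)>& query) {
  std::size_t best = 0;
  std::size_t bestResident = 0;
  for (std::size_t wgSize : candidates) {
    // Resident work-items per compute unit for this candidate.
    const std::size_t resident = wgSize * query(wgSize);
    if (resident > bestResident) {  // strict '>' keeps the first of any tie
      bestResident = resident;
      best = wgSize;
    }
  }
  return best;  // 0 if no candidate can be resident at all
}
```

In real code the callback would presumably wrap `kernel.ext_oneapi_get_info` with the queue, a `sycl::range` built from the candidate size, and the dynamic local memory size, as in the query example in this section. Maximizing resident work-items is only one possible tuning criterion.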
| 94 | + |
| 95 | +== Specification |
| 96 | + |
| 97 | +=== Feature test macro |
| 98 | + |
| 99 | +This extension provides a feature-test macro as described in the core SYCL |
| 100 | +specification. An implementation supporting this extension must predefine the |
| 101 | +macro `SYCL_EXT_ONEAPI_GROUP_OCCUPANCY_QUERIES` to one of the values defined in |
| 102 | +the table below. Applications can test for the existence of this macro to |
| 103 | +determine if the implementation supports this feature, or applications can test |
| 104 | +the macro's value to determine which of the extension's features the |
| 105 | +implementation supports. |
| 106 | + |
| 107 | +[%header,cols="1,5"] |
| 108 | +|=== |
| 109 | +|Value |
| 110 | +|Description |
| 111 | + |
| 112 | +|1 |
| 113 | +|The APIs of this experimental extension are not versioned, so the |
| 114 | + feature-test macro always has this value. |
| 115 | +|=== |
| 116 | + |
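A minimal sketch of how an application might test for the macro follows. The names `has_group_occupancy_queries` and `work_groups_per_cu_or` are hypothetical and only illustrate falling back to a conservative value when the extension is unavailable. Real code would likely also wrap the query call itself in the same `#ifdef`, since the descriptor does not exist without the extension.

```cpp
#include <cstdint>

// The macro is predefined only by implementations that support the extension.
#ifdef SYCL_EXT_ONEAPI_GROUP_OCCUPANCY_QUERIES
constexpr bool has_group_occupancy_queries = true;
#else
constexpr bool has_group_occupancy_queries = false;
#endif

// Hypothetical helper (an assumption, not defined by this extension).
// Returns the queried occupancy when the extension is available and a
// caller-chosen fallback otherwise.
constexpr std::uint32_t work_groups_per_cu_or(std::uint32_t queried,
                                              std::uint32_t fallback) {
  return has_group_occupancy_queries ? queried : fallback;
}
```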
| 117 | + |
| 118 | +=== Occupancy queries |
| 119 | + |
| 120 | +[source, c++] |
| 121 | +---- |
| 122 | +namespace ext::oneapi::experimental::info::kernel_queue_specific { |
| 123 | + |
| 124 | +struct max_num_work_group_occupancy_per_cu; |
| 125 | + |
| 126 | +} |
| 127 | +---- |
| 128 | + |
| 129 | +[%header,cols="1,5,5,5"] |
| 130 | +|=== |
| 131 | +|Kernel Descriptor |
| 132 | +|Argument Types |
| 133 | +|Return Type |
| 134 | +|Description |
| 135 | + |
| 136 | +|`max_num_work_group_occupancy_per_cu` |
| 137 | +|`sycl::queue`, `sycl::range`, `size_t` |
| 138 | +|`uint32_t` |
| 139 | +|Returns the maximum number of work-groups that can be actively executing on a |
| 140 | +single compute unit when the kernel is submitted to the specified queue with the |
| 141 | +specified work-group size and the specified amount of dynamic work-group local |
| 142 | +memory (in bytes). Actively executing work-groups are those that occupy |
| 143 | +the fundamental hardware unit responsible for executing work-groups in |
| 144 | +parallel. |
| 145 | + |
| 146 | +|=== |
| 147 | + |
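One way the returned value could feed an occupancy-based launch calculation is sketched below. This is an illustration under stated assumptions, not part of the extension: the helper names are hypothetical, and the sketch assumes that the device's compute-unit count (for example, the value of `sycl::info::device::max_compute_units`) uses the same notion of compute unit as the query and that every compute unit can host the same number of work-groups.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers (assumptions, not defined by this extension).

// Number of work-groups that can be resident on the whole device at once,
// assuming each compute unit can host maxWGsPerCU work-groups.
constexpr std::size_t resident_work_groups(std::uint32_t maxWGsPerCU,
                                           std::uint32_t numComputeUnits) {
  return static_cast<std::size_t>(maxWGsPerCU) * numComputeUnits;
}

// Global size (in work-items, one dimension) of a launch that fills the
// device exactly once with work-groups of wgSize work-items.
constexpr std::size_t occupancy_sized_global_range(std::uint32_t maxWGsPerCU,
                                                   std::uint32_t numComputeUnits,
                                                   std::size_t wgSize) {
  return resident_work_groups(maxWGsPerCU, numComputeUnits) * wgSize;
}
```

For example, with 4 work-groups per compute unit, 16 compute units, and a work-group size of 32, the helpers give 64 resident work-groups and a global size of 2048 work-items.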
| 148 | +== Implementation notes |
| 149 | + |
| 150 | +The implementation needs to define `sycl::kernel::ext_oneapi_get_info` with the |
| 151 | +extra `sycl::range` and `size_t` parameters in addition to the `sycl::queue`. |
| 152 | + |
| 153 | +The CUDA, HIP and Level Zero backend adapters have the infrastructure |
| 154 | +required to implement the extension. |