diff --git a/sycl/doc/design/opencl-extensions/cl_intel_subgroup_requirements.asciidoc b/sycl/doc/design/opencl-extensions/cl_intel_subgroup_requirements.asciidoc new file mode 100644 index 0000000000000..798b3b367d9f2 --- /dev/null +++ b/sycl/doc/design/opencl-extensions/cl_intel_subgroup_requirements.asciidoc @@ -0,0 +1,294 @@ +:data-uri: +:sectanchors: +:icons: font +:source-highlighter: coderay +// TODO: try rouge? + += cl_intel_subgroup_requirements + +// CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL +:CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL: pass:q[`CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL`] +:CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL_anchor: {CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL} + +// CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL +:CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL: pass:q[`CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL`] +:CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL_anchor: {CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL} + +// cl_device_sub_group_lane_mappings_intel +:cl_device_sub_group_lane_mappings_intel_TYPE: pass:q[`cl_device_sub_group_lane_mappings_intel`] + +// CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL +:CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL: pass:q[`CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL`] +:CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL_anchor: {CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL} + +// CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL +:CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL: pass:q[`CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL`] +:CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL_anchor: {CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL} + + +== Name Strings + +`cl_intel_subgroup_requirements` + +== Contact + +Ben Ashbaugh, Intel (ben 'dot' ashbaugh 'at' intel 'dot' com) + +== Contributors + +// spell-checker: disable +Ben Ashbaugh, Intel + +Pekka Jääskeläinen, Intel + +Henry Linjamäki, Intel + +John Pennycook, Intel + +// spell-checker: enable + +== Notice + +Copyright (c) 2023 Intel Corporation. All rights reserved. + +== Status + +Working Draft + +This is a preview extension specification, intended to provide early access to a +feature for review and community feedback. +When the feature matures, this specification may be released as a formal +extension. + +Because the interfaces defined by this specification are not final and are +subject to change they are not intended to be used by shipping software +products. +If you are interested in using this feature in your software product, please let +us know! + +== Version + +Built On: {docdate} + +Version: 0.9.3 + +== Dependencies + +This extension is written against the OpenCL 3.0 C Language specification and +the OpenCL SPIR-V Environment specification, V3.0.14. + +This extension requires OpenCL 1.0. + +This extension does not require any other extensions, though it is intended to +complement +https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_required_subgroup_size.html[cl_intel_required_subgroup_size]. + +== Overview + +This extension adds the ability to query additional properties that describe how +devices implement sub-groups and to add specific sub-group requirements to +OpenCL kernels. +These requirements enable programmers to reason better about how sub-groups +behave for a kernel executing on a device. + +== New API Functions + +None. + +== New API Enums + +Accepted as the _param_name_ parameter of *clGetDeviceInfo* to query additional +sub-group properties of an OpenCL device: + +[source] +---- +CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL 0x425C +CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL 0x425D +---- + +Bitfield type and bits describing the sub-group lane mappings supported by an +OpenCL device: + +[source] +---- +typedef cl_bitfield cl_device_sub_group_lane_mappings_intel; + +#define CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL (1 << 0) +#define CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL (1 << 1) +---- + +Accepted as the _param_name_ parameter of *clGetKernelSubGroupInfo* and/or +*clGetKernelSubGroupInfoKHR*: + +[source] +---- +// TODO: do we need any per-kernel and per-device queries? +// Probably not for a named sub-group size. +// Possibly for the sub-group lane mapping? +---- + +== New API Types + +None. + +== New OpenCL C Optional Attribute Qualifiers + +Optional `+__kernel+` qualifiers: + +[source] +---- +__attribute__((intel_reqd_named_sub_group_size("primary"))) +__attribute__((intel_reqd_sub_group_lane_mapping("wrap"))) +__attribute__((intel_reqd_sub_group_lane_mapping("rows"))) + +// or? +// __attribute__((intel_reqd_sub_group_size_primary)) +// __attribute__((intel_reqd_sub_group_lane_mapping_wrap)) +// __attribute__((intel_reqd_sub_group_lane_mapping_rows")) +---- + +=== Additions to Chapter 4 of the OpenCL 3.0 API Specification + +Add to Table 5 - OpenCL Device Queries: + +[caption="Table 5. "] +.List of supported param_names by *clGetDeviceInfo* +[width="100%",cols="<33%,<17%,<50%",options="header"] +|==== +| Device Info | Return Type | Description + +| {CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL_anchor} + | `size_t` + | Returns the primary sub-group size for the device. + The primary sub-group size is a sub-group size that supports all core + language features for the device. + +| {CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL_anchor} + | {cl_device_sub_group_lane_mappings_intel_TYPE} + | Returns the supported sub-group lane mappings for the device. + The sub-group lane mappings are encoded as bits in a bitfield. + Supported sub-group lane mappings are: + + {CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL_anchor}: + Work-items are assigned to sub-groups in a linear order, such that the + work-item's sub-group local ID is equal to its local work-group linear ID + modulo the maximum sub-group size. + + {CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL_anchor}: + Work-items are assigned to sub-groups in a linear order along the first + dimension of the work-group, adding partial sub-groups if the first + dimension of the work-group is not evenly divisible by the maximum + sub-group size. + With this mapping, the work-item's sub-group local ID is equal to the + first dimension of its local ID modulo the maximum sub-group size. + + Note, for any of these mappings, if the first dimension of the work-group + size is divisible by the maximum sub-group size, then all sub-groups in + the work-group will be the same size (there will be no partial + sub-groups), and all work-items in the sub-group will have linear local + work-group IDs. +|==== + +== Modifications to the OpenCL C Specification + +=== Add to Section 6.9.2 - Optional Attribute Qualifiers + +The optional `+__attribute__((intel_reqd_named_sub_group_size()))+` can +be used to indicate that the kernel must be compiled and executed with the +specified named sub-group size. +When the required named sub-group size is `"primary"` +`get_max_sub_group_size()` must return the primary sub-group size (the value +returned for {CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL}) for the device executing +the kernel. + +The optional `+__attribute__((intel_reqd_sub_group_lane_mapping()))+` +can be used to indicate that the kernel must be compiled and executed with the +specified mapping from work-items in a work-group to sub-groups. +When the required sub-group lane mapping is `"wrap"` the work-items in a +work-group must be assigned to sub-groups as described by +{CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL}. +When the required sub-group lane mapping is `"rows"` the work-items in a +work-group must be assigned to sub-groups as described by +{CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL}. + +These attributes are important for the correctness of many sub-group algorithms, +and in some cases may be used by the compiler to generate more optimal code. + +== Modifications to the OpenCL SPIR-V Environment Specification + +=== Add a new section 5.2.X - `cl_intel_subgroup_requirements` + +If the OpenCL environment supports the extension +`cl_intel_subgroup_requirements` then the environment must accept modules that +declare use of the extension `SPV_INTEL_subgroup_requirements` and that declare +the SPIR-V capability *SubgroupRequirementsINTEL*. + +When the *NamedSubgroupSizeINTEL* execution mode added by the extension is +*PrimarySubgroupSizeINTEL*, any variables decorated with the *SubgroupMaxSize* +*BuiltIn* must be equal to the value returned by +{CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL}. + +Valid values for the *SubgroupLaneMappingINTEL* execution mode added by the +extension are: + + * *WrapINTEL* if the device supports the + {CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL} sub-group lane mapping. + * *RowsINTEL* if the device supports the + {CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL} sub-group lane mapping. + +== Issues + +. Should we define new OpenCL C kernel attributes? ++ +-- +*RESOLVED*: +Yes. +Defining new OpenCL C attributes makes it easier to test this extension and +is consistent with the required work-group size and required sub-group size +attributes, even if they are not required for CUDA/HIP and SYCL use-cases, or +any other high-level languages that produce SPIR-V directly. +-- + +. Do we need to define new per-kernel API queries for these sub-group +requirements? ++ +-- +*UNRESOLVED*: +Adding new queries would help some types of profiling tools and would be +consistent with existing per-kernel API queries for some other required +sub-group size attributes. +-- + +. What should happen if a kernel requires both a named sub-group size and an +integer sub-group size? ++ +-- +*UNRESOLVED*: +It seems like this could be diagnosed as an error? +-- + +. Should we also support a symbolic "primary" lane mapping? ++ +-- +*UNRESOLVED*: +This would provide some known sub-group lane mapping, even if it differed from +device-to-device, without requiring a specific lane mapping that may not be +supported by all devices. +-- + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Version|Date|Author|Changes +|0.9.0|2023-04-21|Ben Ashbaugh|*Initial internal revision* +|0.9.1|2023-07-10|Ben Ashbaugh|Fix bug in calculations to use the maximum sub-group size, not the sub-group size. +|0.9.2|2023-07-11|Ben Ashbaugh|Incorporated review feedback. +|0.9.3|2023-09-22|Ben Ashbaugh|Assigned enums, final edits before public preview. +|======================================== + +//************************************************************************ +//Other formatting suggestions: +// +//* Use *bold* text for host APIs, or [source] syntax highlighting. +//* Use `mono` text for device APIs, or [source] syntax highlighting. +//* Use `mono` text for extension names, types, or enum values. +//* Use _italics_ for parameters. +//************************************************************************ diff --git a/sycl/doc/design/spirv-extensions/SPV_INTEL_subgroup_requirements.asciidoc b/sycl/doc/design/spirv-extensions/SPV_INTEL_subgroup_requirements.asciidoc new file mode 100644 index 0000000000000..d7bef7f740ac0 --- /dev/null +++ b/sycl/doc/design/spirv-extensions/SPV_INTEL_subgroup_requirements.asciidoc @@ -0,0 +1,261 @@ += SPV_INTEL_subgroup_requirements + +== Name Strings + +SPV_INTEL_subgroup_requirements + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/KhronosGroup/SPIRV-Registry + +== Contributors + +// spell-checker: disable +* Ben Ashbaugh, Intel +* Pekka Jääskeläinen, Intel +* Henry Linjamäki, Intel +* John Pennycook, Intel +// spell-checker: enable + +== Notice + +Copyright (c) 2023 Intel Corporation. All rights reserved. + +== Status + +* Working Draft + +This is a preview extension specification, intended to provide early access to a +feature for review and community feedback. +When the feature matures, this specification may be released as a formal +extension. + +Because the interfaces defined by this specification are not final and are +subject to change they are not intended to be used by shipping software +products. +If you are interested in using this feature in your software product, please let +us know! + +== Version + +[width="40%",cols="25,25"] +|======================================== +| Last Modified Date | {docdate} +| Revision | C +|======================================== + +== Dependencies + +This extension is written against the SPIR-V Specification, Version 1.6 Revision 2. + +This extension requires SPIR-V 1.0. + +== Overview + +This extension allows a compiler generating SPIR-V to add additional +subgroup requirements to entry points (kernels) in a SPIR-V module. +These requirements enable programmers and compilers to reason better about how +subgroups behave for an entry point. + +Specifically, this extension adds two new execution modes: + +* The ability to specify a symbolic *NamedSubgroupSizeINTEL* subgroup size for +an entry point. +* The ability to specify the *SubgroupLaneMappingINTEL* for an entry point. +The subgroup lane mapping defines how the invocations (work-items) in a +workgroup are assigned to subgroups. + +== Extension Name + +To use this extension within a SPIR-V module, the following *OpExtension* must +be present in the module: + +---- +OpExtension "SPV_INTEL_subgroup_requirements" +---- + +== Modifications to the SPIR-V Specification, Version 1.6 + +=== Validation Rules + +Add validation rules to section 2.16.1 Universal Validation Rules under Entry Point: + +* Each *OpEntryPoint* must contain at most one of the *SubgroupSize* or +*NamedSubgroupSizeINTEL* execution modes. +* Each *OpEntryPoint* must contain at most one of the *SubgroupLaneMappingINTEL* +execution modes. + +=== Execution Mode + +Modify Section 3.6, "Execution Mode", adding these rows to the Execution Mode table: + +-- +[cols="^4,20,3*5,22",options="header"] +|===== +2+^.^| Execution Mode 3+<.^| Extra Operands | Enabling Capabilities +| 6446 | *NamedSubgroupSizeINTEL* + 3+| _Named Subgroup Size_ | *SubgroupRequirementsINTEL* +| 6447 | *SubgroupLaneMappingINTEL* + 3+| _Subgroup Lane Mapping_ | *SubgroupRequirementsINTEL* +|===== +-- + +=== Capabilities + +Modify Section 3.31, "Capability", adding these rows to the Capability table: + +-- +[cols="^.^2,16,15",options="header"] +|==== +2+^.^| Capability | Implicitly Declares +| 6445 | *SubgroupRequirementsINTEL* + +Indicates that additional named subgroup size or subgroup lane mapping +requirements may be specified. +| +|==== +-- + +=== Named Subgroup Size + +Add a new Section 3.XX, "Named Subgroup Size": + +Specify a required subgroup size by name. +A named subgroup size is a symbolic subgroup size with specific properties. +If an entry point does not specify a specific *SubgroupSize* or +*NamedSubgroupSizeINTEL* then the subgroup size for the entry point is +implementation-defined and may vary per entry point or per execution of the +entry point. See the client API specification for more detail. + +-- +[cols="^.^4,16,15",options="header"] +|==== +2+^.^| Named Subgroup Size | Enabling Capabilities +| 0 | *PrimarySubgroupSizeINTEL* + +A subgroup size that must support all of the core language features for the +device. +The primary subgroup size may differ from device-to-device, but will be the same +for all kernels compiled for a specific device. + | *SubgroupRequirementsINTEL* +|==== +-- + +=== Subgroup Lane Mapping + +Add a new Section 3.XX, "Subgroup Lane Mapping": + +Specify a required subgroup lane mapping. +The subgroup lane mapping defines how the invocations in a workgroup +(work-items in a work-group) are assigned to subgroups. +If an entry point does not specify a specific *SubgroupLaneMappingINTEL* then +the subgroup lane mapping is implementation-defined. + +-- +[cols="^.^4,16,15",options="header"] +|==== +2+^.^| Subgroup Lane Mapping | Enabling Capabilities +| 0 | *WrapINTEL* + +Assign invocations in a workgroup to subgroups in a linear order, such that the +*SubgroupLocalInvocationId* is equal to the *LocalInvocationIndex* modulo the +*SubgroupMaxSize* (or, that `get_sub_group_local_id()` equals +`get_local_linear_id() % get_max_sub_group_size()`) + | *SubgroupRequirementsINTEL* +| 1 | *RowsINTEL* + +Assign invocations in a workgroup to subgroups in a linear order along rows of +the workgroup (the inner-most dimension, typically the first component of the +*LocalInvocationId*), adding inactive invocations (padding) if the row size is +not evenly divisible by the *SubgroupMaxSize*. +The inactive invocations are not accessible or included in any invocation +indexing. +With this mapping, the *SubgroupLocalInvocationId* is equal to the inner-most +dimension of the workgroup local ID modulo the *SubgroupMaxSize* (or, that +`get_sub_group_local_id()` equals `get_local_id(0) % get_max_sub_group_size()`). + | *SubgroupRequirementsINTEL* +|==== +-- + +== Issues + +. What should this extension be called? ++ +-- +*UNRESOLVED*: The current name is `SPV_INTEL_subgroup_requirements`, which seems +general-purpose but not too general-purpose. +-- + +. Should we allow the named subgroup size or the subgroup lane mapping to be +specified with a specialization constant? ++ +-- +*RESOLVED*: No, this is not required. +Note, there is no way to specify the existing *SubgroupSize* execution mode +with a specialization constant. +-- + +. Are the subgroup requirements added by this extension new execution modes +affecting the entry point and any called functions or do they need to be more +fine-grained? ++ +-- +*RESOLVED*: They are new execution modes only. +This aligns with the existing *SubgroupSize* execution mode. +It also avoids complicated mix-and-match cases where a function or kernel with +one subgroup requirement calls another function with a different subgroup +requirement. +-- + +. Do we also need to define an explicit *AutomaticSubgroupSize*, or is the +subgroup size implicitly automatically determined when the *NamedSubgroupSize* +and *SubgroupSize* execution modes are absent? ++ +-- +*UNRESOLVED*: The proposed +https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_named_sub_group_sizes.asciidoc[SYCL +extension] has an explicit named automatic subgroup size property. +It is not clear that this is needed in SPIR-V. +It is not included in the current version of this extension because current +device compilers already treat the subgroup size as implementation-defined in +the absence of any specific subgroup size requirements. +-- + +. Do we need an explicit *SubgroupLaneMappingINTEL* to request any lane mapping +such that a) all subgroups have the same size and b) all invocations in the +subgroup have linear *LocalInvocationIDs*, so long as the inner-most dimension +of the workgroup is divisible by the subgroup size? ++ +-- +*UNRESOLVED*: This is a desirable subgroup requirement. +It is sufficient to reason about many subgroup algorithms even though it is a +weaker requirement than the specific *WrapINTEL* and *RowsINTEL* subgroup lane +mappings. + +As an alternative, if we do not add an explicit *SubgroupLaneMappingINTEL* +requirement, we could add this requirement to a client API requirement +specification supporting this SPIR-V extension. + +Note, this property is satisfied by both the *WrapINTEL* and *RowsINTEL* +subgroup plane mappings. +-- + +. Do we need to define a mapping for the *SubgroupId* also? +If we do define a mapping for the *SubgroupId* is it defined based on the +*SubgroupLaneMappingINTEL* also or is it something different? ++ +-- +*UNRESOLVED*: There are some "obvious" mappings but this is another case where +the specs do not define any specific mapping. +-- + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Author|Changes +|A|2023-04-24|Ben Ashbaugh|*Initial internal revision* +|B|2023-07-10|Ben Ashbaugh|Fix bug where some cases of *SubgroupSize* should have been *SubgroupMaxSize*. +|C|2023-07-11|Ben Ashbaugh|Incorporated review feedback. +|D|2023-09-22|Ben Ashbaugh|Assigned enums, final edits before public preview. +|========================================