From 1475bea995acfb986afbe9a136841b89be6a5d26 Mon Sep 17 00:00:00 2001
From: Ben Ashbaugh <ben.ashbaugh@intel.com>
Date: Mon, 25 Sep 2023 14:37:54 -0700
Subject: [PATCH] add draft SPIR-V and OpenCL extensions for subgroup
 requirements

Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
---
 .../cl_intel_subgroup_requirements.asciidoc   | 294 ++++++++++++++++++
 .../SPV_INTEL_subgroup_requirements.asciidoc  | 261 ++++++++++++++++
 2 files changed, 555 insertions(+)
 create mode 100644 sycl/doc/design/opencl-extensions/cl_intel_subgroup_requirements.asciidoc
 create mode 100644 sycl/doc/design/spirv-extensions/SPV_INTEL_subgroup_requirements.asciidoc

diff --git a/sycl/doc/design/opencl-extensions/cl_intel_subgroup_requirements.asciidoc b/sycl/doc/design/opencl-extensions/cl_intel_subgroup_requirements.asciidoc
new file mode 100644
index 0000000000000..798b3b367d9f2
--- /dev/null
+++ b/sycl/doc/design/opencl-extensions/cl_intel_subgroup_requirements.asciidoc
@@ -0,0 +1,294 @@
+:data-uri:
+:sectanchors:
+:icons: font
+:source-highlighter: coderay
+// TODO: try rouge?
+
+= cl_intel_subgroup_requirements
+
+// CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL
+:CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL: pass:q[`CL_DEVICE_<wbr>PRIMARY_<wbr>SUB_<wbr>GROUP_<wbr>SIZE_<wbr>INTEL`]
+:CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL_anchor: {CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL}
+
+// CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL
+:CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL: pass:q[`CL_DEVICE_<wbr>SUB_<wbr>GROUP_<wbr>LANE_<wbr>MAPPINGS_<wbr>INTEL`]
+:CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL_anchor: {CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL}
+
+// cl_device_sub_group_lane_mappings_intel
+:cl_device_sub_group_lane_mappings_intel_TYPE: pass:q[`cl_device_<wbr>sub_<wbr>group_<wbr>lane_<wbr>mappings_<wbr>intel`]
+
+// CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL
+:CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL: pass:q[`CL_DEVICE_<wbr>SUB_<wbr>GROUP_<wbr>LANE_<wbr>MAPPING_<wbr>WRAP_<wbr>INTEL`]
+:CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL_anchor: {CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL}
+
+// CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL
+:CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL: pass:q[`CL_DEVICE_<wbr>SUB_<wbr>GROUP_<wbr>LANE_<wbr>MAPPING_<wbr>ROWS_<wbr>INTEL`]
+:CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL_anchor: {CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL}
+
+
+== Name Strings
+
+`cl_intel_subgroup_requirements`
+
+== Contact
+
+Ben Ashbaugh, Intel (ben 'dot' ashbaugh 'at' intel 'dot' com)
+
+== Contributors
+
+// spell-checker: disable
+Ben Ashbaugh, Intel +
+Pekka Jääskeläinen, Intel +
+Henry Linjamäki, Intel +
+John Pennycook, Intel +
+// spell-checker: enable
+
+== Notice
+
+Copyright (c) 2023 Intel Corporation. All rights reserved.
+
+== Status
+
+Working Draft
+
+This is a preview extension specification, intended to provide early access to a
+feature for review and community feedback.
+When the feature matures, this specification may be released as a formal
+extension.
+
+Because the interfaces defined by this specification are not final and are
+subject to change they are not intended to be used by shipping software
+products.
+If you are interested in using this feature in your software product, please let
+us know!
+
+== Version
+
+Built On: {docdate} +
+Version: 0.9.3
+
+== Dependencies
+
+This extension is written against the OpenCL 3.0 C Language specification and
+the OpenCL SPIR-V Environment specification, V3.0.14.
+
+This extension requires OpenCL 1.0.
+
+This extension does not require any other extensions, though it is intended to
+complement
+https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_required_subgroup_size.html[cl_intel_required_subgroup_size].
+
+== Overview
+
+This extension adds the ability to query additional properties that describe how
+devices implement sub-groups and to add specific sub-group requirements to
+OpenCL kernels.
+These requirements enable programmers to reason better about how sub-groups
+behave for a kernel executing on a device.
+
+== New API Functions
+
+None.
+
+== New API Enums
+
+Accepted as the _param_name_ parameter of *clGetDeviceInfo* to query additional
+sub-group properties of an OpenCL device:
+
+[source]
+----
+CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL          0x425C
+CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL         0x425D
+----
+
+Bitfield type and bits describing the sub-group lane mappings supported by an
+OpenCL device:
+
+[source]
+----
+typedef cl_bitfield cl_device_sub_group_lane_mappings_intel;
+
+#define CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL             (1 << 0)
+#define CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL             (1 << 1)
+----
+
+Accepted as the _param_name_ parameter of *clGetKernelSubGroupInfo* and/or
+*clGetKernelSubGroupInfoKHR*:
+
+[source]
+----
+// TODO: do we need any per-kernel and per-device queries?
+// Probably not for a named sub-group size.
+// Possibly for the sub-group lane mapping?
+----
+
+== New API Types
+
+None.
+
+== New OpenCL C Optional Attribute Qualifiers
+
+Optional `+__kernel+` qualifiers:
+
+[source]
+----
+__attribute__((intel_reqd_named_sub_group_size("primary")))
+__attribute__((intel_reqd_sub_group_lane_mapping("wrap")))
+__attribute__((intel_reqd_sub_group_lane_mapping("rows")))
+
+// or?
+// __attribute__((intel_reqd_sub_group_size_primary))
+// __attribute__((intel_reqd_sub_group_lane_mapping_wrap))
+// __attribute__((intel_reqd_sub_group_lane_mapping_rows"))
+----
+
+=== Additions to Chapter 4 of the OpenCL 3.0 API Specification
+
+Add to Table 5 - OpenCL Device Queries:
+
+[caption="Table 5. "]
+.List of supported param_names by *clGetDeviceInfo*
+[width="100%",cols="<33%,<17%,<50%",options="header"]
+|====
+| Device Info | Return Type | Description
+
+| {CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL_anchor}
+  | `size_t`
+    | Returns the primary sub-group size for the device.
+      The primary sub-group size is a sub-group size that supports all core
+      language features for the device.
+
+| {CL_DEVICE_SUB_GROUP_LANE_MAPPINGS_INTEL_anchor}
+  | {cl_device_sub_group_lane_mappings_intel_TYPE}
+    | Returns the supported sub-group lane mappings for the device.
+      The sub-group lane mappings are encoded as bits in a bitfield.
+      Supported sub-group lane mappings are:
+
+      {CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL_anchor}:
+      Work-items are assigned to sub-groups in a linear order, such that the
+      work-item's sub-group local ID is equal to its local work-group linear ID
+      modulo the maximum sub-group size.
+
+      {CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL_anchor}:
+      Work-items are assigned to sub-groups in a linear order along the first
+      dimension of the work-group, adding partial sub-groups if the first
+      dimension of the work-group is not evenly divisible by the maximum
+      sub-group size.
+      With this mapping, the work-item's sub-group local ID is equal to the
+      first dimension of its local ID modulo the maximum sub-group size.
+
+      Note, for any of these mappings, if the first dimension of the work-group
+      size is divisible by the maximum sub-group size, then all sub-groups in
+      the work-group will be the same size (there will be no partial
+      sub-groups), and all work-items in the sub-group will have linear local
+      work-group IDs.
+|====
+
+== Modifications to the OpenCL C Specification
+
+=== Add to Section 6.9.2 - Optional Attribute Qualifiers
+
+The optional `+__attribute__((intel_reqd_named_sub_group_size(<string>)))+` can
+be used to indicate that the kernel must be compiled and executed with the
+specified named sub-group size.
+When the required named sub-group size is `"primary"`
+`get_max_sub_group_size()` must return the primary sub-group size (the value
+returned for {CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL}) for the device executing
+the kernel.
+
+The optional `+__attribute__((intel_reqd_sub_group_lane_mapping(<string>)))+`
+can be used to indicate that the kernel must be compiled and executed with the
+specified mapping from work-items in a work-group to sub-groups.
+When the required sub-group lane mapping is `"wrap"` the work-items in a
+work-group must be assigned to sub-groups as described by
+{CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL}.
+When the required sub-group lane mapping is `"rows"` the work-items in a
+work-group must be assigned to sub-groups as described by
+{CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL}.
+
+These attributes are important for the correctness of many sub-group algorithms,
+and in some cases may be used by the compiler to generate more optimal code.
+
+== Modifications to the OpenCL SPIR-V Environment Specification
+
+=== Add a new section 5.2.X - `cl_intel_subgroup_requirements`
+
+If the OpenCL environment supports the extension
+`cl_intel_subgroup_requirements` then the environment must accept modules that
+declare use of the extension `SPV_INTEL_subgroup_requirements` and that declare
+the SPIR-V capability *SubgroupRequirementsINTEL*.
+
+When the *NamedSubgroupSizeINTEL* execution mode added by the extension is
+*PrimarySubgroupSizeINTEL*, any variables decorated with the *SubgroupMaxSize*
+*BuiltIn* must be equal to the value returned by
+{CL_DEVICE_PRIMARY_SUB_GROUP_SIZE_INTEL}.
+
+Valid values for the *SubgroupLaneMappingINTEL* execution mode added by the
+extension are:
+
+  * *WrapINTEL* if the device supports the
+  {CL_DEVICE_SUB_GROUP_LANE_MAPPING_WRAP_INTEL} sub-group lane mapping.
+  * *RowsINTEL* if the device supports the
+  {CL_DEVICE_SUB_GROUP_LANE_MAPPING_ROWS_INTEL} sub-group lane mapping.
+
+== Issues
+
+. Should we define new OpenCL C kernel attributes?
++
+--
+*RESOLVED*:
+Yes.
+Defining new OpenCL C attributes makes it easier to test this extension and
+is consistent with the required work-group size and required sub-group size
+attributes, even if they are not required for CUDA/HIP and SYCL use-cases, or
+any other high-level languages that produce SPIR-V directly.
+--
+
+. Do we need to define new per-kernel API queries for these sub-group
+requirements?
++
+--
+*UNRESOLVED*:
+Adding new queries would help some types of profiling tools and would be
+consistent with existing per-kernel API queries for some other required
+sub-group size attributes.
+--
+
+. What should happen if a kernel requires both a named sub-group size and an
+integer sub-group size?
++
+--
+*UNRESOLVED*:
+It seems like this could be diagnosed as an error?
+--
+
+. Should we also support a symbolic "primary" lane mapping?
++
+--
+*UNRESOLVED*:
+This would provide some known sub-group lane mapping, even if it differed from
+device-to-device, without requiring a specific lane mapping that may not be
+supported by all devices.
+--
+
+== Revision History
+
+[cols="5,15,15,70"]
+[grid="rows"]
+[options="header"]
+|========================================
+|Version|Date|Author|Changes
+|0.9.0|2023-04-21|Ben Ashbaugh|*Initial internal revision*
+|0.9.1|2023-07-10|Ben Ashbaugh|Fix bug in calculations to use the maximum sub-group size, not the sub-group size.
+|0.9.2|2023-07-11|Ben Ashbaugh|Incorporated review feedback.
+|0.9.3|2023-09-22|Ben Ashbaugh|Assigned enums, final edits before public preview.
+|========================================
+
+//************************************************************************
+//Other formatting suggestions:
+//
+//* Use *bold* text for host APIs, or [source] syntax highlighting.
+//* Use `mono` text for device APIs, or [source] syntax highlighting.
+//* Use `mono` text for extension names, types, or enum values.
+//* Use _italics_ for parameters.
+//************************************************************************
diff --git a/sycl/doc/design/spirv-extensions/SPV_INTEL_subgroup_requirements.asciidoc b/sycl/doc/design/spirv-extensions/SPV_INTEL_subgroup_requirements.asciidoc
new file mode 100644
index 0000000000000..d7bef7f740ac0
--- /dev/null
+++ b/sycl/doc/design/spirv-extensions/SPV_INTEL_subgroup_requirements.asciidoc
@@ -0,0 +1,261 @@
+= SPV_INTEL_subgroup_requirements
+
+== Name Strings
+
+SPV_INTEL_subgroup_requirements
+
+== Contact
+
+To report problems with this extension, please open a new issue at:
+
+https://github.com/KhronosGroup/SPIRV-Registry
+
+== Contributors
+
+// spell-checker: disable
+* Ben Ashbaugh, Intel
+* Pekka Jääskeläinen, Intel
+* Henry Linjamäki, Intel
+* John Pennycook, Intel
+// spell-checker: enable
+
+== Notice
+
+Copyright (c) 2023 Intel Corporation.  All rights reserved.
+
+== Status
+
+* Working Draft
+
+This is a preview extension specification, intended to provide early access to a
+feature for review and community feedback.
+When the feature matures, this specification may be released as a formal
+extension.
+
+Because the interfaces defined by this specification are not final and are
+subject to change they are not intended to be used by shipping software
+products.
+If you are interested in using this feature in your software product, please let
+us know!
+
+== Version
+
+[width="40%",cols="25,25"]
+|========================================
+| Last Modified Date | {docdate}
+| Revision           | C
+|========================================
+
+== Dependencies
+
+This extension is written against the SPIR-V Specification, Version 1.6 Revision 2.
+
+This extension requires SPIR-V 1.0.
+
+== Overview
+
+This extension allows a compiler generating SPIR-V to add additional
+subgroup requirements to entry points (kernels) in a SPIR-V module.
+These requirements enable programmers and compilers to reason better about how
+subgroups behave for an entry point.
+
+Specifically, this extension adds two new execution modes:
+
+* The ability to specify a symbolic *NamedSubgroupSizeINTEL* subgroup size for
+an entry point.
+* The ability to specify the *SubgroupLaneMappingINTEL* for an entry point.
+The subgroup lane mapping defines how the invocations (work-items) in a
+workgroup are assigned to subgroups.
+
+== Extension Name
+
+To use this extension within a SPIR-V module, the following *OpExtension* must
+be present in the module:
+
+----
+OpExtension "SPV_INTEL_subgroup_requirements"
+----
+
+== Modifications to the SPIR-V Specification, Version 1.6
+
+=== Validation Rules
+
+Add validation rules to section 2.16.1 Universal Validation Rules under Entry Point:
+
+* Each *OpEntryPoint* must contain at most one of the *SubgroupSize* or
+*NamedSubgroupSizeINTEL* execution modes.
+* Each *OpEntryPoint* must contain at most one of the *SubgroupLaneMappingINTEL*
+execution modes.
+
+=== Execution Mode
+
+Modify Section 3.6, "Execution Mode", adding these rows to the Execution Mode table:
+
+--
+[cols="^4,20,3*5,22",options="header"]
+|=====
+2+^.^| Execution Mode 3+<.^| Extra Operands | Enabling Capabilities
+| 6446 | *NamedSubgroupSizeINTEL*
+   3+| _Named Subgroup Size_ | *SubgroupRequirementsINTEL*
+| 6447 | *SubgroupLaneMappingINTEL*
+   3+| _Subgroup Lane Mapping_ | *SubgroupRequirementsINTEL*
+|=====
+--
+
+=== Capabilities
+
+Modify Section 3.31, "Capability", adding these rows to the Capability table:
+
+--
+[cols="^.^2,16,15",options="header"]
+|====
+2+^.^| Capability | Implicitly Declares
+| 6445 | *SubgroupRequirementsINTEL* +
+Indicates that additional named subgroup size or subgroup lane mapping
+requirements may be specified.
+|
+|====
+--
+
+=== Named Subgroup Size
+
+Add a new Section 3.XX, "Named Subgroup Size":
+
+Specify a required subgroup size by name.
+A named subgroup size is a symbolic subgroup size with specific properties.
+If an entry point does not specify a specific *SubgroupSize* or
+*NamedSubgroupSizeINTEL* then the subgroup size for the entry point is
+implementation-defined and may vary per entry point or per execution of the
+entry point.  See the client API specification for more detail.
+
+--
+[cols="^.^4,16,15",options="header"]
+|====
+2+^.^| Named Subgroup Size | Enabling Capabilities
+| 0 | *PrimarySubgroupSizeINTEL* +
+A subgroup size that must support all of the core language features for the
+device.
+The primary subgroup size may differ from device-to-device, but will be the same
+for all kernels compiled for a specific device.
+   | *SubgroupRequirementsINTEL*
+|====
+--
+
+=== Subgroup Lane Mapping
+
+Add a new Section 3.XX, "Subgroup Lane Mapping":
+
+Specify a required subgroup lane mapping.
+The subgroup lane mapping defines how the invocations in a workgroup
+(work-items in a work-group) are assigned to subgroups.
+If an entry point does not specify a specific *SubgroupLaneMappingINTEL* then
+the subgroup lane mapping is implementation-defined.
+
+--
+[cols="^.^4,16,15",options="header"]
+|====
+2+^.^| Subgroup Lane Mapping | Enabling Capabilities
+| 0 | *WrapINTEL* +
+Assign invocations in a workgroup to subgroups in a linear order, such that the
+*SubgroupLocalInvocationId* is equal to the *LocalInvocationIndex* modulo the
+*SubgroupMaxSize* (or, that `get_sub_group_local_id()` equals
+`get_local_linear_id() % get_max_sub_group_size()`)
+   | *SubgroupRequirementsINTEL*
+| 1 | *RowsINTEL* +
+Assign invocations in a workgroup to subgroups in a linear order along rows of
+the workgroup (the inner-most dimension, typically the first component of the
+*LocalInvocationId*), adding inactive invocations (padding) if the row size is
+not evenly divisible by the *SubgroupMaxSize*.
+The inactive invocations are not accessible or included in any invocation
+indexing.
+With this mapping, the *SubgroupLocalInvocationId* is equal to the inner-most
+dimension of the workgroup local ID modulo the *SubgroupMaxSize* (or, that
+`get_sub_group_local_id()` equals `get_local_id(0) % get_max_sub_group_size()`).
+   | *SubgroupRequirementsINTEL*
+|====
+--
+
+== Issues
+
+. What should this extension be called?
++
+--
+*UNRESOLVED*: The current name is `SPV_INTEL_subgroup_requirements`, which seems
+general-purpose but not too general-purpose.
+--
+
+. Should we allow the named subgroup size or the subgroup lane mapping to be
+specified with a specialization constant?
++
+--
+*RESOLVED*: No, this is not required.
+Note, there is no way to specify the existing *SubgroupSize* execution mode
+with a specialization constant.
+--
+
+. Are the subgroup requirements added by this extension new execution modes
+affecting the entry point and any called functions or do they need to be more
+fine-grained?
++
+--
+*RESOLVED*: They are new execution modes only.
+This aligns with the existing *SubgroupSize* execution mode.
+It also avoids complicated mix-and-match cases where a function or kernel with
+one subgroup requirement calls another function with a different subgroup
+requirement.
+--
+
+. Do we also need to define an explicit *AutomaticSubgroupSize*, or is the
+subgroup size implicitly automatically determined when the *NamedSubgroupSize*
+and *SubgroupSize* execution modes are absent?
++
+--
+*UNRESOLVED*: The proposed
+https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_named_sub_group_sizes.asciidoc[SYCL
+extension] has an explicit named automatic subgroup size property.
+It is not clear that this is needed in SPIR-V.
+It is not included in the current version of this extension because current
+device compilers already treat the subgroup size as implementation-defined in
+the absence of any specific subgroup size requirements.
+--
+
+. Do we need an explicit *SubgroupLaneMappingINTEL* to request any lane mapping
+such that a) all subgroups have the same size and b) all invocations in the
+subgroup have linear *LocalInvocationIDs*, so long as the inner-most dimension
+of the workgroup is divisible by the subgroup size?
++
+--
+*UNRESOLVED*: This is a desirable subgroup requirement.
+It is sufficient to reason about many subgroup algorithms even though it is a
+weaker requirement than the specific *WrapINTEL* and *RowsINTEL* subgroup lane
+mappings.
+
+As an alternative, if we do not add an explicit *SubgroupLaneMappingINTEL*
+requirement, we could add this requirement to a client API requirement
+specification supporting this SPIR-V extension.
+
+Note, this property is satisfied by both the *WrapINTEL* and *RowsINTEL*
+subgroup plane mappings.
+--
+
+. Do we need to define a mapping for the *SubgroupId* also?
+If we do define a mapping for the *SubgroupId* is it defined based on the
+*SubgroupLaneMappingINTEL* also or is it something different?
++
+--
+*UNRESOLVED*: There are some "obvious" mappings but this is another case where
+the specs do not define any specific mapping.
+--
+
+== Revision History
+
+[cols="5,15,15,70"]
+[grid="rows"]
+[options="header"]
+|========================================
+|Rev|Date|Author|Changes
+|A|2023-04-24|Ben Ashbaugh|*Initial internal revision*
+|B|2023-07-10|Ben Ashbaugh|Fix bug where some cases of *SubgroupSize* should have been *SubgroupMaxSize*.
+|C|2023-07-11|Ben Ashbaugh|Incorporated review feedback.
+|D|2023-09-22|Ben Ashbaugh|Assigned enums, final edits before public preview.
+|========================================