Open
Description
The SPIR-V `OpSubgroupShuffleUpINTEL` (and `OpSubgroupShuffleDownINTEL`) instructions have more functionality than is required to implement the SYCL shuffles, which leads to unnecessary complexity.
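To make the difference concrete, here is a minimal host-side sketch in plain C++ (not device code). It assumes my reading of the `SPV_INTEL_subgroups` extension, where the shuffle-up instruction takes two data operands ("previous" and "current") plus a delta, while a SYCL right shift only needs one value. The function names are hypothetical and exist only for illustration. The fallback for out-of-range lanes in the single-source version is a modeling choice, since SYCL leaves that result unspecified.

```cpp
#include <cstddef>
#include <vector>

// Model of a two-source shuffle-up over one subgroup (assumed semantics).
// previous/current hold one value per lane; requires 0 <= delta <= size.
std::vector<int> shuffle_up_two_source(const std::vector<int>& previous,
                                       const std::vector<int>& current,
                                       std::size_t delta) {
    const std::size_t n = current.size();
    std::vector<int> out(n);
    for (std::size_t lane = 0; lane < n; ++lane) {
        // In-range lanes read `current`; lanes that would read below
        // lane 0 wrap into the tail of `previous`.
        out[lane] = lane >= delta ? current[lane - delta]
                                  : previous[lane + n - delta];
    }
    return out;
}

// Model of a single-source right shift, as a SYCL shuffle needs it.
// Lanes below delta keep their own value here (assumption: the real
// result for those lanes is unspecified).
std::vector<int> shift_right_single_source(const std::vector<int>& x,
                                           std::size_t delta) {
    const std::size_t n = x.size();
    std::vector<int> out(n);
    for (std::size_t lane = 0; lane < n; ++lane) {
        out[lane] = lane >= delta ? x[lane - delta] : x[lane];
    }
    return out;
}
```

The two models agree on every lane at or above `delta`; they differ only in the low lanes, where the second operand of the two-source form comes into play. That second operand is presumably the extra functionality a SYCL shuffle never needs.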
It would be interesting to see the potential performance hit and to see how it could be optimized.
At the moment this is used for the HIP target. For NVIDIA, the NVIDIA built-ins are used directly instead of the SPIR-V operation; doing the same for AMD may also be beneficial for performance.
It is unclear whether this would have a significant impact, but it should be investigated.
This was discussed on: