
Investigate performance when using OpSubgroupShuffleUpINTEL for SYCL shuffles #5364

Open
@npmiller

The SPIR-V instructions OpSubgroupShuffleUpINTEL and OpSubgroupShuffleDownINTEL have more functionality than is required to implement the SYCL shuffles, which leads to unnecessary complexity.
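To make the extra functionality concrete, here is a minimal host-side sketch in plain C++ (not device code). It assumes, based on my reading of the SPV_INTEL_subgroups extension, that the shuffle-up instruction takes two source values (`previous` and `current`) plus a delta, and reads from `previous` when the source lane index falls below zero. A SYCL right shift only has a single source, and the lanes below `delta` are left unspecified. The function names `shuffle_up_two_source` and `shift_right_single_source` are hypothetical, and passing the same value for both operands is an assumption about how a single-source shuffle could be lowered, not a statement of what the compiler does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of a two-source shuffle-up (assumed semantics).
// Each vector holds one value per lane of the sub-group.
// Lane i reads current[i - delta]; if that index is negative,
// it reads previous[i - delta + size] instead.
// Assumes previous.size() == current.size() and 0 <= delta <= size.
std::vector<int> shuffle_up_two_source(const std::vector<int>& previous,
                                       const std::vector<int>& current,
                                       int delta) {
    const int size = static_cast<int>(current.size());
    assert(previous.size() == current.size());
    assert(delta >= 0 && delta <= size);

    std::vector<int> result(current.size());
    for (int lane = 0; lane < size; ++lane) {
        const int index = lane - delta;
        if (index >= 0) {
            result[static_cast<std::size_t>(lane)] =
                current[static_cast<std::size_t>(index)];
        } else {
            // The extra path a single-source shuffle never needs.
            result[static_cast<std::size_t>(lane)] =
                previous[static_cast<std::size_t>(index + size)];
        }
    }
    return result;
}

// Hypothetical single-source shift: lanes below delta are unspecified
// for a SYCL right shift, so the same value is passed for both operands
// and whatever the model returns for those lanes is acceptable.
std::vector<int> shift_right_single_source(const std::vector<int>& value,
                                           int delta) {
    return shuffle_up_two_source(value, value, delta);
}
```

In this model the `previous` operand and the negative-index branch exist only to serve the two-source case. A backend built-in that takes a single source could skip both, which is the kind of saving the issue proposes to measure.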

It would be interesting to see the potential performance hit and to see how it could be optimized.

At the moment, this operation is used for the HIP target. For NVIDIA, the NVIDIA built-ins are used directly instead of the SPIR-V operation; doing the same for AMD may also benefit performance.

It is unclear whether this would have a significant impact, but it should be investigated.

This was discussed on:

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request
    hip: Issues related to execution on HIP backend.
    performance: Performance related issues

