
replace kernel implementation using CK tile-programming performant kernels #33

@carlushuang

Description

We are planning to replace the underlying kernel implementation with the newly developed CK tile-programming FMHA kernel. Its performance is much better on MI200 and MI300, especially on MI300. Once this is done, the current implementation in the main branch will be deprecated.

  • fwd integration with hdim=64/128, with support for mask, varlen, and different kernels for the padding case
  • fwd extend to other hdims
  • dropout support
  • bwd integration (to be planned)
