
kvark commented Sep 20, 2025

Connections
Blocked by gfx-rs/rspirv#265, since rspirv fails to validate the generated output even though it is correct.

Description
Adds shader support for KHR_cooperative_matrix. The scope is deliberately simple, limited to what is portable between Vulkan and Metal.

Testing
Adds tests.

Squash or Rebase?
Rebase.

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

API choices

SPIR-V and Metal share a usable common subset of the cooperative matrix functionality, with some caveats:

  • GLSL calls it "coopmat" while Metal has "simdgroup_typeNxY". I decided to go with "coop_mat" since WGSL fairly consistently separates sub-words with an underscore, e.g. "texture_cube".
  • SPIR-V requires a "use" to be associated with each matrix type: one of A, B, or Acc (accumulator). Metal has no such concept. The API decision here is to expose it as a "role", one of the generic parameters of coop_mat.
  • SPIR-V has OpCooperativeMatrixLoadKHR and OpCooperativeMatrixMulAddKHR as expressions and OpCooperativeMatrixStoreKHR as a statement. Metal exposes all three as statements. I followed the SPIR-V model here, as does Google's proposal.
    • The "T" suffix denotes a transposed load/store. I have no strong opinion on this naming.
  • Metal also has a plain multiplication (as opposed to multiply-add). I opted not to expose it, since it can be added in a follow-up if needed.
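To make the semantics above concrete, here is a minimal CPU reference sketch in Rust. It assumes square NxN tiles of f32 and a row-major memory layout with a caller-supplied stride; `Tile`, `coop_load`, `coop_load_t`, `coop_mul_add` and `coop_store` are hypothetical names, not naga or wgpu API. Real cooperative matrices are opaque and are loaded and stored by the whole subgroup at once; the sketch only mirrors the shape of the API, where load and multiply-add return values (expressions) and store writes to memory and returns nothing (a statement).

```rust
// Hypothetical CPU reference model; NOT naga/wgpu API. Real cooperative
// matrices are opaque and are loaded/stored by the whole subgroup at once.

/// An N x N tile of f32, kept row-major here: data[row][col].
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tile<const N: usize> {
    pub data: [[f32; N]; N],
}

/// Load (an expression): rows are `stride` elements apart in `buf`,
/// starting at `offset`. Panics if the tile runs past the end of `buf`.
pub fn coop_load<const N: usize>(buf: &[f32], offset: usize, stride: usize) -> Tile<N> {
    let mut data = [[0.0f32; N]; N];
    for r in 0..N {
        for c in 0..N {
            data[r][c] = buf[offset + r * stride + c];
        }
    }
    Tile { data }
}

/// Transposed load ("T" suffix): same memory walk, but the result is
/// transposed, i.e. memory is effectively read as column-major.
pub fn coop_load_t<const N: usize>(buf: &[f32], offset: usize, stride: usize) -> Tile<N> {
    let mut data = [[0.0f32; N]; N];
    for r in 0..N {
        for c in 0..N {
            data[c][r] = buf[offset + r * stride + c];
        }
    }
    Tile { data }
}

/// Multiply-add (an expression): returns a * b + acc.
pub fn coop_mul_add<const N: usize>(a: &Tile<N>, b: &Tile<N>, acc: &Tile<N>) -> Tile<N> {
    let mut data = acc.data;
    for i in 0..N {
        for j in 0..N {
            for k in 0..N {
                data[i][j] += a.data[i][k] * b.data[k][j];
            }
        }
    }
    Tile { data }
}

/// Store (a statement): writes the tile back row-major and returns nothing.
pub fn coop_store<const N: usize>(tile: &Tile<N>, buf: &mut [f32], offset: usize, stride: usize) {
    for r in 0..N {
        for c in 0..N {
            buf[offset + r * stride + c] = tile.data[r][c];
        }
    }
}
```

One convenience of keeping load and multiply-add as value-returning operations is that they nest directly, e.g. `coop_mul_add(&coop_load(&a_buf, 0, n), &coop_load(&b_buf, 0, n), &acc)`, whereas an all-statements form needs a temporary for every intermediate.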

Things left for follow-up:

  • update the API based on whatever the W3C working group converges on
  • maybe add the multiply without add
  • implement initialization from a scalar (honestly not sure how useful this is?)
  • support for coop matrix with scalar binary ops is very limited
  • could use more validation and better errors

kvark force-pushed the cooperative branch 5 times, most recently from 881da16 to 430d104, September 26, 2025 03:30
kvark marked this pull request as ready for review September 26, 2025 03:30
cwfitzgerald commented

I haven't actually looked at the PR yet, but you should take a look at the presentation about cooperative matrices from the F2F: https://docs.google.com/presentation/d/1wiy3-ar58ah1W9Qc5trd0gG7fwCo93IJ9YCtQoR6W6c/edit?slide=id.g30fc39156ff_0_0#slide=id.g30fc39156ff_0_0 and the Dawn design doc https://dawn.googlesource.com/dawn/+/refs/heads/main/docs/dawn/features/subgroup_matrix.md, just to make sure things are in sync with upstream.

kvark commented Sep 27, 2025

@cwfitzgerald this is very useful, thanks for linking! Funny that the timing of that presentation roughly matches when I independently started working on this. I looked at the slides as well as the design doc, and here is my initial feedback. Apologies if it's not fully thought through!

> Because the type is abstract it can only be stored in the Function and Private storage classes. Special load and store instructions are used to translate to/from backing memory.

There are very similar types, textures and samplers, which are also opaque from the shader writer's point of view. Was it considered to simply use the "Handle" storage class?

> subgroup_matrix_left

For each of these properties (scope, role (left/right/acc), element type, etc.) there is a choice between making it a generic argument or part of the name. In this PR, for example, the role is encoded as a generic A/B/C. I think that makes sense because it allows operations like matrix store to be expressed cleanly as a single generic function instead of being overloaded for every kind of matrix.
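A hypothetical Rust sketch of this argument: encoding the role as a zero-sized generic marker lets one generic `coop_store` accept every kind of matrix, while `coop_mul_add` still rejects operands with the wrong roles at compile time. `CoopMat`, `A`, `B`, `Acc` and the function names are illustrative assumptions, not naga's IR or the WGSL proposal; a row-major layout with a stride is also assumed.

```rust
use std::marker::PhantomData;

// Hypothetical zero-sized role markers, mirroring SPIR-V's A / B / Accumulator uses.
pub struct A;
pub struct B;
pub struct Acc;

/// Square N x N cooperative matrix; `Role` exists only at the type level.
pub struct CoopMat<Role, const N: usize> {
    data: [[f32; N]; N], // data[row][col]
    _role: PhantomData<Role>,
}

impl<Role, const N: usize> CoopMat<Role, N> {
    pub fn from_rows(data: [[f32; N]; N]) -> Self {
        CoopMat { data, _role: PhantomData }
    }
}

/// One generic store works for every role, so no per-role overloads are needed.
/// Rows are written `stride` elements apart (row-major assumed).
pub fn coop_store<Role, const N: usize>(m: &CoopMat<Role, N>, buf: &mut [f32], stride: usize) {
    for r in 0..N {
        for c in 0..N {
            buf[r * stride + c] = m.data[r][c];
        }
    }
}

/// Multiply-add only type-checks with (A, B, Acc) operands; returns a * b + c.
pub fn coop_mul_add<const N: usize>(
    a: &CoopMat<A, N>,
    b: &CoopMat<B, N>,
    c: &CoopMat<Acc, N>,
) -> CoopMat<Acc, N> {
    let mut out = c.data;
    for i in 0..N {
        for j in 0..N {
            for k in 0..N {
                out[i][j] += a.data[i][k] * b.data[k][j];
            }
        }
    }
    CoopMat::from_rows(out)
}
```

Passing a `CoopMat<B, N>` where `coop_mul_add` expects a `CoopMat<A, N>` fails to compile. With per-role overloads instead, the same store would have to be declared once for each role.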

The same applies to the "subgroup" part. If it were a generic scope parameter, the same scope could also be used in other parts of the language/API (e.g. barriers).

> subgroupMatrixLoad(.. col_major : bool, ..) -> T

A boolean argument is generally a poor API pattern, since its meaning is not apparent from the call site. Since this is supposed to be a constant anyway, maybe it could be encoded in the function name itself? This PR currently exposes it as coopMatrixLoad/coopMatrixLoadT (the "T" suffix stands for transposed).

Overall, it looks reasonable. I'm curious whether Apple raised concerns about any parts as well.
cc @jimblandy if you want to share this feedback with the group.

kvark commented Sep 28, 2025

@cwfitzgerald @jimblandy do you have a strong preference on how to proceed with the changes? I'm at the point where things basically work, and the test passes validation. We could:

  1. land as is and then change the names (and a bit of the semantics) once WGSL settles on a standard API for this. I'm fairly confident that most of the IR and inner logic isn't going to be affected.
  2. rewrite this to match Google's proposal text, if the working group is leaning towards that style of API (see my remarks in the comment above).
  3. don't land anything until the group finalizes the WGSL API

I'm fine either way. I want to use this for a project and will stay on a branch if this can't be merged. My preference would be (1).

kvark force-pushed the cooperative branch 3 times, most recently from 2bf7828 to 782a0fc, September 28, 2025 04:06
kvark commented Sep 28, 2025

OK, I've aligned coopLoad with the API in the WGSL proposal. It's a bit unusual, since it's only the second function we support that takes generic arguments at all. Fortunately, the code changes needed to support this are small.
CI should be green ✅ now. Looking forward to getting some feedback and/or proceeding 🚀.

kvark requested a review from jimblandy October 1, 2025 04:20
cwfitzgerald self-assigned this Oct 1, 2025
jimblandy commented

I think it's our standard practice to land experimental features, so it's okay for us to review and land this as-is. However, the WebGPU committee will almost certainly approve some version of Alan's proposal eventually, so if we put something different in wgpu, it will just need to be changed.

So, I'd like to really encourage you to adapt what you've got to Alan's proposal as much as feasible, but we shouldn't block merging on 100% compliance.
