
Conversation

@pxl-th (Member) commented Jan 2, 2026

For now CUDA.jl only. AMDGPU.jl will come next.

FP16, Flash attention FWD + BWD:

- before: 114.087 ms (1142 allocations: 33.73 KiB)
- now: 83.020 ms (1142 allocations: 33.73 KiB)
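For context, a minimal sketch of how a FWD + BWD timing like this could be collected with BenchmarkTools.jl and Zygote.jl; the `NNop.flash_attention` entry point, its argument layout, and the problem size are assumptions for illustration.

```julia
using BenchmarkTools
using CUDA
using Zygote
using NNop

# Assumed problem size: (embedding dim, sequence length, heads, batch).
E, L, H, B = 64, 4096, 4, 4
q = CuArray(rand(Float16, E, L, H, B))
k = CuArray(rand(Float16, E, L, H, B))
v = CuArray(rand(Float16, E, L, H, B))

# `NNop.flash_attention` is an assumed entry point; sum() gives a scalar loss
# so that Zygote can drive the backward pass through the kernel.
loss(q, k, v) = sum(NNop.flash_attention(q, k, v))

# FWD + BWD timing, synchronizing the GPU so the whole pass is measured.
@btime CUDA.@sync Zygote.gradient($loss, $q, $k, $v)
```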

@pxl-th (Member, Author) commented Jan 2, 2026

Tests pass locally; CI is broken.

pxl-th merged commit b6ea23c into master on Jan 2, 2026 (1 check failed).
pxl-th deleted the pxl-th/wmma branch on Jan 2, 2026, 17:22.
@AntonOresten (Contributor) commented:
Wow!! Terrific! Excellent work🙏

Would BFloat16 require JuliaGPU/CUDA.jl#1425?

@pxl-th (Member, Author) commented Jan 2, 2026

> Would BFloat16 require JuliaGPU/CUDA.jl#1425?

Most likely; I tried it without those changes and got a bunch of errors.

@AntonOresten (Contributor) commented Jan 3, 2026

JuliaGPU/CUDA.jl#3009 could be promising. Note that it requires Float32 accumulation.
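
As a rough illustration of what the Float32 accumulation requirement means at the WMMA level, here is a minimal single-warp sketch in the style of the CUDA.jl WMMA example: Float16 inputs with a Float32 accumulator on a 16×16×16 tile. In the BFloat16 case the `a`/`b` inputs would be BFloat16 while `c`/`d` stay Float32.

```julia
using CUDA

a = CuArray(rand(Float16, 16, 16))
b = CuArray(rand(Float16, 16, 16))
c = CuArray(zeros(Float32, 16, 16))   # accumulator is Float32
d = similar(c)

function wmma_kernel(a, b, c, d)
    # Float16 × Float16 matrix product accumulated into Float32.
    conf = WMMA.Config{16, 16, 16, Float32}

    a_frag = WMMA.load_a(pointer(a), 16, WMMA.ColMajor, conf)
    b_frag = WMMA.load_b(pointer(b), 16, WMMA.ColMajor, conf)
    c_frag = WMMA.load_c(pointer(c), 16, WMMA.ColMajor, conf)

    d_frag = WMMA.mma(a_frag, b_frag, c_frag, conf)

    WMMA.store_d(pointer(d), d_frag, 16, WMMA.ColMajor, conf)
    return
end

# One warp (32 threads) computes the 16×16×16 tile.
@cuda threads=32 wmma_kernel(a, b, c, d)
```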

Comparing ONIONop (no WMMA) to NNop master (with WMMA), with some buffer and type conversions to work around the accumulation requirement:

[benchmark comparison image: ONIONop (no WMMA) vs. NNop master (with WMMA)]
