If I understand things correctly, when launching a GEMM with OpClassTensorOp, the CUDA cores are idle; when launching a GEMM with OpClassSimt, the Tensor Cores are idle. So, is it possible to overlap GEMM launches, e.g. with CUDA streams as in ordinary CUDA programming, to utilize both compute units?
In simple words: can we concurrently launch one GEMM with OpClassSimt and another GEMM with OpClassTensorOp (they work on different matrices), and execute both of them at the same time on the Tensor Cores and CUDA cores?
Thanks in advance for your reply!
One quick follow-up: is this concurrent execution on CUDA cores and Tensor Cores not possible in CUTLASS for now, or is it not possible in general? (Can we explicitly program a kernel that uses wmma instructions to achieve this?)
This is not a CUTLASS limitation; in theory, you can write a CUTLASS kernel that does both. It just does not make sense to issue SIMT FMAs while also issuing Tensor Core MMAs. We do, of course, issue other types of SIMT instructions and interleave them with Tensor Core operations, but for the purposes of your question of using FMA and MMA at the same time, there is not much point in pursuing that, within or outside CUTLASS.
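For what it's worth, the stream-level overlap from the original question is easy to set up with the CUTLASS device-level API, even if there is little performance to gain. A hedged sketch, assuming CUTLASS 2.x-style `cutlass::gemm::device::Gemm` instantiations; the element types, layouts, and the `Sm80` architecture tag below are illustrative assumptions and would need to match your actual problem and GPU:

```cpp
#include <cutlass/gemm/device/gemm.h>
#include <cuda_runtime.h>

// One GEMM targeting the Tensor Core path, one targeting the SIMT
// (CUDA core) path. The template arguments are illustrative only.
using TensorOpGemm = cutlass::gemm::device::Gemm<
    cutlass::half_t, cutlass::layout::RowMajor,    // A
    cutlass::half_t, cutlass::layout::ColumnMajor, // B
    float, cutlass::layout::RowMajor,              // C
    float,                                         // accumulator
    cutlass::arch::OpClassTensorOp,                // issues MMA instructions
    cutlass::arch::Sm80>;

using SimtGemm = cutlass::gemm::device::Gemm<
    float, cutlass::layout::RowMajor,
    float, cutlass::layout::ColumnMajor,
    float, cutlass::layout::RowMajor,
    float,
    cutlass::arch::OpClassSimt,                    // issues FMA instructions
    cutlass::arch::Sm80>;

void launch_both(TensorOpGemm::Arguments const& args_tc,
                 SimtGemm::Arguments const& args_simt) {
  cudaStream_t s0, s1;
  cudaStreamCreate(&s0);
  cudaStreamCreate(&s1);

  TensorOpGemm gemm_tc;
  SimtGemm gemm_simt;

  // Launch each GEMM on its own stream; the kernels are now *eligible*
  // to run concurrently if the GPU has free SM resources.
  gemm_tc(args_tc, /*workspace=*/nullptr, s0);
  gemm_simt(args_simt, /*workspace=*/nullptr, s1);

  cudaStreamSynchronize(s0);
  cudaStreamSynchronize(s1);
  cudaStreamDestroy(s0);
  cudaStreamDestroy(s1);
}
```

Note the caveat: a well-sized GEMM typically occupies every SM on its own, so two full-size GEMMs on separate streams mostly serialize in practice. Genuine FMA/MMA co-issue would have to happen inside a single kernel, which is exactly the interleaving (and its limited payoff) described in the reply above.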