If I understand things correctly, when launching a GEMM with OpClassTensorOp, the CUDA cores are idle; when launching a GEMM with OpClassSimt, the Tensor Cores are idle. So, is it possible to overlap GEMM launches, e.g. with CUDA streams as in ordinary CUDA programming, to utilize both compute units?
In simple words: can we concurrently launch one GEMM with OpClassSimt and another GEMM with OpClassTensorOp (they work on different matrices), and execute both of them at the same time on the Tensor Cores and CUDA cores?
Thanks in advance for your reply!
One quick follow-up: is this concurrent execution on CUDA cores and Tensor Cores not possible in CUTLASS for now, or is it not possible in general? (Can we explicitly program a kernel that uses wmma instructions to achieve this?)
This is not a CUTLASS limitation; in theory, you can write a CUTLASS kernel that does both. It just does not make sense to issue SIMT FMAs while also issuing Tensor Core MMAs. We do, of course, issue other types of SIMT instructions and interleave them with Tensor Core operations, but for the purposes of your question of using FMA and MMA at the same time, there is not much point in pursuing that, within or outside CUTLASS.
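For what it's worth, the stream-level overlap from the original question is easy to set up with the CUTLASS device-level API, even if there is little performance to gain. A hedged sketch, assuming CUTLASS 2.x-style `cutlass::gemm::device::Gemm` instantiations; the element types, layouts, and the `Sm80` architecture tag below are illustrative assumptions and would need to match your actual problem and GPU:

```cpp
#include <cutlass/gemm/device/gemm.h>
#include <cuda_runtime.h>

// One GEMM targeting the Tensor Core path, one targeting the SIMT
// (CUDA core) path. The template arguments are illustrative only.
using TensorOpGemm = cutlass::gemm::device::Gemm<
    cutlass::half_t, cutlass::layout::RowMajor,    // A
    cutlass::half_t, cutlass::layout::ColumnMajor, // B
    float, cutlass::layout::RowMajor,              // C
    float,                                         // accumulator
    cutlass::arch::OpClassTensorOp,                // issues MMA instructions
    cutlass::arch::Sm80>;

using SimtGemm = cutlass::gemm::device::Gemm<
    float, cutlass::layout::RowMajor,
    float, cutlass::layout::ColumnMajor,
    float, cutlass::layout::RowMajor,
    float,
    cutlass::arch::OpClassSimt,                    // issues FMA instructions
    cutlass::arch::Sm80>;

void launch_both(TensorOpGemm::Arguments const& args_tc,
                 SimtGemm::Arguments const& args_simt) {
  cudaStream_t s0, s1;
  cudaStreamCreate(&s0);
  cudaStreamCreate(&s1);

  TensorOpGemm gemm_tc;
  SimtGemm gemm_simt;

  // Launch each GEMM on its own stream; the kernels are now *eligible*
  // to run concurrently if the GPU has free SM resources.
  gemm_tc(args_tc, /*workspace=*/nullptr, s0);
  gemm_simt(args_simt, /*workspace=*/nullptr, s1);

  cudaStreamSynchronize(s0);
  cudaStreamSynchronize(s1);
  cudaStreamDestroy(s0);
  cudaStreamDestroy(s1);
}
```

Note the caveat: a well-sized GEMM typically occupies every SM on its own, so two full-size GEMMs on separate streams mostly serialize in practice. Genuine FMA/MMA co-issue would have to happen inside a single kernel, which is exactly the interleaving (and its limited payoff) described in the reply above.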