Suggestion Description
Hi,
As mentioned briefly in #104, I believe it would be valuable to add support for cache-flush control flags to the stream memory read/write operations, similar to what is already possible with events.
Thanks to stream memory operations, I have been able to implement a single-machine all-reduce with ~18 µs latency, which is already really good (better than RCCL, from what I see). I believe that being able to avoid cache flushes might help shave off even more latency.
Best,
Epliz
Operating System
No response
GPU
No response
ROCm Component
No response
In CUDA land, I believe that what I am requesting corresponds to supporting the flags CU_STREAM_WAIT_VALUE_FLUSH and CU_STREAM_WRITE_VALUE_NO_MEMORY_BARRIER.
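For reference, here is a minimal sketch of how those two flags are passed on the CUDA side with the driver API. It is only an illustration of the flag usage, not a benchmark or a proposal for the HIP signature: the two-stream setup, the `CHECK` macro, and the variable names are my own assumptions. As I understand the CUDA documentation, CU_STREAM_WRITE_VALUE_NO_MEMORY_BARRIER drops the memory fence that normally precedes the write, and CU_STREAM_WAIT_VALUE_FLUSH follows the wait with a flush of outstanding remote writes (support is queried through CU_DEVICE_ATTRIBUTE_CAN_FLUSH_REMOTE_WRITES).

```cuda
// Build (assumed): nvcc stream_flags.cu -lcuda
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort on any driver API error.
#define CHECK(call)                                                   \
    do {                                                              \
        CUresult r_ = (call);                                         \
        if (r_ != CUDA_SUCCESS) {                                     \
            const char* msg = nullptr;                                \
            cuGetErrorString(r_, &msg);                               \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    msg ? msg : "unknown error");                     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // Only request the flush if the device reports support for it.
    int canFlush = 0;
    CHECK(cuDeviceGetAttribute(
        &canFlush, CU_DEVICE_ATTRIBUTE_CAN_FLUSH_REMOTE_WRITES, dev));

    // 32-bit flag in device memory, initialised to 0.
    CUdeviceptr flag;
    CHECK(cuMemAlloc(&flag, sizeof(cuuint32_t)));
    CHECK(cuMemsetD32(flag, 0, 1));

    CUstream producer, consumer;
    CHECK(cuStreamCreate(&producer, CU_STREAM_NON_BLOCKING));
    CHECK(cuStreamCreate(&consumer, CU_STREAM_NON_BLOCKING));

    // Consumer blocks until *flag >= 1, optionally followed by a flush
    // of outstanding remote writes.
    unsigned int waitFlags = CU_STREAM_WAIT_VALUE_GEQ;
    if (canFlush) waitFlags |= CU_STREAM_WAIT_VALUE_FLUSH;
    CHECK(cuStreamWaitValue32(consumer, flag, 1, waitFlags));

    // Producer writes the flag without the preceding memory fence.
    // Earlier writes in this stream may then be reordered after it,
    // so this is only safe when the flag carries no payload ordering.
    CHECK(cuStreamWriteValue32(producer, flag, 1,
                               CU_STREAM_WRITE_VALUE_NO_MEMORY_BARRIER));

    CHECK(cuStreamSynchronize(producer));
    CHECK(cuStreamSynchronize(consumer));
    printf("wait released (flush requested: %s)\n", canFlush ? "yes" : "no");

    CHECK(cuStreamDestroy(producer));
    CHECK(cuStreamDestroy(consumer));
    CHECK(cuMemFree(flag));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

The HIP request is for equivalent flag values on the stream write/wait calls, so that callers can opt out of the fence or flush when they handle ordering themselves.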