Problem Description
Hi,
It seems that when the CPU has queued too many kernels/commands ahead of the GPU, the CLR triggers a synchronization/flush.
As a result, the CPU cannot enqueue any further work ahead, which costs some performance: it may take a while before the CPU is far enough ahead again to keep the GPU saturated with work.
It seems like it would be better to do the following: if more than MAX_AHEAD (e.g. 1024) kernels/commands are enqueued, wait on the CPU until only MID_AHEAD (e.g. 512) kernels are pending execution.
Having two distinct thresholds instead of a single one avoids the case where, after each partial flush, we enqueue just one more kernel and then flush again, each time waiting for only a single kernel to finish executing.
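To illustrate the idea, here is a minimal sketch in Python. It is only a toy model, not how the CLR is implemented: `simulate` is a hypothetical helper, and it assumes the GPU retires nothing while the CPU is enqueueing and drains exactly down to the low threshold while the CPU waits. MAX_AHEAD and MID_AHEAD are the two thresholds proposed above.

```python
def simulate(num_kernels: int, max_ahead: int, mid_ahead: int) -> int:
    """Return how many times the CPU stalls while enqueueing num_kernels.

    Toy model (assumption): the GPU makes no progress while the CPU is
    enqueueing, and each stall ends as soon as only mid_ahead kernels
    remain pending.
    """
    pending = 0  # kernels enqueued but not yet executed
    stalls = 0
    for _ in range(num_kernels):
        pending += 1
        if pending > max_ahead:
            # High threshold exceeded: block until drained to the low one.
            stalls += 1
            pending = mid_ahead
    return stalls


# Two thresholds: each stall buys room for many more enqueues.
two_thresholds = simulate(4096, max_ahead=1024, mid_ahead=512)

# Single threshold (mid_ahead == max_ahead): once the limit is reached,
# every further enqueue stalls again to wait for one kernel.
single_threshold = simulate(4096, max_ahead=1024, mid_ahead=1024)

print(two_thresholds, single_threshold)
```

In this model, the gap between the two thresholds sets how much work the CPU can queue between consecutive waits, which is why a single threshold degenerates into one wait per enqueued kernel.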
Best regards,
Epliz
Operating System
Oracle Linux 8.10
CPU
Intel Xeon 8480+
GPU
8x MI300X
ROCm Version
ROCm 6.2 (packaged with PyTorch)
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response