[Issue]: flush happening when too many kernels are queued ahead could be improved #136

Epliz · 2025-02-11T08:05:29Z

Problem Description

Hi,

It seems that when the CPU has queued too many kernels/commands ahead of the GPU, clr triggers a synchronization/flush.
As a result, the CPU cannot enqueue any further work ahead, which costs some performance: it can take a while before the CPU is far enough ahead again to keep the GPU saturated with work.

It seems it would be better to do the following: if more than MAX_AHEAD (e.g. 1024) kernels/commands are enqueued, wait on the CPU until only MID_AHEAD (e.g. 512) kernels are pending execution.
Having two distinct thresholds instead of a single one avoids the case where, after each partial flush, we enqueue just one more kernel and then flush again, each time waiting for only a single kernel to have executed in the meantime.
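
To illustrate why two thresholds help, here is a minimal toy model in Python. It is not clr code and does not reflect how clr actually tracks pending commands. The function name `count_stalls` and the parameter `resume_at` are hypothetical, and the model assumes the CPU enqueues much faster than the GPU retires work, so nothing is retired between two enqueues unless the CPU is blocked.

```python
def count_stalls(num_commands: int, max_ahead: int, resume_at: int) -> int:
    """Count how often the CPU blocks while enqueuing num_commands.

    Toy model (assumption): the GPU retires work only while the CPU is
    blocked, i.e. the CPU is always faster than the GPU.

    max_ahead: the CPU blocks once more than this many commands are pending.
    resume_at: the CPU resumes once the pending count has dropped to this value.
    """
    if not 0 <= resume_at <= max_ahead:
        raise ValueError("resume_at must be between 0 and max_ahead")

    pending = 0
    stalls = 0
    for _ in range(num_commands):
        pending += 1                 # CPU enqueues one command
        if pending > max_ahead:      # too far ahead: CPU has to wait
            stalls += 1
            pending = resume_at      # GPU drains down to the resume level
    return stalls


MAX_AHEAD = 1024
MID_AHEAD = 512

# Single threshold: resume as soon as we are back at MAX_AHEAD.
single = count_stalls(4096, MAX_AHEAD, MAX_AHEAD)
# Two thresholds: resume only once we are down to MID_AHEAD.
double = count_stalls(4096, MAX_AHEAD, MID_AHEAD)
```

In this model, with 4096 commands, the single-threshold variant blocks 3072 times (once per enqueue after the first 1024), while the two-threshold variant blocks 6 times. Real numbers would depend on actual GPU and CPU rates.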

Best regards,
Epliz

Operating System

Oracle Linux 8.10

CPU

Intel Xeon 8480+

GPU

8x MI300X

ROCm Version

ROCm 6.2 (packaged with PyTorch)

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

ppanchad-amd · 2025-02-11T15:16:30Z

Hi @Epliz. Internal ticket has been created to investigate this issue. Thanks!

schung-amd · 2025-02-11T18:42:28Z

Hi @Epliz, is there a reproducer you can provide for this?

ppanchad-amd added the Under Investigation label Feb 11, 2025
