When benchmarking kernels during quick tuning (and exhaustive tuning as well), each candidate kernel config is run 10 times and its average time is compared against the other configs. The winning kernel config is the one with the best average time.
The individual times of the 10 runs are not recorded. The goal here is to capture the time of every run and print the min, max, and median.
Lastly, add the ability to change the selection criterion from average to min or median.
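To make the request concrete, here is a minimal sketch of what a selectable criterion could look like, assuming per-run times were available as a list of milliseconds per config. The names `SELECTORS`, `summarize`, and `pick_winner` are hypothetical and are not part of MIGraphX; this only illustrates the idea.

```python
import statistics

# Hypothetical mapping from criterion name to the statistic used for ranking.
SELECTORS = {
    "average": statistics.mean,
    "min": min,
    "median": statistics.median,
}


def summarize(times_ms):
    """Return the statistics the issue asks to print for one config."""
    return {
        "average": statistics.mean(times_ms),
        "min": min(times_ms),
        "max": max(times_ms),
        "median": statistics.median(times_ms),
    }


def pick_winner(runs_by_config, criterion="average"):
    """Return the config whose chosen statistic is lowest (fastest)."""
    score = SELECTORS[criterion]
    return min(runs_by_config, key=lambda cfg: score(runs_by_config[cfg]))


# Made-up sample data: config "A" is usually faster but has one slow outlier.
runs = {
    "A": [2.0, 2.1, 2.0, 9.0],
    "B": [2.5, 2.5, 2.5, 2.5],
}
print(pick_winner(runs, "average"))  # B
print(pick_winner(runs, "min"))      # A
print(pick_winner(runs, "median"))   # A
```

The sample data shows why the criterion matters: a single outlier run is enough to change the winner when ranking by average, while min and median are unaffected by it.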
The individual times of the 10 runs are not recorded. The goal here is to capture the time of every run and print the min, max, and median.
Lastly, add the ability to change the selection criterion from average to min or median.
This is not possible. We intentionally don't time each run, because we want to minimize launch overhead when benchmarking so that the measurement gets closer to the actual device time.
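The trade-off described in this reply can be illustrated with a host-side analogy. This is not MIGraphX code, and `time_batch` and `time_each` are hypothetical names. Timing the whole batch with one start/stop pair yields only an average, while timing every run yields individual samples but puts a measurement around each launch. On a GPU, the per-run variant would typically also need a synchronization point or event pair around every launch.

```python
import time


def time_batch(fn, runs=10):
    """One timer pair around all runs: only the average is recoverable."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def time_each(fn, runs=10):
    """One timer pair per run: min/max/median become available,
    but the timing calls themselves are included in every sample."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def work():
    # Stand-in workload for a kernel launch.
    sum(range(1000))


average_s = time_batch(work)
samples_s = time_each(work)
```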
As an example, this is what we see today:
MIGRAPHX_TRACE_BENCHMARKING=2 MIGRAPHX_TRACE_MLIR=2
Problem: gfx1150 12 -t f16 -out_datatype f16 -transA false -transB true -g 1 -m 1 -n 4096 -k 4096
Benchmarking solution: v2:16,256,4,16,64,4,1,1,1 => ((16*256) / (16*64)) * 32 = 128
2.6971ms
What we would like to see...
Problem: gfx1150 12 -t f16 -out_datatype f16 -transA false -transB true -g 1 -m 1 -n 4096 -k 4096
Benchmarking solution: v2:16,256,4,16,64,4,1,1,1 => ((16*256) / (16*64)) * 32 = 128
2.6971ms, 2.0 min, 23.9 max, 2.50 med
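As a rough sketch of how the requested line could be produced from per-run samples, assuming the leading value is the average and all times are in milliseconds: `format_timing_line` is a hypothetical helper, and the fixed four-decimal precision is an assumption (the numbers in the mock-up above are illustrative and use mixed precision).

```python
import statistics


def format_timing_line(times_ms):
    """Format average, min, max and median of per-run times (ms)."""
    avg = statistics.mean(times_ms)
    return (
        f"{avg:.4f}ms, "
        f"{min(times_ms):.4f} min, "
        f"{max(times_ms):.4f} max, "
        f"{statistics.median(times_ms):.4f} med"
    )


print(format_timing_line([2.0, 2.5, 3.0]))
# 2.5000ms, 2.0000 min, 3.0000 max, 2.5000 med
```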