--cudagraph runs triton.testing.do_bench_cudagraph, so run.py with --cudagraph and a standalone script that calls triton.testing.do_bench_cudagraph directly should report the same latency. However, they do not.
To repro, please check out #345 which updates the input settings.
python3 run.py --op rope --cudagraph gives 0.0020 for liger and 0.0027 for inductor.
However, python repro.py gives 0.0017 for liger and 0.0014 for inductor. repro.py copies the code from tritonbench/operators/rope/operator.py and uses triton.testing.do_bench_cudagraph for benchmarking.
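To make the size of the gap concrete, here is a small sketch that only does arithmetic on the four numbers quoted above. The dictionary and function names are made up for illustration and are not part of tritonbench. The latencies are used exactly as printed, with no assumption about their unit.

```python
# Latencies quoted in this issue, as printed by each tool.
# `harness`, `repro`, and `slowdown` are illustrative names, not tritonbench APIs.
harness = {"liger": 0.0020, "inductor": 0.0027}  # python3 run.py --op rope --cudagraph
repro = {"liger": 0.0017, "inductor": 0.0014}    # python repro.py


def slowdown(harness_latency, repro_latency):
    """Return harness latency divided by repro latency for each backend."""
    return {name: harness_latency[name] / repro_latency[name] for name in harness_latency}


ratios = slowdown(harness, repro)
for name, ratio in ratios.items():
    print(f"{name}: harness/repro = {ratio:.2f}x")

# The faster backend also differs between the two measurements.
fastest_in_harness = min(harness, key=harness.get)
fastest_in_repro = min(repro, key=repro.get)
print(f"fastest in run.py: {fastest_in_harness}, fastest in repro.py: {fastest_in_repro}")
```

With these numbers, run.py reports about 1.18x the repro latency for liger and about 1.93x for inductor. The ranking also flips: liger is faster under run.py, while inductor is faster under repro.py.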