Compiled model runs slower on V100 GPU #2008
Comments
Also, I tried running it twice but it was still slower. Are there any suggestions for debugging this (e.g., passing particular options to compile, or enabling trace or verbose output)?
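A sketch of the kind of debug switches available in early torch.compile releases (the flag and function names below are assumptions and may have moved between versions; setting TORCHDYNAMO_VERBOSE=1 in the environment is another common option):

```python
import torch
import torch._dynamo as dynamo
from torch._dynamo.utils import compile_times

# Assumed debug knob: log graph breaks and fallback-to-eager reasons.
dynamo.config.verbose = True

model = torch.nn.Linear(64, 64)      # stand-in model for illustration
opt_model = torch.compile(model)

x = torch.randn(8, 64)
opt_model(x)                         # first call triggers compilation

# Per-phase compilation time breakdown collected during the call above.
print(compile_times())
```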
@stephen-youn Thanks for trying out Script
Please let me know if you have any other questions, and feel free to close the issue if your question has been answered.
Yes, I also modified the code similarly and got a performance gain on the V100 too.
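The modification being referred to here is not quoted in the thread; a common version of it (an assumption on my part) is to keep the first, compilation-heavy call out of the timed region and to synchronize CUDA around the measurement:

```python
import time
import torch

model = torch.nn.Linear(1024, 1024).cuda()   # stand-in model for illustration
opt_model = torch.compile(model)
x = torch.randn(64, 1024, device="cuda")

# Warm-up: the first call pays the one-time compilation cost.
for _ in range(3):
    opt_model(x)
torch.cuda.synchronize()

# Timed region measures steady-state execution only.
start = time.perf_counter()
for _ in range(100):
    opt_model(x)
torch.cuda.synchronize()
print(f"{(time.perf_counter() - start) / 100 * 1e3:.3f} ms/iter")
```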
Reading between the lines, it seems you are interested in
I tried "opt_model = torch.compile(model, passes={'triton-mm': "triton", 'triton-bmm': True})"
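For what it's worth, the released torch.compile API documents mode, backend, fullgraph, dynamic, and options rather than a passes argument, so something along these lines may be closer to what the current signature expects (a sketch only; whether it helps on a V100 is not confirmed anywhere in this thread):

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()   # stand-in model for illustration

# "max-autotune" asks Inductor to spend extra time autotuning its Triton
# kernels (including matmuls); "reduce-overhead" targets small batch sizes
# via CUDA graphs.
opt_model = torch.compile(model, mode="max-autotune")
```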
🐛 Describe the bug
Hi,
I tried the BERT and ResNet examples from the tutorial https://pytorch.org/blog/Accelerating-Hugging-Face-and-TIMM-models/, but they ran slower with "torch.compile" on a V100 under the Ubuntu environment I have (i.e., Linux GCRHYP3C148 4.15.0-193-generic #204-Ubuntu SMP).
Isn't it supposed to be faster?
Thanks
Error logs
No response
Minified repro
"""
resnet
"""
This runs as follows, and the compiled model runs about 74x slower, as shown below.
It is similar for the following BERT example from the tutorial: it is 14.7x slower with the extra line "model = torch.compile(model)".
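The BERT snippet itself was also not captured; the sketch below is a hedged reconstruction assuming the bert-base-uncased setup from the linked tutorial (model name and example input are assumptions):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").cuda()
model = torch.compile(model)  # the extra line in question

text = "torch.compile should be faster after warm-up."
inputs = tokenizer(text, return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**inputs)
```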