Add torchcompile_xentropy executor #1655
base: main
Conversation
This executor is meant for fusions with a
@IvanYashchuk Adding cross entropy does indeed not help with RoPE, but it allows Thunder to use a very efficient fused cross-entropy Triton kernel whose performance currently cannot be matched by the other executors (not even APEX). From lines 211 and 212 it seems that, while the set of ops was originally chosen for RoPE, the intent was that other ops could be added in the future, making this a fusion executor that comes to the rescue of nvFuser when it can improve performance. Do you think it would be better to create a separate executor entry for inductor cross entropy only?
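For context on why the fused kernel wins: a fused cross-entropy kernel computes the log-sum-exp reduction and the target gather in one pass, instead of materializing the full softmax tensor first. A pure-Python sketch of the math it implements (illustrative only; the real Triton kernel works on tiles of the logits, and this helper name is hypothetical):

```python
import math

def fused_cross_entropy(logits, target):
    """Numerically stable cross entropy for one row of logits.

    Computes -log_softmax(logits)[target] in a single pass over the
    row, the way a fused kernel would, rather than first building the
    full softmax tensor and then indexing it.
    """
    m = max(logits)  # running max, for numerical stability
    lse = m + math.log(sum(math.exp(x - m) for x in logits))  # log-sum-exp
    return lse - logits[target]  # equals -log p(target)

# Uniform logits over 4 classes: the loss should be log(4)
loss = fused_cross_entropy([0.0, 0.0, 0.0, 0.0], target=2)
```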
… torchcompilecat-add-xentropy
This reverts commit 6c5b0e1.
for more information, see https://pre-commit.ci
Reducing the model sizes to speed up CI and make the test work in environments with constrained memory. I think this change should be fine, since this test verifies functionality more than memory footprint; let me know what you think. With this change the peak memory of the test is ~2.6 GB instead of ~14 GB.
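As a rough sanity check on why shrinking the model helps so much: the logits buffer alone scales as batch × sequence × vocab, so cutting those dimensions compounds quickly. A back-of-the-envelope sketch (the concrete sizes below are made up for illustration, not the test's actual configs):

```python
def logits_bytes(batch, seq, vocab, bytes_per_el=4):
    """Size in bytes of the [batch, seq, vocab] logits tensor (fp32)."""
    return batch * seq * vocab * bytes_per_el

# Hypothetical sizes: halving both seq and vocab cuts this one buffer 4x
big = logits_bytes(batch=4, seq=4096, vocab=32000)
small = logits_bytes(batch=4, seq=2048, vocab=16000)
ratio = big / small
```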
I have a hard time seeing another torch.compile executor as the solution here.
I'd piggyback on this explanation of the differences in lightning-thunder/thunder/executors/torch_compile.py, lines 209 to 215 at 880419d:
An issue I see is that we don't strictly enforce this rule, so in cases where the user specifies
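The enforcement gap described above can be illustrated with a toy claim-resolution loop: the executor list is walked in order and each op goes to the first executor that supports it, so nothing stops two torch.compile-based executors with overlapping op sets from shadowing each other. All names below are hypothetical, not Thunder's actual API:

```python
# Toy model of ordered executor dispatch; executor names and op sets
# are invented for illustration, not taken from Thunder.
EXECUTOR_OPS = {
    "torchcompile_cat": {"cat", "reshape", "transpose"},
    "torchcompile_xentropy": {"cross_entropy", "log_softmax", "reshape"},
}

def claim(op, executor_order):
    """Return the first executor in order that supports `op`, else None."""
    for name in executor_order:
        if op in EXECUTOR_OPS.get(name, set()):
            return name
    return None

# With both executors enabled, the overlapping op goes to whichever is
# listed first: the "rule" is only as strong as the user's list order.
winner = claim("reshape", ["torchcompile_cat", "torchcompile_xentropy"])
```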
What does this PR do?
In an attempt to fix #1654 and partially #1552, this PR adds the necessary ops to the torchcompile_cat list to capture HF CausalLMLoss. Before merging, this PR needs testing with other models; I will post a message with the benchmark results.
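A hedged sketch of the kind of change described: extending the fusion executor's supported-op set so that the graph for HF's CausalLMLoss (log-softmax plus NLL, with the surrounding reshapes) can be claimed as one fusion region. The op names below are illustrative approximations, not the exact symbols in torch_compile.py:

```python
# Illustrative only: op names approximate, not exact, Thunder symbols.
torchcompile_cat_ops = {
    "cat", "reshape", "transpose", "slice",  # original RoPE-oriented ops
}

# Ops added so the executor can capture HF CausalLMLoss as one fusion.
xentropy_ops = {"log_softmax", "nll_loss", "take_along_axis"}
torchcompile_cat_ops |= xentropy_ops

def region_is_claimable(region_ops):
    """A fusion region is claimable only if every op in it is supported."""
    return set(region_ops) <= torchcompile_cat_ops

claimable = region_is_claimable(["reshape", "log_softmax", "nll_loss"])
```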