flash_attention_2 2.7.2.post1 seems to crash when using torch.compile and DataCollatorWithFlattening
#35588
Comments
Maybe @muellerzr @SunMarc? Feel free to ping someone else if you think they're more appropriate.
For the issue with Qwen2, it is solved with #35187.
Hi @SunMarc, this seems like code that reproduces the issue. I think it is not directly related to
Thanks for the reproducer, I'll check soon! cc @MekkCyber if you have a bit of time to look at this!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@SunMarc still relevant.
Hey! Yeah, that's pretty important; let's try to fix it!
transformers/src/transformers/modeling_flash_attention_utils.py Lines 136 to 184 in 2ab7bdc
In
Using
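DataCollatorWithFlattening packs several examples into a single row and restarts position_ids at 0 for each example, so a variable-length attention kernel needs the sequence boundaries recovered from those position ids. The sketch below shows that idea in pure Python. It is an illustration under the assumption that every packed sequence starts at position 0 and counts up contiguously; the function name `cu_seqlens_from_position_ids` is hypothetical and this is not the transformers implementation.

```python
from typing import List, Tuple


def cu_seqlens_from_position_ids(position_ids: List[int]) -> Tuple[List[int], int]:
    """Recover varlen-attention boundaries from packed position ids (hypothetical helper).

    Assumes each packed sequence starts at position 0 and increases by 1,
    as produced by a flattening collator.
    Returns (cumulative sequence lengths, longest sequence length).
    """
    # Every position id equal to 0 marks the start of a new packed sequence.
    starts = [i for i, pos in enumerate(position_ids) if pos == 0]
    # Cumulative lengths are the start offsets plus the total token count.
    cu_seqlens = starts + [len(position_ids)]
    # With contiguous positions 0..n-1, the longest sequence is max position + 1.
    max_seqlen = max(position_ids) + 1
    return cu_seqlens, max_seqlen


cu, max_len = cu_seqlens_from_position_ids([0, 1, 2, 0, 1, 0, 1, 2, 3])
print(cu, max_len)  # [0, 3, 5, 9] 4
```

In tensor form, the number of boundaries depends on the values inside position_ids (a data-dependent output shape), which is a common source of graph breaks under torch.compile.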
System Info
transformers version: 4.47.1
Who can help?
@ArthurZucker
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
1. Update to the latest flash-attention version (2.7.2 at the time of writing). It should be torch.compile compatible, as described in https://github.com/Dao-AILab/flash-attention
2. Load a model with FA2 (tested with OPT and Qwen).
3. Use Trainer with DataCollatorWithFlattening and train.

This causes a crash with the following stack trace:
The code works fine when not using compile.
When using compile without DataCollatorWithFlattening, the code doesn't crash, but with Qwen2.5 I get the following graph break:
Expected behavior
The training shouldn't crash.
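For context on what DataCollatorWithFlattening feeds the model in the reproduction above, here is a simplified pure-Python sketch of the packing it performs: input_ids from several examples are concatenated into one row, position_ids restart at 0 for each example, and the first label of each example is set to -100 so the loss never predicts across a sequence boundary. The helper name `flatten_features` is hypothetical, and this is an illustration of the idea rather than the library's code.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss


def flatten_features(features: List[Dict[str, List[int]]]) -> Dict[str, List[int]]:
    """Pack several tokenized examples into a single row (hypothetical helper)."""
    packed: Dict[str, List[int]] = {"input_ids": [], "labels": [], "position_ids": []}
    for feature in features:
        ids = feature["input_ids"]
        # Fall back to input_ids when no explicit labels are provided.
        labels = feature.get("labels", ids)
        packed["input_ids"] += ids
        # Mask the first label so no loss is computed across the boundary.
        packed["labels"] += [IGNORE_INDEX] + labels[1:]
        # Restart positions at 0 so each example is treated as its own sequence.
        packed["position_ids"] += list(range(len(ids)))
    return packed


batch = flatten_features([{"input_ids": [5, 6, 7]}, {"input_ids": [8, 9]}])
print(batch["position_ids"])  # [0, 1, 2, 0, 1]
print(batch["labels"])        # [-100, 6, 7, -100, 9]
```

Because the whole batch becomes a single row with no padding, the attention kernel must rely on the position_ids to keep the packed examples from attending to each other.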