Conversation

Stillerman
Contributor

What does this PR do?

For newer flash-attention versions (2.7.0 and onward), the `bert_padding.unpad_input` function returns an additional value, so the example Llama training from the README throws `ValueError: too many values to unpack`.

Fixes #251
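To illustrate the failure mode (a hedged sketch, not the nanotron source: `fake_unpad_input` is a stand-in for `bert_padding.unpad_input`, and the exact return values are illustrative):

```python
# Newer flash-attn returns one extra value from unpad_input, so a
# fixed 4-way unpack at the call site raises ValueError.
def fake_unpad_input(x):
    # Stand-in for flash-attn >= 2.7.0: returns 5 values instead of 4.
    return x, [0], [0, 1], 1, None

try:
    hidden, indices, cu_seqlens, max_seqlen = fake_unpad_input("h")
except ValueError as e:
    print(e)  # too many values to unpack (expected 4)
```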

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guidelines?
  • Did you write any new necessary tests?
  • Did you log the throughput and loss you get to ensure the PR works as expected in actual training?
  • Did you log the memory usage? You can use this tool to understand the memory usage breakdown in nanotron.
  • If you modified anything related to checkpoints, did you verify that saving and reloading checkpoints still works correctly?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Stillerman Stillerman changed the title Fix unpacking issue caused by newer Flash Attention [WIP] Fix unpacking issue caused by newer Flash Attention Mar 5, 2025
@Stillerman Stillerman changed the title [WIP] Fix unpacking issue caused by newer Flash Attention Fix unpacking issue caused by newer Flash Attention Mar 5, 2025
Member

@NouamaneTazi NouamaneTazi left a comment

Try to make it backward compatible!

@Stillerman
Contributor Author

We can unpack an unknown number of values with `*_`! Tested with

python -m torch.distributed.run --nproc_per_node=1 run_generate.py --ckpt-path checkpoints/10/ --tp 1 --pp 1

Tested on flash_attn-2.6.3 and flash_attn-2.7.4.post1; both work. 2.7.4.post1 does not work on main right now.
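The pattern can be sketched like this (a minimal illustration with dummy tuples, not the nanotron source; the variable names are assumptions):

```python
# Starred unpacking absorbs any trailing values, so the same call site
# works whether the function returns 4 values (flash-attn < 2.7.0)
# or 5 (flash-attn >= 2.7.0).
old_return = ("hidden", "indices", "cu_seqlens", 7)           # 4 values
new_return = ("hidden", "indices", "cu_seqlens", 7, "extra")  # 5 values

for ret in (old_return, new_return):
    hidden, indices, cu_seqlens, max_seqlen, *_ = ret
    print(max_seqlen)  # 7 in both cases
```

Any surplus values land in the throwaway list `_`, which is empty for the old 4-value return, so no version check is needed.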

Development

Successfully merging this pull request may close these issues.

Cannot run the Model generated from the example script