Add support for Flex Attention #1675

ShashankMosaicML · 2024-11-27T00:39:51Z

There are 4 TODOs regarding compiled flex attention that needed to be investigated before checking in. See the tests for more details. TL;DR:

- Sequence lengths that are not multiples of 128 still do not seem to be supported properly (see https://pytorch.org/blog/flexattention/#limitations-and-future-work).
- Left padding causes errors, which can break generation and inference. This may be related to pytorch/pytorch#139064 ("[Flex Attention] Cannot determine truth value of Relational"), since I was seeing the same error.
- Changing the number of heads between tests for ALiBi causes errors. This is potentially a minor issue, since the number of heads does not change during actual training or inference.

Summary: Potentially safe to use for training, not for inference.
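As an illustration of the sequence-length limitation above, one possible workaround is to right-pad inputs up to the next multiple of 128 before calling compiled flex attention. The sketch below is not part of this PR; the helper names `round_up_to_multiple` and `pad_to_multiple` are hypothetical, and it pads on the right only because left padding is reported above as problematic.

```python
from typing import List


def round_up_to_multiple(seq_len: int, multiple: int = 128) -> int:
    """Return the smallest multiple of `multiple` that is >= seq_len."""
    if seq_len < 0 or multiple <= 0:
        raise ValueError("seq_len must be >= 0 and multiple must be > 0")
    return ((seq_len + multiple - 1) // multiple) * multiple


def pad_to_multiple(token_ids: List[int], pad_id: int, multiple: int = 128) -> List[int]:
    """Right-pad a list of token ids so its length is a multiple of `multiple`.

    Hypothetical helper: right padding is an assumption made here because the
    PR description reports issues with left padding.
    """
    target_len = round_up_to_multiple(len(token_ids), multiple)
    return token_ids + [pad_id] * (target_len - len(token_ids))


if __name__ == "__main__":
    padded = pad_to_multiple([5, 6, 7], pad_id=0)
    print(len(padded))
```

Whether padding like this is enough to avoid the errors seen in the tests has not been verified here; the padded positions would still need to be masked out.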

Needs this fix for compiling sequence-id-dependent block masking: pytorch/pytorch#136427, which is in torch nightly. Use this command to install torch nightly: `pip3 install torch==2.6.0.dev20241126+cu124 torchvision==0.20.0.dev20241126+cu124 torchaudio==2.5.0.dev20241126+cu124 --index-url https://download.pytorch.org/whl/nightly/cu124`
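For readers unfamiliar with sequence-id-dependent masking, the following is a minimal pure-Python sketch of the semantics, not the code in this PR. FlexAttention describes a mask as a function of `(b, h, q_idx, kv_idx)`; the sketch mirrors that signature but uses plain lists and integers instead of tensors, and the names `make_sequence_id_mask_mod` and `dense_mask` are hypothetical.

```python
from typing import Callable, List

MaskMod = Callable[[int, int, int, int], bool]


def make_sequence_id_mask_mod(sequence_id: List[List[int]]) -> MaskMod:
    """Build a mask function that lets a query attend only to keys that
    belong to the same packed sequence.

    sequence_id[b][i] is the id of the sequence that token i of batch row b
    belongs to (assumed layout, for illustration only).
    """

    def mask_mod(b: int, h: int, q_idx: int, kv_idx: int) -> bool:
        # The head index h is unused: the mask is the same for every head.
        return sequence_id[b][q_idx] == sequence_id[b][kv_idx]

    return mask_mod


def dense_mask(mask_mod: MaskMod, batch: int, seq_len: int) -> List[List[List[bool]]]:
    """Materialize the mask as nested lists [batch][q_idx][kv_idx] for head 0."""
    return [
        [[mask_mod(b, 0, q, k) for k in range(seq_len)] for q in range(seq_len)]
        for b in range(batch)
    ]


if __name__ == "__main__":
    # Two packed sequences of length 2 in a single batch row.
    mask = dense_mask(make_sequence_id_mask_mod([[0, 0, 1, 1]]), batch=1, seq_len=4)
    for row in mask[0]:
        print(["x" if allowed else "." for allowed in row])
```

In the tensor version the mask function closes over a `sequence_id` tensor in the same way, which is presumably why compiling it depends on the upstream fix referenced above.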

ShashankMosaicML and others added 30 commits November 18, 2024 13:47

adding flex attention

f416539

registrifying score mods

ac3a884

registrifying attention mask mods

31b27e2

Merge branch 'mosaicml:main' into shashank/flexattention

c8fffa5

bug_fix

86dce3b

bug_fix

cb8f4a6

lint

902850a

configuring test

9c9708d

configuring tests

f1ff430

bug fix

e537f5a

fixing alibi

c527dd7

Merge branch 'mosaicml:main' into shashank/flexattention

15e303e

configuring further tests

c4ef5d9

refactoring

6b37427

adding warnings and errors

e30fe7a

gating tests on torch version

924a53c

Merge branch 'mosaicml:main' into shashank/flexattention

57048e3

reorganizing function defs

67a2aea

refactoring

04f3a62

passing in dicts of mask and score mods

ab6c58c

making mask and score mods configurable via yaml

3b3827d

Merge branch 'mosaicml:main' into shashank/flexattention

be43e8d

adding torch.compile

2264f91

..

e274d9f

..

a26bb4f

undoing comment out

d5ab7d3

Merge branch 'mosaicml:main' into shashank/flexattention

d40e978

adding torch comile

5f13e7b

temporary commit commenting out block mask and score mod

ca8e173

undoing prev temp commit

f5486ff

ShashankMosaicML and others added 30 commits December 4, 2024 15:48

..

8dfdedb

fixing score mod bug

8912cb2

..

18c4bb9

..

bf1cb6c

..

5093efd

..

96b8f82

..

f1ad991

..

18afcc5

configuring with torch 2.5.1 and 2.6.0.dev

434aa83

configuring more tests with torch 2.5.1 and 2.6.0.dev

216fcb9

..

438e0f3

..

2bb25ee

..

9831b5e

..

ad601e4

..

77115c5

figuring out d_model and seq lengths for which flex attention works

dfde51b

adding todos

d1d04ce

Merge branch 'main' into shashank/flexattention

5eca05f

adding test for local global attention

718d89d

Merge branch 'main' into shashank/flexattention

135abd7

Merge branch 'main' into shashank/flexattention

369e818

Merge branch 'main' into shashank/flexattention

8a62ca4

Merge branch 'main' into shashank/flexattention

5d67b9c

Merge branch 'main' into shashank/flexattention

05cf043

..

e221c32

..

4bc6f7c

..

45fc516

..

70f928a

Merge branch 'main' into shashank/flexattention

397ca38

Merge branch 'main' into shashank/flexattention

8f276d0
