feat: make flash attention configurable #60

Merged: 94 commits merged on Jan 30, 2025

@theissenhelen (Collaborator) commented Jan 6, 2025

Current setup:

  • If flash-attn is available in the environment, the MultiHeadSelfAttention module automatically imports the corresponding attention function. At inference time, however, we do not have that information.

Now:

  • The user specifies in the model config whether flash-attn, flex attention or scaled dot product attention should be used.
  • Adds configurable parameters (soft cap, ALiBi) for flash attention.
    For ALiBi, adds a function to compute the slopes according to the number of attention heads.
  • Scaled dot product attention now supports a sliding window (making it numerically equivalent to flash/flex).
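
For reference, a minimal sketch of how ALiBi slopes are commonly derived from the number of attention heads. It follows the geometric scheme from the ALiBi paper and is not necessarily identical to the function added in this PR; the name `alibi_slopes` is illustrative only.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Return one ALiBi slope per attention head (hypothetical helper).

    Assumption: slopes form a geometric sequence starting at 2^(-8/n),
    as in the ALiBi paper. Head counts that are not a power of two are
    filled up with every other slope of the next larger power of two.
    """
    if num_heads < 1:
        raise ValueError("num_heads must be a positive integer")

    def power_of_two_slopes(n: int) -> list[float]:
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    # A power of two has exactly one bit set.
    if num_heads & (num_heads - 1) == 0:
        return power_of_two_slopes(num_heads)

    closest = 2 ** math.floor(math.log2(num_heads))
    extra = power_of_two_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_two_slopes(closest) + extra
```

With 8 heads this gives the slopes 1/2, 1/4, ..., 1/256. In self-attention, head h then adds a bias of -slope_h * |i - j| to the attention score between positions i and j.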

Todo:

  • test various attention options
  • adjust test case coverage
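
When testing the attention options against each other, the sliding window for scaled dot product attention can be expressed as a boolean mask. The sketch below is pure Python and only illustrates the idea; the helper name `sliding_window_mask` and the `(left, right)` convention (with a negative value meaning "unbounded", modelled on flash-attn's `window_size`) are assumptions, not the code in this PR.

```python
def sliding_window_mask(seq_len: int, window: tuple[int, int]) -> list[list[bool]]:
    """Build a seq_len x seq_len mask; True means "query i may attend to key j".

    Assumption: window = (left, right). Query i sees keys j with
    i - left <= j <= i + right. A negative bound disables that side's limit.
    """
    left, right = window
    return [
        [
            (left < 0 or i - j <= left) and (right < 0 or j - i <= right)
            for j in range(seq_len)
        ]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print a small mask to inspect the band structure.
    for row in sliding_window_mask(5, (1, 1)):
        print("".join("x" if allowed else "." for allowed in row))
```

In PyTorch, a boolean mask of this shape can be passed as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention`, where True marks positions that take part in attention.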

theissenhelen and others added 30 commits September 27, 2024 08:42
* fix: change pre-commit autoupdate schedule to monthly

* fix: change the merge strategy for Changelog to Union

* fix: add .envrc to .gitignore

* ci: ignore pre-commit-config and readthedocs for changelog updates

* ci: fix to correct hpc workflow call

* fix: update precommit config

* chore: update pre-commits

* feat: add codeowners file

* chore: update dependencies

* ci: add hpc-config

* docs: changelog

* fix: respond to review comments

---------

Co-authored-by: Jesper Dramsch <[email protected]>
* feat: add configurability to dropout in MultiHeadSelfAttention

Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]>

* test: adjust to dropout_p

* doc: update changelog

* Feature/integrate reusable workflows (#16)

* ci: add public pr label

* ci: add readthedocs update check

* ci: add downstream ci

* ci: add ci-config

* chore(deps): remove unused dependency

* docs: update changelog

* ci: switch to main

* chore: changelog 0.2.1

* Update error messages from invalid sub_graph in model instantiation (#20)

* ci: inherit pypi publish flow (#17)

* ci: inherit pypi publish flow

Co-authored-by: Helen Theissen <[email protected]>

* docs: add to changelog

* fix: typo in reusable workflow

* fix: another typo

* chore: bump actions/setup-python to v5

* ci: run downstream-ci for changes in src and tests

* docs: update changelog

---------

Co-authored-by: Helen Theissen <[email protected]>

* Update CHANGELOG.md to KeepChangelog format

* [pre-commit.ci] pre-commit autoupdate (#25)

updates:
- [github.com/psf/black-pre-commit-mirror: 24.4.2 → 24.8.0](psf/black-pre-commit-mirror@24.4.2...24.8.0)
- [github.com/astral-sh/ruff-pre-commit: v0.4.6 → v0.6.2](astral-sh/ruff-pre-commit@v0.4.6...v0.6.2)
- [github.com/tox-dev/pyproject-fmt: 2.1.3 → 2.2.1](tox-dev/pyproject-fmt@2.1.3...2.2.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Ci/changelog-release-updater (#26)

* ci: add changelog release updater

* docs: update changelog

---------

Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]>
Co-authored-by: Gert Mertes <[email protected]>
Co-authored-by: Mario Santa Cruz <[email protected]>
Co-authored-by: Jesper Dramsch <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
xfail for MultiHeadSelfAttention

@anaprietonem (Collaborator) left a comment

I have tested the PR with flash_attention and scaled_dot_product_attention and it's working fine! Previous comments were also addressed.

@HCookie HCookie changed the title feat: 44 make flash attention configurable feat: make flash attention configurable Jan 30, 2025
@anaprietonem anaprietonem merged commit 41fcab6 into main Jan 30, 2025
28 checks passed
@HCookie HCookie deleted the feature/44-make-flash-attention-configurable branch January 30, 2025 14:03
Magnus-SI pushed a commit that referenced this pull request Jun 3, 2025
Magnus-SI pushed a commit that referenced this pull request Jun 3, 2025
* feat: refactor GraphCreator
Magnus-SI pushed a commit that referenced this pull request Jun 3, 2025
* Refactor Callbacks
- Split into separate files
- Use list in config to add callbacks
- Split out plotting callbacks config

* Refactor rollout (#87)
- New rollout central function

---------

Co-authored-by: Mario Santa Cruz <[email protected]>
Co-authored-by: Sara Hahner <[email protected]>
Magnus-SI pushed a commit that referenced this pull request Jun 3, 2025
* feat: FlashMultiHeadSelfAttention

* Chore/multiple fixes ci precommit (#41)

* fix: change pre-commit autoupdate schedule to monthly

* fix: change the merge strategy for Changelog to Union

* fix: add .envrc to .gitignore

* ci: ignore pre-commit-config and readthedocs for changelog updates

* ci: fix to correct hpc workflow call

* fix: update precommit config

* chore: update pre-commits

* feat: add codeowners file

* chore: update dependencies

* ci: add hpc-config

* docs: changelog

* fix: respond to review comments

---------

Co-authored-by: Jesper Dramsch <[email protected]>

* 11 add configurability to dropout in multiheadselfattention module (#12)

* feat: add configurability to dropout in MultiHeadSelfAttention

Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]>

* test: adjust to dropout_p

* doc: update changelog

* Feature/integrate reusable workflows (#16)

* ci: add public pr label

* ci: add readthedocs update check

* ci: add downstream ci

* ci: add ci-config

* chore(deps): remove unused dependency

* docs: update changelog

* ci: switch to main

* chore: changelog 0.2.1

* Update error messages from invalid sub_graph in model instantiation (#20)

* ci: inherit pypi publish flow (#17)

* ci: inherit pypi publish flow

Co-authored-by: Helen Theissen <[email protected]>

* docs: add to changelog

* fix: typo in reusable workflow

* fix: another typo

* chore: bump actions/setup-python to v5

* ci: run downstream-ci for changes in src and tests

* docs: update changelog

---------

Co-authored-by: Helen Theissen <[email protected]>

* Update CHANGELOG.md to KeepChangelog format

* [pre-commit.ci] pre-commit autoupdate (#25)

updates:
- [github.com/psf/black-pre-commit-mirror: 24.4.2 → 24.8.0](psf/black-pre-commit-mirror@24.4.2...24.8.0)
- [github.com/astral-sh/ruff-pre-commit: v0.4.6 → v0.6.2](astral-sh/ruff-pre-commit@v0.4.6...v0.6.2)
- [github.com/tox-dev/pyproject-fmt: 2.1.3 → 2.2.1](tox-dev/pyproject-fmt@2.1.3...2.2.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Ci/changelog-release-updater (#26)

* ci: add changelog release updater

* docs: update changelog

---------

Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]>
Co-authored-by: Gert Mertes <[email protected]>
Co-authored-by: Mario Santa Cruz <[email protected]>
Co-authored-by: Jesper Dramsch <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* chore!: drop support for scaled_dot_product_attention

* feat: add softcap

* test: add softcap

xfail for MultiHeadSelfAttention

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feat: flash attention lazy import

* feat: make alibi slopes configurable

* chore(deps): add flash-attn

* feat: use scaled_dot_product as default

* feat: make alibi_slope configurable in block, chunk processor

* chore(deps): remove flash-attn

* feat: get alibi_slopes

* docs: update docstrings

* fix: bias shape

* fix: softcap optional

* fix: import annotations from future

* fix: annotation error

* docs: update changelog

* fix: type annotation

* feat: catch low flash-attn version

* feat: attention wrapper

* fix: remove duplicate version check

* added flex attn wrapper

* fix: alibi_slopes unassigned

* adding causal wip

* added flex attn module

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Bump min torch version to be able to use Flex Attn

* added input parameter checks

* precommit fix

* fix: typo

* test: adjust tests

* fix: no self.use_alibi_slopes

* fix: use_alibi_slope default to false

* feat: Add sliding window support for TorchAttention via mask

* fix: set default flash_attention

* fix: pytest

* fix: tests

* docs: improve docstrings in MultiHeadSelfAttention

* fix: error instead of SystemExit

* chore: refactor SDPAAttention update_mask method

* feat: add missing pytest.ini

* chore: remove explicit float typing

* support running without window size

* test: separate test for sdpa and flex attention

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added asserts and tests for flex attn

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: embed_dim / num_heads >=16

* test: fix tests to account for embed_dim constraints

* fix tests

* chore: remove debugging code

* consistency change

* chore(configs): add attention_implementation

* Update models/src/anemoi/models/layers/attention.py

Co-authored-by: Harrison Cook <[email protected]>

* Update models/src/anemoi/models/layers/attention.py

Co-authored-by: Harrison Cook <[email protected]>

* fix: address comments

* chore: remove flex_attention

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test: fix merge

* fix test to address breaking change from torch 2.6

* remove flex_attention references

---------

Co-authored-by: Jesper Dramsch <[email protected]>
Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]>
Co-authored-by: Gert Mertes <[email protected]>
Co-authored-by: Mario Santa Cruz <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Cathal OBrien <[email protected]>
Co-authored-by: japols <[email protected]>
Co-authored-by: Harrison Cook <[email protected]>
Co-authored-by: anaprietonem <[email protected]>
matschreiner pushed a commit to matschreiner/anemoi-core that referenced this pull request Jun 4, 2025
matschreiner pushed a commit to matschreiner/anemoi-core that referenced this pull request Jun 4, 2025
Projects
Status: Done
7 participants