feat: make flash attention configurable #60

theissenhelen · 2025-01-06T16:27:11Z

Current setup:

If flash-attn is available in the environment, the MultiHeadSelfAttention module automatically imports the corresponding attention function. At inference time, however, that information is not available.
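A rough picture of the implicit selection described above, as a minimal sketch only: the helper name `detect_attention_backend` and the returned strings are illustrative assumptions, not the module's actual code. The point it shows is that the outcome depends solely on what happens to be installed, so a training environment and an inference environment can silently disagree.

```python
import importlib.util


def detect_attention_backend() -> str:
    """Pick a backend purely from what is installed (hypothetical helper)."""
    # find_spec returns None when the top-level package cannot be found.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention"
    return "scaled_dot_product_attention"


# The answer can differ between two machines running the same checkpoint.
print(detect_attention_backend())
```

Recording the choice explicitly in the model config removes this dependence on the environment.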

Now:

The user specifies in the model config whether flash-attn, flex attention or scaled dot product attention should be used.
Adds configurable parameters (soft cap, ALiBi) for flash attention.
For ALiBi: adds a function that computes the slopes from the number of attention heads.
Scaled dot product attention now supports a sliding window (making it numerically equivalent to flash/flex).
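To make two of the items above concrete, here is a hedged sketch in plain Python. It is not the code added in this PR: `alibi_slopes` follows the slope recipe from the ALiBi paper (a geometric sequence, with a fill-in rule when the head count is not a power of two), and `sliding_window_mask` builds the kind of boolean mask that could emulate a sliding window for scaled dot product attention. Both function names and the `(left, right)` window convention are assumptions made for illustration.

```python
from __future__ import annotations


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head ALiBi slopes, following the recipe from the ALiBi paper."""
    if num_heads < 1:
        raise ValueError("num_heads must be positive")

    def power_of_two_slopes(n: int) -> list[float]:
        # Geometric sequence: 2^(-8/n), 2^(-16/n), ..., 2^(-8).
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    if num_heads & (num_heads - 1) == 0:  # already a power of two
        return power_of_two_slopes(num_heads)

    # Otherwise: slopes for the largest power of two below num_heads,
    # topped up with every other slope from the next power of two.
    closest = 1 << (num_heads.bit_length() - 1)
    extra = power_of_two_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_two_slopes(closest) + extra


def sliding_window_mask(seq_len: int, window: tuple[int, int]) -> list[list[bool]]:
    """True where query q may attend to key k, i.e. q - left <= k <= q + right."""
    left, right = window  # assumed convention; negative sizes are not handled
    return [
        [-left <= k - q <= right for k in range(seq_len)]
        for q in range(seq_len)
    ]


print(alibi_slopes(8))
print(sliding_window_mask(4, (1, 1)))
```

A mask of this shape, converted to a boolean tensor, could be passed as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention`, where `True` marks positions that take part in attention.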

Todo:

test various attention options
adjust test case coverage

* fix: change pre-cmmit autoupdate schedule to monthly * fix: change the merge strategy for Changelog to Union * fix: add .envrc to .gitignore * ci: ignore pre-commit-config and readthedocs for changelog updates * ci: fix to correct hpc workflow call * fix: update precommit config * chore: update pre-commits * feat: add codeowners file * chore: update dependencies * ci: add hpc-config * docs: changelog * fix: respond to review comments --------- Co-authored-by: Jesper Dramsch <[email protected]>

* feat: add configurability to dropout in MultiHeadSelfAttention Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]> * test: adjust to dropout_p * doc: update changelog * Feature/integrate reusable workflows (#16) * ci: add public pr label * ci: add readthedocs update check * ci: add downstream ci * ci: add ci-config * chore(deps): remove unused dependency * docs: update changelog * ci: switch to main * chore: changelog 0.2.1 * Update error messages from invalid sub_graph in model instantiation (#20) * ci: inherit pypi publish flow (#17) * ci: inherit pypi publish flow Co-authored-by: Helen Theissen <[email protected]> * docs: add to changelog * fix: typo in reusable workflow * fix: another typo * chore: bump actions/setup-python to v5 * ci: run downstream-ci for changes in src and tests * docs: update changelog --------- Co-authored-by: Helen Theissen <[email protected]> * Update CHANGELOG.md to KeepChangelog format * [pre-commit.ci] pre-commit autoupdate (#25) updates: - [github.com/psf/black-pre-commit-mirror: 24.4.2 → 24.8.0](psf/black-pre-commit-mirror@24.4.2...24.8.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.6 → v0.6.2](astral-sh/ruff-pre-commit@v0.4.6...v0.6.2) - [github.com/tox-dev/pyproject-fmt: 2.1.3 → 2.2.1](tox-dev/pyproject-fmt@2.1.3...2.2.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Ci/changelog-release-updater (#26) * ci: add changelof release updater * docs: update changelog * Feature/integrate reusable workflows (#16) * ci: add public pr label * ci: add readthedocs update check * ci: add downstream ci * ci: add ci-config * chore(deps): remove unused dependency * docs: update changelog * ci: switch to main * chore: changelog 0.2.1 * Update error messages from invalid sub_graph in model instantiation (#20) * ci: inherit pypi publish flow (#17) * ci: inherit pypi publish flow Co-authored-by: Helen Theissen <[email protected]> * docs: add to changelog * fix: typo in reusable workflow * 
fix: another typo * chore: bump actions/setup-python to v5 * ci: run downstream-ci for changes in src and tests * docs: update changelog --------- Co-authored-by: Helen Theissen <[email protected]> * Update CHANGELOG.md to KeepChangelog format * Ci/changelog-release-updater (#26) * ci: add changelof release updater * docs: update changelog --------- Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]> Co-authored-by: Gert Mertes <[email protected]> Co-authored-by: Mario Santa Cruz <[email protected]> Co-authored-by: Jesper Dramsch <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

xfail for MultiHeadSelfAttention

for more information, see https://pre-commit.ci

models/pytest.ini

models/src/anemoi/models/layers/attention.py

Co-authored-by: Harrison Cook <[email protected]>

anaprietonem

I have tested the PR with flash_attention and scaled_dot_product_attention and it's working fine! Previous comments were also addressed.

Add workflow to sync repos

* feat: refactor GraphCreator

* Refactor Callbacks - Split into seperate files - Use list in config to add callbacks - Split out plotting callbacks config * Refactor rollout (#87) - New rollout central function --------- Co-authored-by: Mario Santa Cruz <[email protected]> Co-authored-by: Sara Hahner <[email protected]>

* feat: FlashMultiHeadSelfAttention * Chore/multiple fixes ci precommit (#41) * fix: change pre-cmmit autoupdate schedule to monthly * fix: change the merge strategy for Changelog to Union * fix: add .envrc to .gitignore * ci: ignore pre-commit-config and readthedocs for changelog updates * ci: fix to correct hpc workflow call * fix: update precommit config * chore: update pre-commits * feat: add codeowners file * chore: update dependencies * ci: add hpc-config * docs: changelog * fix: respond to review comments --------- Co-authored-by: Jesper Dramsch <[email protected]> * 11 add configurability to dropout in multiheadselfattention module (#12) * feat: add configurability to dropout in MultiHeadSelfAttention Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]> * test: adjust to dropout_p * doc: update changelog * Feature/integrate reusable workflows (#16) * ci: add public pr label * ci: add readthedocs update check * ci: add downstream ci * ci: add ci-config * chore(deps): remove unused dependency * docs: update changelog * ci: switch to main * chore: changelog 0.2.1 * Update error messages from invalid sub_graph in model instantiation (#20) * ci: inherit pypi publish flow (#17) * ci: inherit pypi publish flow Co-authored-by: Helen Theissen <[email protected]> * docs: add to changelog * fix: typo in reusable workflow * fix: another typo * chore: bump actions/setup-python to v5 * ci: run downstream-ci for changes in src and tests * docs: update changelog --------- Co-authored-by: Helen Theissen <[email protected]> * Update CHANGELOG.md to KeepChangelog format * [pre-commit.ci] pre-commit autoupdate (#25) updates: - [github.com/psf/black-pre-commit-mirror: 24.4.2 → 24.8.0](psf/black-pre-commit-mirror@24.4.2...24.8.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.6 → v0.6.2](astral-sh/ruff-pre-commit@v0.4.6...v0.6.2) - [github.com/tox-dev/pyproject-fmt: 2.1.3 → 2.2.1](tox-dev/pyproject-fmt@2.1.3...2.2.1) Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Ci/changelog-release-updater (#26) * ci: add changelof release updater * docs: update changelog * Feature/integrate reusable workflows (#16) * ci: add public pr label * ci: add readthedocs update check * ci: add downstream ci * ci: add ci-config * chore(deps): remove unused dependency * docs: update changelog * ci: switch to main * chore: changelog 0.2.1 * Update error messages from invalid sub_graph in model instantiation (#20) * ci: inherit pypi publish flow (#17) * ci: inherit pypi publish flow Co-authored-by: Helen Theissen <[email protected]> * docs: add to changelog * fix: typo in reusable workflow * fix: another typo * chore: bump actions/setup-python to v5 * ci: run downstream-ci for changes in src and tests * docs: update changelog --------- Co-authored-by: Helen Theissen <[email protected]> * Update CHANGELOG.md to KeepChangelog format * Ci/changelog-release-updater (#26) * ci: add changelof release updater * docs: update changelog --------- Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]> Co-authored-by: Gert Mertes <[email protected]> Co-authored-by: Mario Santa Cruz <[email protected]> Co-authored-by: Jesper Dramsch <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * chore!: drop support for scaled_dot_product_attention * feat: add softcap * test: add softcap xfail for MultiHeadSelfAttention * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: flash attention lazy import * feat: make alibi slopes configurable * chore(deps): add flash-attn * feat: use scaled_dot_product as default * feat: make alibi_slope cinfigurable in block, chunk processor * chore(deps): remove flash-attn * feat: get alibi_slopes * docs: update docstrings * fix: bias shape * fix: softcap optional * fix: import annotations from future * fix: annotation error * docs: update changelog * fix: type 
annotation * feat: catch low flash-attn version * feat: FlashMultiHeadSelfAttention * Chore/multiple fixes ci precommit (#41) * fix: change pre-cmmit autoupdate schedule to monthly * fix: change the merge strategy for Changelog to Union * fix: add .envrc to .gitignore * ci: ignore pre-commit-config and readthedocs for changelog updates * ci: fix to correct hpc workflow call * fix: update precommit config * chore: update pre-commits * feat: add codeowners file * chore: update dependencies * ci: add hpc-config * docs: changelog * fix: respond to review comments --------- Co-authored-by: Jesper Dramsch <[email protected]> * 11 add configurability to dropout in multiheadselfattention module (#12) * feat: add configurability to dropout in MultiHeadSelfAttention Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]> * test: adjust to dropout_p * doc: update changelog * Feature/integrate reusable workflows (#16) * ci: add public pr label * ci: add readthedocs update check * ci: add downstream ci * ci: add ci-config * chore(deps): remove unused dependency * docs: update changelog * ci: switch to main * chore: changelog 0.2.1 * Update error messages from invalid sub_graph in model instantiation (#20) * ci: inherit pypi publish flow (#17) * ci: inherit pypi publish flow Co-authored-by: Helen Theissen <[email protected]> * docs: add to changelog * fix: typo in reusable workflow * fix: another typo * chore: bump actions/setup-python to v5 * ci: run downstream-ci for changes in src and tests * docs: update changelog --------- Co-authored-by: Helen Theissen <[email protected]> * Update CHANGELOG.md to KeepChangelog format * [pre-commit.ci] pre-commit autoupdate (#25) updates: - [github.com/psf/black-pre-commit-mirror: 24.4.2 → 24.8.0](psf/black-pre-commit-mirror@24.4.2...24.8.0) - [github.com/astral-sh/ruff-pre-commit: v0.4.6 → v0.6.2](astral-sh/ruff-pre-commit@v0.4.6...v0.6.2) - [github.com/tox-dev/pyproject-fmt: 2.1.3 → 2.2.1](tox-dev/pyproject-fmt@2.1.3...2.2.1) 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Ci/changelog-release-updater (#26) * ci: add changelof release updater * docs: update changelog * Feature/integrate reusable workflows (#16) * ci: add public pr label * ci: add readthedocs update check * ci: add downstream ci * ci: add ci-config * chore(deps): remove unused dependency * docs: update changelog * ci: switch to main * chore: changelog 0.2.1 * Update error messages from invalid sub_graph in model instantiation (#20) * ci: inherit pypi publish flow (#17) * ci: inherit pypi publish flow Co-authored-by: Helen Theissen <[email protected]> * docs: add to changelog * fix: typo in reusable workflow * fix: another typo * chore: bump actions/setup-python to v5 * ci: run downstream-ci for changes in src and tests * docs: update changelog --------- Co-authored-by: Helen Theissen <[email protected]> * Update CHANGELOG.md to KeepChangelog format * Ci/changelog-release-updater (#26) * ci: add changelof release updater * docs: update changelog --------- Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]> Co-authored-by: Gert Mertes <[email protected]> Co-authored-by: Mario Santa Cruz <[email protected]> Co-authored-by: Jesper Dramsch <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * chore!: drop support for scaled_dot_product_attention * feat: add softcap * test: add softcap xfail for MultiHeadSelfAttention * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: flash attention lazy import * feat: make alibi slopes configurable * chore(deps): add flash-attn * feat: use scaled_dot_product as default * feat: make alibi_slope cinfigurable in block, chunk processor * chore(deps): remove flash-attn * feat: get alibi_slopes * docs: update docstrings * fix: bias shape * fix: softcap optional * fix: import annotations from future * fix: annotation error * docs: 
update changelog * fix: type annotation * feat: catch low flash-attn version * feat: attention wrapper * fix: remove duplicate version check * added flex attn wrapper * fix: alibi_slopes unassigned * adding causal wip * added flex attn module * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Bump min torch version to be able to use Flex Attn * added input parameter checks * precommit fix * fix: typo * test: adjust tests * fix: no self.use_alibi_slopes * fix: use_alibi_slope default to false * feat: Add sliding window support for TorchAttention via mask * fix: set default flash_attention * fix: pytest * fix: tests * docs: improve docstrings in MultiHeadSelfAttention * fix: error instead of SystemExit * chore: refactor SDPAAttention update_mask method * feat: add missing pytest.ini * chore: remove explicit float typing * support running without window size * test: separate test for sdpa and flex attention * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added asserts and tests for flex attn * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: embed_dim / num_heads >=16 * test: fix tests to account for embed_dim constraints * fix tests * chore: remove debugging code * consitency change * chore(configs): add attention_implementation * Update models/src/anemoi/models/layers/attention.py Co-authored-by: Harrison Cook <[email protected]> * Update models/src/anemoi/models/layers/attention.py Co-authored-by: Harrison Cook <[email protected]> * fix: address comments * chore: remove flex_attention * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test: fix merge * fix test to address breaking change from torch 2.6 * remove flex_attention references --------- Co-authored-by: Jesper Dramsch <[email protected]> Co-authored-by: Rilwan (Akanni) Adewoyin <[email 
protected]> Co-authored-by: Gert Mertes <[email protected]> Co-authored-by: Mario Santa Cruz <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cathal OBrien <[email protected]> Co-authored-by: japols <[email protected]> Co-authored-by: Harrison Cook <[email protected]> Co-authored-by: anaprietonem <[email protected]>

theissenhelen and others added 30 commits September 27, 2024 08:42

feat: FlashMultiHeadSelfAttention

539e8a2

chore!: drop support for scaled_dot_product_attention

a86c9a8

feat: add softcap

105443f

test: add softcap

e82a59e

xfail for MultiHeadSelfAttention

[pre-commit.ci] auto fixes from pre-commit.com hooks

e648eb0

for more information, see https://pre-commit.ci

feat: flash attention lazy import

6271cd8

feat: make alibi slopes configurable

d4940e7

chore(deps): add flash-attn

9ff6cb9

feat: use scaled_dot_product as default

bbd89dc

feat: make alibi_slope cinfigurable in block, chunk processor

91533c6

chore(deps): remove flash-attn

0eb5c50

feat: get alibi_slopes

c04e641

docs: update docstrings

6523b47

fix: bias shape

22623cc

fix: softcap optional

ed07e34

fix: import annotations from future

c841324

fix: annotation error

6c12dda

docs: update changelog

b7b8f2e

fix: type annotation

df353d9

feat: catch low flash-attn version

fc335c7

feat: FlashMultiHeadSelfAttention

663fea0

chore!: drop support for scaled_dot_product_attention

0c55a9c

feat: add softcap

ea665be

test: add softcap

ffa2d99

xfail for MultiHeadSelfAttention

[pre-commit.ci] auto fixes from pre-commit.com hooks

7c2d634

for more information, see https://pre-commit.ci

feat: flash attention lazy import

d2ed932

anaprietonem reviewed Jan 27, 2025

models/pytest.ini (resolved)

anaprietonem reviewed Jan 27, 2025

models/src/anemoi/models/layers/attention.py (outdated, resolved)

anaprietonem reviewed Jan 27, 2025

models/src/anemoi/models/layers/attention.py (outdated, resolved)

anaprietonem reviewed Jan 27, 2025

models/src/anemoi/models/layers/attention.py (resolved)

theissenhelen and others added 8 commits January 29, 2025 11:52

Update models/src/anemoi/models/layers/attention.py

603ab17

Co-authored-by: Harrison Cook <[email protected]>

Update models/src/anemoi/models/layers/attention.py

c2925fa

Co-authored-by: Harrison Cook <[email protected]>

fix: address comments

88dc6d5

chore: remove flex_attention

1d65779

Merge branch 'main' into feature/44-make-flash-attention-configurable

46ba31c

[pre-commit.ci] auto fixes from pre-commit.com hooks

e5f0f49

for more information, see https://pre-commit.ci

test: fix merge

8e7a93c

fix test to address breaking change from torch 2.6

8ef5575

github-actions bot added the graphs label Jan 30, 2025

remove flex_attention references

d20d3ed

anaprietonem approved these changes Jan 30, 2025

HCookie changed the title ~~feat: 44 make flash attention configurable~~ feat: make flash attention configurable Jan 30, 2025

HCookie approved these changes Jan 30, 2025

anaprietonem merged commit 41fcab6 into main Jan 30, 2025
28 checks passed

HCookie deleted the feature/44-make-flash-attention-configurable branch January 30, 2025 14:03

Magnus-SI pushed a commit that referenced this pull request Jun 3, 2025

Merge pull request #60 from ecmwf/feature/synch-repos

8c0b966

Add workflow to sync repos

Magnus-SI pushed a commit that referenced this pull request Jun 3, 2025

Refactor GraphCreator (#60)

55beef0

* feat: refactor GraphCreator

matschreiner pushed a commit to matschreiner/anemoi-core that referenced this pull request Jun 4, 2025

Merge pull request ecmwf#60 from ecmwf/feature/synch-repos

e83bbca

Add workflow to sync repos

theissenhelen commented Jan 6, 2025 •

edited by anaprietonem

anaprietonem left a comment
