[chronos-2] add support for SDPA #331
Conversation
Thanks @kashif! Left some comments. I think Flash attention won't work with Chronos-2 because masking is very important for the model. SDPA should work though. That said, I actually experimented with SDPA and FlexAttention while training these models. However, in the end I still went with manual attention + torch compile because I ran into weird issues. See:
pytorch/pytorch#149857
pytorch/pytorch#149767
Did you benchmark SDPA vs Eager on your end?
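A minimal sketch (not from this PR) of the masking point above: PyTorch's SDPA accepts an arbitrary boolean attn_mask, which is what Chronos-2 needs, while FlashAttention kernels generally only support causal or no masking.

```python
# Sketch only: SDPA with an arbitrary (non-causal) attention mask, True = attend.
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 16, 64)          # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
mask = torch.rand(1, 1, 16, 16) > 0.2  # arbitrary mask, not expressible as "causal"

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```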
lr_scheduler_type="linear",
warmup_ratio=0.0,
optim="adamw_torch_fused",
logging_dir=str(output_dir / "logs"),
Why is this removed?
logging_dir has been removed from the training arguments; the different report_to backends handle logging within the output directory.
Will this work fine for the older transformers versions supported by this package?
Yes, we can safely not set this argument and it will work for older transformers versions (as well as newer ones).
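For reference, a hedged sketch of the call without logging_dir (argument values are illustrative, not the exact ones from this repo): when logging_dir is omitted, the report_to backends write their logs under output_dir.

```python
# Illustrative sketch: with logging_dir omitted, report_to backends
# (e.g. TensorBoard) place their logs under output_dir by default.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="output",
    lr_scheduler_type="linear",
    warmup_ratio=0.0,
    optim="adamw_torch_fused",
    report_to=["tensorboard"],  # logs end up under output/ by default
)
```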
@kashif with SDPA, the results stay exactly the same and there's a small improvement in the runtime.
We have two options:
@shchur @lostella I can't decide between the two. What do you think?
assert position_ids is not None, "position_ids must be provided when self.use_rope=True"

# Force eager attention if output_attentions is True (only eager returns weights)
attn_implementation = self.config._attn_implementation
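A self-contained sketch of the fallback described in the hunk's comment (the helper name is hypothetical, not the PR's code):

```python
# Hypothetical helper illustrating the logic above: SDPA/FA2 do not return
# attention weights, so the eager path is forced when output_attentions is True.
def resolve_attn_implementation(configured: str, output_attentions: bool) -> str:
    return "eager" if output_attentions else configured

print(resolve_attn_implementation("sdpa", output_attentions=False))  # sdpa
print(resolve_attn_implementation("sdpa", output_attentions=True))   # eager
```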
Does this need to access the private _attn_implementation attribute?
It seems for now this is the convention, see e.g. https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L277-L278
Happy to remove fa2 if requested.
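For context, the referenced llama code follows roughly this pattern (paraphrased from memory, not a verbatim copy; the registry name comes from recent transformers versions and may differ in older ones):

```python
# Paraphrased sketch of the transformers convention: the model reads
# config._attn_implementation and looks up the matching attention function.
from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS

def pick_attention(config, eager_attention_forward):
    attention_interface = eager_attention_forward
    if config._attn_implementation != "eager":
        attention_interface = ALL_ATTENTION_FUNCTIONS[config._attn_implementation]
    return attention_interface
```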
I am also more inclined toward keeping SDPA only and keeping the codebase simpler. We can revisit this if inference speed becomes a concern.
ok, removing fa2
assert not self.is_gated_act, "gated activation is not supported"

# Attention implementation - default to "sdpa" if not specified
attn_implementation = attn_implementation or "sdpa"
Could we just set the default value in the function signature to "sdpa", or is there some case where the current logic is required?
Indeed, we need this because from_pretrained will get the config.json from S3, and since it does not include this key, attn_implementation will be None.
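A small hypothetical sketch of the point above: when None is passed explicitly (because the key is missing from config.json), a signature default alone is bypassed, so the or-fallback still matters.

```python
# Hypothetical stand-in for the config constructor: an explicit None argument
# overrides the signature default, so the `or "sdpa"` coercion is still needed.
def make_config(attn_implementation="sdpa"):
    attn_implementation = attn_implementation or "sdpa"
    return attn_implementation

print(make_config())         # "sdpa"  (default used)
print(make_config(None))     # "sdpa"  (explicit None coerced by the fallback)
print(make_config("eager"))  # "eager"
```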
If the default is set to "sdpa" as in my suggestion above, this should not be needed, right?
Co-authored-by: Oleksandr Shchur <[email protected]>
Co-authored-by: Abdul Fatir <[email protected]>
Co-authored-by: Abdul Fatir <[email protected]>
Thanks a lot for your contribution @kashif!
This pull request introduces configurable attention backends to the Chronos-2 model, allowing users to select between eager, SDPA, and FlashAttention-2 implementations.
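A hypothetical usage sketch (the pipeline class and model id are assumed, and forwarding of the attn_implementation kwarg is exactly what this PR adds, so verify against the merged API):

```python
# Hypothetical usage: select the attention backend when loading Chronos-2.
# Assumes the attn_implementation kwarg is forwarded to the underlying model.
from chronos import BaseChronosPipeline

pipeline = BaseChronosPipeline.from_pretrained(
    "amazon/chronos-2",
    attn_implementation="sdpa",  # or "eager"
)
```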