[FlexAttention] Triton XPU didn't get correct value with the block io if the base address is not restricted aligned #3704

chengjunlu · 2025-03-18T05:13:49Z

Describe the bug

In the FlexDecoding test case, we found that block IO returns incorrect matrix values if the base address is not aligned.

Inductor generates code like this:

    K_block_ptr = tl.make_block_ptr(
        base=K + k_offset,
        shape=(QK_HEAD_DIM, KV_LEN),                # (d, N)
        strides=(stride_kk, stride_kn),
        offsets=(0, off_n),
        block_shape=(QK_HEAD_DIM_ROUNDED, BLOCK_N),
        order=(0, 1)
    )

It adds the offset directly to the base, so the resulting base address may not be aligned:

K_block_ptr base: 0xff000000002007ca
K_block_ptr shape:  [64]
K_block_ptr shape:  [2048]
K_block_ptr strides:  [1]
K_block_ptr strides:  [64]
K_block_ptr offsets:  [0]
K_block_ptr offsets:  [0]
K_block_ptr block_shape:  [64]
K_block_ptr block_shape:  [64]

Environment details

Triton XPU: Latest


chengjunlu · 2025-03-18T05:54:28Z

In the 2D block IO lowering, we already compensate for a base that is not 64-byte aligned by folding the residual offset into OffsetX and BaseWidth.
However, OffsetX has an extra restriction: it must be 4-byte aligned.
We need to fall back to a gather load when OffsetX is not 4-byte aligned.
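
The check described above can be illustrated with a small sketch applied to the base address printed in the issue. This is only an approximation of the idea, not the actual lowering code: the function names and constants are hypothetical, and it assumes the residual of the base relative to a 64-byte boundary is what gets folded into OffsetX, measured in bytes as the comment phrases it.

```python
# Illustrative sketch only; names are hypothetical, not taken from the Triton XPU source.
BASE_ALIGN_BYTES = 64      # base alignment the lowering compensates for (per the comment)
OFFSET_X_ALIGN_BYTES = 4   # OffsetX restriction stated in the comment


def split_base(base_addr):
    """Split an address into a 64-byte-aligned base and the leftover byte offset."""
    residual = base_addr % BASE_ALIGN_BYTES
    return base_addr - residual, residual


def can_use_block_io(base_addr):
    """True when the leftover offset folded into OffsetX stays 4-byte aligned."""
    _, residual = split_base(base_addr)
    return residual % OFFSET_X_ALIGN_BYTES == 0


base = 0xFF000000002007CA  # K_block_ptr base printed in the issue
aligned, residual = split_base(base)
print(hex(aligned), residual, can_use_block_io(base))
# prints: 0xff000000002007c0 10 False
```

For this address the residual is 10 bytes, which is not a multiple of 4, so under these assumptions the 2D block load would not be valid and the gather-load fallback would be needed.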

chengjunlu self-assigned this Mar 18, 2025

chengjunlu assigned LiyangLingIntel and chengjunlu and unassigned chengjunlu and LiyangLingIntel Mar 18, 2025

chengjunlu mentioned this issue Mar 18, 2025

[PT 2.8][FlexAttention] Enabling and peformance #3655

Open

vlad-penkin added this to the 1. [PT 2.8 Upstream] TorchInductor milestone Mar 18, 2025

vlad-penkin added tests: torchinductor codegen: attention labels Mar 18, 2025

chengjunlu assigned ESI-SYD and unassigned chengjunlu Mar 19, 2025

chengjunlu linked a pull request Mar 19, 2025 that will close this issue

Check the non 4-bytes aligned base/offsetX/width on block pointer #3712

Draft

This was referenced Mar 19, 2025

[FlexAttention] Accuracy issues during running FlexDecoding UT #3631

Open

[FlexAttention] Accuracy issues during running FlexAttention UT #3632

Open
