[TEST] Fix test_matmul.py failures #3307

Open
whitneywhtsang opened this issue Jan 30, 2025 · 6 comments · May be fixed by #3682

Comments

@whitneywhtsang
Contributor

whitneywhtsang commented Jan 30, 2025

Merged in #3302.

Fix test_mxfp, test_blocked_scale_mxfp, test_lhs_in_tmem_mxfp, and test_block_scale_fp4.

@whitneywhtsang whitneywhtsang changed the title [TEST] Fix test_matmul.py::test_mxfp [TEST] Fix test_matmul.py::test_mxfp and test_blocked_scale_mxfp Jan 30, 2025
@whitneywhtsang whitneywhtsang changed the title [TEST] Fix test_matmul.py::test_mxfp and test_blocked_scale_mxfp [TEST] Fix test_matmul.py::test_mxfp and test_matmul.py::test_blocked_scale_mxfp Jan 30, 2025
@whitneywhtsang whitneywhtsang changed the title [TEST] Fix test_matmul.py::test_mxfp and test_matmul.py::test_blocked_scale_mxfp [TEST] Fix test_matmul.py failures Jan 30, 2025
@whitneywhtsang
Contributor Author

Please also fix test_mxfp8_mxfp4_matmul which is added in #3368.

whitneywhtsang added a commit that referenced this issue Feb 6, 2025
This PR changes the Triton base from
716a521 to
ac9574c (Feb 4).
Pass rate: 98.19% -> 97.97% (#3307)

Please do not squash and merge this PR.
@AndreyPavlenko
Contributor

The tests fail with:

error: failed to legalize operation 'tt.dot_scaled' that was explicitly marked illegal

The root cause is that scaling both LHS and RHS is not supported (see here and here). The implementation also depends on UpcastMXFPOp, which does not support an RHS scale and has been removed upstream. To proceed with a fix, the upstream changes must be applied first.
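To make the expected results concrete when both operands carry scales, here is a minimal pure-Python reference sketch of a block-scaled dot. It assumes MX-style scaling (one E8M0 scale byte per 32 consecutive elements along K, decoding to 2**(s - 127)), scale layouts of [M, K/32] for the LHS and [N, K/32] for the RHS, and operand values already decoded to floats. The function names are hypothetical and are not part of Triton.

```python
# Hypothetical reference for a block-scaled dot, assuming MX-style scaling:
# each group of BLOCK consecutive elements along K shares one E8M0 scale byte.
BLOCK = 32
E8M0_BIAS = 127


def e8m0_to_float(s: int) -> float:
    """Decode an E8M0 scale byte to 2**(s - 127); 0xFF encodes NaN."""
    if s == 0xFF:
        return float("nan")
    return 2.0 ** (s - E8M0_BIAS)


def scaled_dot_reference(a, a_scale, b, b_scale):
    """Compute C = (a * a_scale) @ (b * b_scale) with plain Python lists.

    a: M x K floats, a_scale: M x (K // BLOCK) scale bytes (or None)
    b: K x N floats, b_scale: N x (K // BLOCK) scale bytes (or None)
    A scale of None means that operand is unscaled.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert k % BLOCK == 0 and all(len(row) == k for row in a)
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for kk in range(k):
                sa = e8m0_to_float(a_scale[i][kk // BLOCK]) if a_scale is not None else 1.0
                sb = e8m0_to_float(b_scale[j][kk // BLOCK]) if b_scale is not None else 1.0
                acc += (a[i][kk] * sa) * (b[kk][j] * sb)
            c[i][j] = acc
    return c
```

Comparing kernel output against such a reference with LHS-only, RHS-only, and both scales set can help separate scaling bugs from other element mismatches.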

@LiyangLingIntel
Contributor

Scaling of both RHS and LHS will be supported by #3607.
I have tested it locally: most of the remaining failures in test_mxfp and test_blocked_scale_mxfp are out-of-shared-memory errors, so we may need to reduce the unit-test kernel block sizes and skip some NVIDIA/AMD-specific checks.
@AndreyPavlenko Maybe you can proceed with this fix once #3607 is merged.
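For the out-of-shared-memory failures, a rough way to choose smaller test block sizes is to estimate the footprint of the staged A and B tiles. The helpers below are hypothetical (not part of Triton or the test suite): they ignore scale tensors, layout padding, and epilogue buffers, and the shared-memory budget must be supplied for the target device.

```python
def smem_bytes(block_m, block_n, block_k, a_bits, b_bits, num_stages):
    """Approximate shared memory (bytes) used by the staged A and B tiles."""
    a_tile = block_m * block_k * a_bits // 8  # A tile: BLOCK_M x BLOCK_K
    b_tile = block_k * block_n * b_bits // 8  # B tile: BLOCK_K x BLOCK_N
    return (a_tile + b_tile) * num_stages


def shrink_to_fit(block_m, block_n, block_k, a_bits, b_bits, num_stages,
                  budget, min_block=16):
    """Halve the largest block dimension (ties favor K, then M) until the
    estimate fits within `budget` bytes; never halve a dimension at or
    below `min_block`."""
    while smem_bytes(block_m, block_n, block_k, a_bits, b_bits, num_stages) > budget:
        if block_k >= max(block_m, block_n) and block_k > min_block:
            block_k //= 2
        elif block_m >= block_n and block_m > min_block:
            block_m //= 2
        elif block_n > min_block:
            block_n //= 2
        else:
            raise ValueError("no block configuration fits the shared-memory budget")
    return block_m, block_n, block_k
```

For example, `shrink_to_fit(128, 128, 128, 8, 8, 3, 65536)` returns `(128, 128, 64)`. Preferring BLOCK_K keeps the BLOCK_M x BLOCK_N output tile unchanged, which is only a design choice of this sketch.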

@AndreyPavlenko
Contributor

AndreyPavlenko commented Mar 14, 2025

@LiyangLingIntel There are also element mismatch failures. Could they be caused by the new implementation?
Also, test_block_scale_fp4 never completes.

@LiyangLingIntel
Contributor

> @LiyangLingIntel There are also element mismatch failures. Could it be caused by the new implementation? Also the test test_block_scale_fp4 never completes.

I'm not sure about the root cause. The new implementation passes the scaled_dot test cases, so we need to look into these mismatch cases.

@anmyachev
Contributor

More tests to support (test_mxfp8_mxfp4_matmul): #3699 (comment).

5 participants