
Implement vec256/512 #3966


Merged: 30 commits, Jun 13, 2025


@TheNumbat (Member) commented May 5, 2025

Adds support for 256-bit and 512-bit vectors throughout the compiler. These generally have the same semantics as 128-bit vectors, just wider. About half of the diff (+5k LOC) comes from enabling AVX/AVX2 in simdgen.
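To illustrate what "the same semantics, just wider" means, here is a rough analogy in C using GCC/Clang vector extensions rather than the compiler's own vector types or intrinsics: element-wise + behaves identically at 128, 256 and 512 bits, and only the lane count changes. The type names i32x4/i32x8/i32x16, the LANES macro and the add_* helpers are hypothetical, chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogy using GCC/Clang vector extensions, not the OCaml
   vec128/vec256/vec512 types: one element type at three widths. */
typedef int32_t i32x4  __attribute__((vector_size(16))); /* 128 bits, 4 lanes  */
typedef int32_t i32x8  __attribute__((vector_size(32))); /* 256 bits, 8 lanes  */
typedef int32_t i32x16 __attribute__((vector_size(64))); /* 512 bits, 16 lanes */

/* Each type is exactly its nominal width (static_assert comes from assert.h). */
static_assert(sizeof(i32x4) == 16, "128-bit vector");
static_assert(sizeof(i32x8) == 32, "256-bit vector");
static_assert(sizeof(i32x16) == 64, "512-bit vector");

/* Number of lanes in a vector value. */
#define LANES(v) ((int)(sizeof(v) / sizeof((v)[0])))

/* The same '+' at every width: lane i of *out is lane i of *a plus lane i of *b.
   Vectors are passed by pointer so the example does not depend on how the
   C ABI passes wide vectors by value. */
static void add_i32x4(const i32x4 *a, const i32x4 *b, i32x4 *out) { *out = *a + *b; }
static void add_i32x8(const i32x8 *a, const i32x8 *b, i32x8 *out) { *out = *a + *b; }
static void add_i32x16(const i32x16 *a, const i32x16 *b, i32x16 *out) { *out = *a + *b; }
```

Without -mavx or -mavx512f, GCC and Clang lower the 256- and 512-bit additions to narrower instructions, so the sketch compiles on any x86-64 target.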

  • 256-bit types are allowed with simd_beta. The basic 128-bit tests have been duplicated for 256-bit vectors, but none of the operations exercised by the other tests are implemented yet. The AVX and AVX2 architecture extensions have been added and are enabled by default, since we already enable other Haswell extensions.
  • 512-bit types are allowed with simd_alpha. They do not have tests yet, and using them triggers a fatal error in emit. The AVX512F extension is disabled by default.

I think the only changes that need careful review are those related to preserving wide registers and aligning the C stack (mainly amd64.S and emit):

  • caml_call_gc and related functions now have variants that save ymm/zmm registers. Emit chooses which one to call based on the live register set, so programs that do not use 256/512-bit vectors should see identical behavior.
  • When a C call passes vec256 or vec512 arguments on the stack, C compilers assume the stack is 32- or 64-byte aligned on entry to the external function. Therefore we emit a call to caml_c_call_stack_args_avx (or caml_c_call_stack_args_avx512), which aligns the C stack before copying the arguments from the OCaml stack. We also do this in runtime4.
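A minimal C model of the two mechanisms above, under stated assumptions: all names (vec_width, gc_entry, select_gc_entry, align_down, reserve_c_stack_args) are hypothetical, and in the compiler this logic lives in emit and amd64.S rather than in C. It shows (1) choosing the narrowest register-saving GC entry point that covers every live vector register, and (2) rounding the C stack pointer down to 32 or 64 bytes before arguments would be copied from the OCaml stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: in the compiler these decisions are made by the amd64
   emitter and by assembly in amd64.S, not by C code like this. */

/* Width in bytes of a live vector register. */
enum vec_width { VEC128 = 16, VEC256 = 32, VEC512 = 64 };

/* Which GC entry point to call: one that saves xmm, ymm or zmm registers. */
enum gc_entry { GC_SAVE_XMM, GC_SAVE_YMM, GC_SAVE_ZMM };

/* Choose the narrowest variant that still preserves every live register.
   With no 256/512-bit values live this is the xmm variant, so code that
   never uses wide vectors takes the same path as before. */
static enum gc_entry select_gc_entry(const enum vec_width *live, size_t n) {
    enum gc_entry entry = GC_SAVE_XMM;
    for (size_t i = 0; i < n; i++) {
        if (live[i] == VEC512) return GC_SAVE_ZMM; /* widest possible */
        if (live[i] == VEC256) entry = GC_SAVE_YMM;
    }
    return entry;
}

/* Round a stack pointer down to a power-of-two alignment. The stack grows
   downward, so rounding down only ever reserves extra space. */
static uintptr_t align_down(uintptr_t sp, uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}

/* Reserve nbytes of outgoing argument space below sp and align the result to
   32 (vec256) or 64 (vec512) bytes; the arguments would then be copied from
   the OCaml stack into [result, result + nbytes). */
static uintptr_t reserve_c_stack_args(uintptr_t sp, size_t nbytes, uintptr_t align) {
    return align_down(sp - nbytes, align);
}
```

In this sketch the aligned value stands for the stack pointer immediately before the call instruction, which is where the System V x86-64 ABI expects the 32/64-byte alignment to hold when such arguments are passed on the stack.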

For later PRs:

  • constants (+tests)
  • static & reinterpret casts (+tests)
  • array accessors (+tests)
  • intrinsics (+tests)
  • align vec256/512 stack slots on the OCaml stack
  • use VEX encoding for SSE intrinsics when AVX is enabled (EVEX with AVX512)
  • AVX512 mask registers
  • fix ASan checks for 32/64-byte operations (and unaligned 16-byte ones)

@TheNumbat added the flambda2, backend, and simd labels on May 5, 2025
@TheNumbat added the lambda label on May 7, 2025
@TheNumbat marked this pull request as ready for review on May 20, 2025 21:04

@TheNumbat (Member, Author) commented May 22, 2025

Review:

  • @ccasin - typing, toplevel, testsuite, parsing, lambda, .github, flambda-backend
  • @mshinwell - middle_end + cmm
  • @xclerc - backend, tools, file_formats (excluding cmm)
  • @NickBarnes - runtime, runtime4

@ccasin (Contributor) left a comment

I've reviewed the directories assigned to me and they look fine, modulo my comments above. Mainly, I think we need to clearly document somewhere all the pieces that are missing before we can consider moving this out of beta.

@mshinwell (Collaborator)

The middle_end/ changes are ok, I pushed a few small fixes.

@mshinwell (Collaborator)

New review assignment: I will read the Cmm parts, @xclerc will read the remainder of the backend.

@mshinwell (Collaborator)

I've now read backend/*cmm*.

@TheNumbat (Member, Author) commented Jun 3, 2025

Responded to the review. I still need to fix the ASan job and a failure involving stack checks and effects.

@TheNumbat (Member, Author) commented Jun 3, 2025

The stack checks failure is due to #4078. I also added a wrong ASan case, to be fixed in a follow-up PR.

@TheNumbat removed the review request for jvanburen on June 4, 2025 21:55

@NickBarnes (Contributor) left a comment

I've only reviewed the runtime side of this. It's broadly OK, with various minor suggestions. In addition, I found some of the macrology in amd64.S hard to follow, so I have made a suggested improvement in a commit here: https://github.com/NickBarnes/oxcaml/tree/avx-macrology

@NickBarnes (Contributor) left a comment

The runtime stuff looks good to me now.

@TheNumbat merged commit 0db41c3 into main on Jun 13, 2025
31 checks passed
@TheNumbat deleted the vec256-512 branch on June 13, 2025 15:55
Labels: backend, flambda2 (Prerequisite for, or part of, flambda2), lambda (Lambda language changes), simd (SIMD support)

5 participants