Implement vec256/512 #3966

TheNumbat · 2025-05-05T00:20:29Z

Adds support for 256- and 512-bit vectors throughout the compiler. They generally have the same semantics as 128-bit vectors, only wider. About half of the diff (+5k loc) comes from enabling AVX/AVX2 in simdgen.

256-bit types are allowed with simd_beta. The basic tests for 128-bit have been duplicated for 256, but none of the operations in the other tests are implemented yet. AVX and AVX2 architecture extensions have been added and are enabled by default, since we already enable other Haswell extensions.
512-bit types are allowed with simd_alpha. They do not yet have tests and their usage will trigger a fatal error in emit. The AVX512F extension is also disabled by default.

I think the only changes that need careful review are those related to preserving/aligning wide registers (mainly amd64.S/emit):

caml_call_gc and related functions now have versions that save ymm/zmm registers. Emit chooses which one to call based on the live register set. Programs that do not use 256/512 bit vectors should see identical behavior.
When a C call includes vec256 or vec512 stack arguments, C compilers assume the stack will be 32/64 byte aligned upon entering the external call. Therefore we emit a call to caml_c_call_stack_args_avx{512}, which aligns the C stack before copying the arguments from the OCaml stack. We also do this in runtime4.
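
To illustrate the alignment step described above, here is a minimal sketch in C of rounding a downward-growing stack pointer to a 32- or 64-byte boundary before the arguments are copied. It is not the code in amd64.S: the names `align_down` and `stack_arg_alignment` and the `widest_vec_bytes` parameter are hypothetical, and the real stubs do this in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a stack pointer down to a power-of-two
   boundary. The stack grows downward, so rounding down never moves
   into space that is already in use above sp. */
static uintptr_t align_down(uintptr_t sp, uintptr_t align) {
  assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
  return sp & ~(align - 1);
}

/* Hypothetical helper: alignment the C callee is assumed to expect,
   given the size in bytes of the widest vector passed on the stack.
   16 is the baseline x86-64 stack alignment at a call. */
static uintptr_t stack_arg_alignment(unsigned widest_vec_bytes) {
  if (widest_vec_bytes >= 64) return 64; /* vec512 stack argument */
  if (widest_vec_bytes >= 32) return 32; /* vec256 stack argument */
  return 16;
}
```

For example, `align_down(0x1038, stack_arg_alignment(32))` yields `0x1020`, and with a 64-byte argument the same pointer is rounded down to `0x1000`.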

For later PRs:

constants (+tests)
static & reinterpret casts (+tests)
array accessors (+tests)
intrinsics (+tests)
align vec256/512 stack slots on the ocaml stack
use vex encoding for sse intrinsics when avx is enabled (evex if avx512)
avx512 mask registers
fix asan checks for 32/64 byte ops (and unaligned 16 byte)

backend/x86_binary_emitter.ml

TheNumbat · 2025-05-22T17:33:27Z

Review:

@ccasin - typing, toplevel, testsuite, parsing, lambda, .github, flambda-backend
@mshinwell - middle_end + cmm
@xclerc - backend, tools, file_formats - cmm
@NickBarnes - runtime, runtime4

ccasin

I've reviewed the directories assigned to me and they look fine, modulo my comments above - mainly I think we need to document clearly somewhere all the bits that are missing before we can consider moving this out of beta.

flambda-backend/tests/simd/basic256.ml

typing/typedecl.ml

mshinwell · 2025-06-02T13:12:52Z

The middle_end/ changes are ok, I pushed a few small fixes.

mshinwell · 2025-06-02T15:00:31Z

New review assignment: I will read the Cmm parts, @xclerc will read the remainder of the backend.

mshinwell · 2025-06-02T15:25:38Z

I've now read backend/*cmm*

TheNumbat · 2025-06-03T00:29:31Z

Responded to review - now need to fix the asan job & some failure in stack checks + effects

TheNumbat · 2025-06-03T19:40:34Z

The stack checks failure is due to #4078; I also added an asan case that is currently wrong, to be fixed in a follow-up PR.

backend/cmmgen_state.ml

NickBarnes

I've only reviewed the runtime side of this. It's broadly OK; various minor suggestions. In addition to these, I found some of the macrology in amd64.S hard to follow, so have made a suggested improvement in a commit here: https://github.com/NickBarnes/oxcaml/tree/avx-macrology

runtime/caml/simd.h

runtime/simd.c

runtime4/startup_nat.c

runtime/amd64.S

NickBarnes

The runtime stuff looks good to me now.

Implement vec256/512
Co-authored-by: Mark Shinwell
Co-authored-by: Nick Barnes

TheNumbat added the flambda2, backend, and simd labels May 5, 2025

TheNumbat force-pushed the vec256-512 branch from 0b58fe6 to ee745db Compare May 7, 2025 02:20

TheNumbat added the lambda label May 7, 2025

TheNumbat force-pushed the vec256-512 branch from 74cb7ab to ae5309b Compare May 20, 2025 17:44

TheNumbat marked this pull request as ready for review May 20, 2025 21:04

TheNumbat commented May 20, 2025

backend/x86_binary_emitter.ml (outdated, resolved)

TheNumbat force-pushed the vec256-512 branch from c73077d to c6c0e8e Compare May 20, 2025 21:45

TheNumbat requested review from ccasin, mshinwell, NickBarnes and jvanburen May 22, 2025 17:29

squash

3109b10

TheNumbat force-pushed the vec256-512 branch from dff8619 to 3109b10 Compare May 22, 2025 17:44

TheNumbat added 5 commits May 23, 2025 12:16

merge

e8df6c7

extra check

d3030dc

merge

f7259f1

merge

b837b87

fix

ce3905f

TheNumbat mentioned this pull request May 30, 2025

Fix binary emitter VEX encoding bug #4069

Merged

revert stubs.c reformat

a955a9c

ccasin reviewed May 30, 2025

flambda-backend/tests/simd/basic256.ml (outdated, resolved)

typing/typedecl.ml (resolved)

Code review for middle_end/

6e63708

mshinwell added 2 commits June 2, 2025 16:12

may -> might

3b70998

CR

9a54f25

TheNumbat added 3 commits June 2, 2025 19:48

mixed block generated tests

d12e5b1

review

eaed60a

amd64

d73c9ed

TheNumbat added 6 commits June 2, 2025 20:34

add comment

737e513

array compare

2a62e08

fix

94d0888

fix gc_regs issue

778083e

consistency

384857a

stub asan

380b860

TheNumbat removed the request for review from jvanburen June 4, 2025 21:55

xclerc reviewed Jun 5, 2025

backend/cmmgen_state.ml (outdated, resolved)

TheNumbat added 3 commits June 5, 2025 10:17

fix gap

f3907be

merge

407571f

format

6094ac2

NickBarnes requested changes Jun 11, 2025

TheNumbat and others added 7 commits June 12, 2025 13:48

merge

14a92d7

Improve assembler macrology, hopefully clarifying it somewhat.

eb5d6b8

review

1493761

port macros to rt4

efc3fd2

comment

ab605bb

fix fp build

6b82c5c

revert

5f6120c

NickBarnes approved these changes Jun 13, 2025

ccasin approved these changes Jun 13, 2025

TheNumbat merged commit 0db41c3 into main Jun 13, 2025
31 checks passed

TheNumbat deleted the vec256-512 branch June 13, 2025 15:55

gretay-js mentioned this pull request Jun 16, 2025

(Arm64) implement simd intrinsics and enable simd tests #4140

Merged

Dreian pushed a commit to Dreian/oxcaml that referenced this pull request Jul 10, 2025

Implement vec256/512 (oxcaml#3966)

d123a9c
