8365967: C2 compiler support for HalffloatVector operations supported by auto-vectorization flow #231

jatin-bhateja · 2025-08-22T17:39:18Z

Hi All,

This patch extends the Vector API inline expanders to infer Float16 vector IR from the newly passed operType argument.
The intent is to reuse the existing IR and backend implementation of auto-vectorized Float16 operations.
The HalffloatVector operators ADD, SUB, MUL, DIV, MAX, MIN, and FMA now emit FP16 instructions on x86 targets that support AVX512-FP16 and on AArch64 SVE targets.
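
For readers unfamiliar with FP16 lanes, the following is a minimal scalar sketch of what a lane-wise ADD, MUL, or MAX computes. It is not code from this patch: the class and method names are hypothetical, and it only assumes the standard Float.float16ToFloat and Float.floatToFloat16 conversions available since JDK 20. Each lane is a 16-bit pattern held in a short; the operation is evaluated in float and rounded back to FP16. FMA is left out on purpose, because a correctly rounded fused result needs more care than a round trip through float.

```java
public class Fp16LaneReference {
    // Hypothetical helper: add two FP16 bit patterns, rounding the float result back to FP16.
    static short add(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) + Float.float16ToFloat(b));
    }

    // Hypothetical helper: multiply two FP16 bit patterns.
    static short mul(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) * Float.float16ToFloat(b));
    }

    // Hypothetical helper: maximum of two FP16 bit patterns, using Math.max semantics.
    static short max(short a, short b) {
        return Float.floatToFloat16(Math.max(Float.float16ToFloat(a), Float.float16ToFloat(b)));
    }

    // Lane-wise addition over two equally sized arrays of FP16 bit patterns.
    static short[] addLanes(short[] a, short[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("lane counts differ");
        }
        short[] result = new short[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = add(a[i], b[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        short one = Float.floatToFloat16(1.0f);
        short two = Float.floatToFloat16(2.0f);
        System.out.println(Float.float16ToFloat(add(one, two)));
        System.out.println(Float.float16ToFloat(mul(two, two)));
    }
}
```

A scalar loop of this shape is roughly what the vector instruction replaces, one lane per array element.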

Best Regards,
Jatin

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

JDK-8365967: C2 compiler support for HalffloatVector operations supported by auto-vectorization flow (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/panama-vector.git pull/231/head:pull/231
$ git checkout pull/231

Update a local copy of the PR:
$ git checkout pull/231
$ git pull https://git.openjdk.org/panama-vector.git pull/231/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 231

View PR using the GUI difftool:
$ git pr show -t 231

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/panama-vector/pull/231.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-08-22T17:47:18Z

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into vectorIntrinsics will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-08-22T17:47:46Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

jatin-bhateja · 2025-08-25T13:47:29Z

Performance of the FMA benchmark on Intel Xeon Emerald Rapids: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.30GHz

openjdk · 2025-08-29T12:05:20Z

⚠️ @jatin-bhateja This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

mlbridge · 2025-08-29T12:10:39Z

Webrevs

jatin-bhateja · 2025-08-29T12:13:12Z

What is remaining?

Functional validation
Performance validation
New IR framework-based tests.
Microbenchmark for FP16-based dotproduct.

bridgekeeper · 2025-09-30T22:49:46Z

@jatin-bhateja This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

openjdk · 2025-10-01T22:37:34Z

@jatin-bhateja Unknown command keeplive - for a list of valid commands use /help.

jatin-bhateja · 2025-10-01T22:39:50Z

/keepalive

openjdk · 2025-10-01T22:41:25Z

@jatin-bhateja The pull request is being re-evaluated and the inactivity timeout has been reset.

jatin-bhateja · 2025-10-02T04:44:33Z

Performance of JMH micros
System: Model name: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz

Baseline:
Benchmark                      (size)   Mode  Cnt      Score   Error   Units
Halffloat256Vector.ABS           1024  thrpt    2    366.995          ops/ms
Halffloat256Vector.ABSMasked     1024  thrpt    2    345.584          ops/ms
Halffloat256Vector.ACOS          1024  thrpt    2     61.402          ops/ms
Halffloat256Vector.ADD           1024  thrpt    2    259.029          ops/ms
Halffloat256Vector.ADDMasked     1024  thrpt    2    251.257          ops/ms
Halffloat256Vector.ASIN          1024  thrpt    2     61.191          ops/ms
Halffloat256Vector.ATAN          1024  thrpt    2     40.815          ops/ms
Halffloat256Vector.ATAN2         1024  thrpt    2     28.224          ops/ms
Halffloat256Vector.CBRT          1024  thrpt    2     43.547          ops/ms
Halffloat256Vector.COS           1024  thrpt    2     37.414          ops/ms
Halffloat256Vector.COSH          1024  thrpt    2     46.365          ops/ms
Halffloat256Vector.DIV           1024  thrpt    2    221.924          ops/ms
Halffloat256Vector.DIVMasked     1024  thrpt    2    240.560          ops/ms
Halffloat256Vector.EXP           1024  thrpt    2     52.344          ops/ms
Halffloat256Vector.EXPM1         1024  thrpt    2     48.346          ops/ms
Halffloat256Vector.FMA           1024  thrpt    2    206.324          ops/ms
Halffloat256Vector.FMAMasked     1024  thrpt    2    184.678          ops/ms
Halffloat256Vector.HYPOT         1024  thrpt    2     34.096          ops/ms
Halffloat256Vector.LOG           1024  thrpt    2     40.300          ops/ms
Halffloat256Vector.LOG10         1024  thrpt    2     38.886          ops/ms
Halffloat256Vector.LOG1P         1024  thrpt    2     36.438          ops/ms
Halffloat256Vector.MAX           1024  thrpt    2    266.337          ops/ms
Halffloat256Vector.MAXMasked     1024  thrpt    2    245.518          ops/ms
Halffloat256Vector.MIN           1024  thrpt    2    268.963          ops/ms
Halffloat256Vector.MINMasked     1024  thrpt    2    243.136          ops/ms
Halffloat256Vector.MUL           1024  thrpt    2    264.127          ops/ms
Halffloat256Vector.MULMasked     1024  thrpt    2    251.600          ops/ms
Halffloat256Vector.NEG           1024  thrpt    2    365.486          ops/ms
Halffloat256Vector.NEGMasked     1024  thrpt    2    357.070          ops/ms
Halffloat256Vector.POW           1024  thrpt    2     26.809          ops/ms
Halffloat256Vector.SIN           1024  thrpt    2     34.555          ops/ms
Halffloat256Vector.SINH          1024  thrpt    2     53.779          ops/ms
Halffloat256Vector.SQRT          1024  thrpt    2    130.811          ops/ms
Halffloat256Vector.SQRTMasked    1024  thrpt    2    192.628          ops/ms
Halffloat256Vector.SUB           1024  thrpt    2    262.521          ops/ms
Halffloat256Vector.SUBMasked     1024  thrpt    2    254.578          ops/ms
Halffloat256Vector.TAN           1024  thrpt    2     30.002          ops/ms
Halffloat256Vector.TANH          1024  thrpt    2     55.562          ops/ms
Halffloat256Vector.blend         1024  thrpt    2  28002.356          ops/ms

With optimization:
Benchmark                      (size)   Mode  Cnt      Score   Error   Units
Halffloat256Vector.ABS           1024  thrpt    2  24048.638          ops/ms
Halffloat256Vector.ABSMasked     1024  thrpt    2  45085.707          ops/ms
Halffloat256Vector.ACOS          1024  thrpt    2     56.116          ops/ms
Halffloat256Vector.ADD           1024  thrpt    2  19623.250          ops/ms
Halffloat256Vector.ADDMasked     1024  thrpt    2  27462.171          ops/ms
Halffloat256Vector.ASIN          1024  thrpt    2     62.081          ops/ms
Halffloat256Vector.ATAN          1024  thrpt    2     41.352          ops/ms
Halffloat256Vector.ATAN2         1024  thrpt    2     29.173          ops/ms
Halffloat256Vector.CBRT          1024  thrpt    2     39.926          ops/ms
Halffloat256Vector.COS           1024  thrpt    2     37.151          ops/ms
Halffloat256Vector.COSH          1024  thrpt    2     48.309          ops/ms
Halffloat256Vector.DIV           1024  thrpt    2   2805.701          ops/ms
Halffloat256Vector.DIVMasked     1024  thrpt    2   2795.544          ops/ms
Halffloat256Vector.EXP           1024  thrpt    2     55.055          ops/ms
Halffloat256Vector.EXPM1         1024  thrpt    2     50.483          ops/ms
Halffloat256Vector.FMA           1024  thrpt    2  23280.064          ops/ms
Halffloat256Vector.FMAMasked     1024  thrpt    2  21828.932          ops/ms
Halffloat256Vector.HYPOT         1024  thrpt    2     34.266          ops/ms
Halffloat256Vector.LOG           1024  thrpt    2     42.158          ops/ms
Halffloat256Vector.LOG10         1024  thrpt    2     41.335          ops/ms
Halffloat256Vector.LOG1P         1024  thrpt    2     36.291          ops/ms
Halffloat256Vector.MAX           1024  thrpt    2  14960.348          ops/ms
Halffloat256Vector.MAXMasked     1024  thrpt    2  12585.642          ops/ms
Halffloat256Vector.MIN           1024  thrpt    2  14662.769          ops/ms
Halffloat256Vector.MINMasked     1024  thrpt    2  12327.769          ops/ms
Halffloat256Vector.MUL           1024  thrpt    2  27156.965          ops/ms
Halffloat256Vector.MULMasked     1024  thrpt    2  21349.555          ops/ms
Halffloat256Vector.NEG           1024  thrpt    2  24093.711          ops/ms
Halffloat256Vector.NEGMasked     1024  thrpt    2  26889.264          ops/ms
Halffloat256Vector.POW           1024  thrpt    2     27.028          ops/ms
Halffloat256Vector.SIN           1024  thrpt    2     34.280          ops/ms
Halffloat256Vector.SINH          1024  thrpt    2     55.049          ops/ms
Halffloat256Vector.SQRT          1024  thrpt    2   2491.596          ops/ms
Halffloat256Vector.SQRTMasked    1024  thrpt    2   2493.591          ops/ms
Halffloat256Vector.SUB           1024  thrpt    2  29664.499          ops/ms
Halffloat256Vector.SUBMasked     1024  thrpt    2  25384.305          ops/ms
Halffloat256Vector.TAN           1024  thrpt    2     29.754          ops/ms
Halffloat256Vector.TANH          1024  thrpt    2     55.933          ops/ms
Halffloat256Vector.blend         1024  thrpt    2  22681.727          ops/ms
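
As a reading aid for the two tables above, the sketch below computes the throughput ratio (with-optimization score divided by baseline score) for a few rows. The scores are copied from the tables; the class and method names are hypothetical. With Cnt = 2 and no error column, the ratios are indicative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Fp16Speedup {
    // Throughput ratio: optimized score divided by baseline score (both in ops/ms).
    static double speedup(double baseline, double optimized) {
        return optimized / baseline;
    }

    public static void main(String[] args) {
        // {baseline, with optimization} scores for Halffloat256Vector, size 1024, from the tables above.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("ADD", new double[] {259.029, 19623.250});
        scores.put("DIV", new double[] {221.924, 2805.701});
        scores.put("FMA", new double[] {206.324, 23280.064});
        scores.put("SQRT", new double[] {130.811, 2491.596});
        scores.put("ACOS", new double[] {61.402, 56.116});
        scores.put("blend", new double[] {28002.356, 22681.727});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-6s %8.2fx%n", e.getKey(), speedup(s[0], s[1]));
        }
    }
}
```

Arithmetic operators such as ADD, FMA, DIV, and SQRT come out well above 1x, ACOS stays close to 1x, and blend comes out below 1x in this run.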

What is remaining?

Functional validation
Thorough performance validation
New IR framework-based tests.
Microbenchmark for FP16-based dotproduct.
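
To illustrate the kind of kernel an FP16 dot product microbenchmark would exercise, here is a scalar reference sketch. It is not the planned benchmark: the class name is hypothetical, inputs are FP16 bit patterns stored in short arrays, and accumulating in float (rather than in FP16) is an assumption made here for simplicity.

```java
public class Fp16DotReference {
    // Dot product of two arrays of FP16 bit patterns.
    // Each product is fused into a float accumulator via Math.fma.
    static float dot(short[] a, short[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("array lengths differ");
        }
        float acc = 0.0f;
        for (int i = 0; i < a.length; i++) {
            acc = Math.fma(Float.float16ToFloat(a[i]), Float.float16ToFloat(b[i]), acc);
        }
        return acc;
    }

    // Converts a float array to FP16 bit patterns (rounding each element to FP16).
    static short[] toFp16(float[] values) {
        short[] bits = new short[values.length];
        for (int i = 0; i < values.length; i++) {
            bits[i] = Float.floatToFloat16(values[i]);
        }
        return bits;
    }

    public static void main(String[] args) {
        short[] a = toFp16(new float[] {1.0f, 2.0f, 3.0f});
        short[] b = toFp16(new float[] {4.0f, 5.0f, 6.0f});
        System.out.println(dot(a, b));
    }
}
```

A scalar reference like this can serve as the expected-value check when validating a vectorized version.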

jatin-bhateja · 2025-11-07T10:12:50Z

Integrating this PR; the remaining work will be part of the JDK mainline PR pull/28002.

openjdk · 2025-11-07T10:14:09Z

@jatin-bhateja This pull request has not yet been marked as ready for integration.

8365967: C2 compiler support for HalffloatVector operations suppored …

7072408

…by auto-vectorization flow

jatin-bhateja marked this pull request as ready for review August 22, 2025 17:39

jatin-bhateja marked this pull request as draft August 22, 2025 17:40

jatin-bhateja changed the base branch from vectorIntrinsics to vectorIntrinsics+fp16 August 22, 2025 17:42

jatin-bhateja changed the base branch from vectorIntrinsics+fp16 to vectorIntrinsics August 22, 2025 17:43

jatin-bhateja changed the title ~~8365967: C2 compiler support for HalffloatVector operations suppored by auto-vectorization flow~~ 8365967: C2 compiler support for HalffloatVector operations supported by auto-vectorization flow Aug 26, 2025

jatin-bhateja mentioned this pull request Aug 28, 2025

Merge vectorIntrinsics #230

Closed

2 tasks

Merge branch 'vectorIntrinsics+fp16' of http://github.com/openjdk/pan…

9b792b3

…ama-vector into JDK-8365967

jatin-bhateja changed the base branch from vectorIntrinsics to vectorIntrinsics+fp16 August 29, 2025 12:05

jatin-bhateja marked this pull request as ready for review August 29, 2025 12:06

openjdk bot added the rfr Pull request is ready for review label Aug 29, 2025

jbhateja added 2 commits September 2, 2025 03:13

Merge branch 'vectorIntrinsics+fp16' of http://github.com/openjdk/pan…

cd4a08d

…ama-vector into JDK-8365967

Fix jtreg failures

dddc1c6

Merge branch 'vectorIntrinsics+fp16' of http://github.com/openjdk/pan…

0a258db

…ama-vector into JDK-8365967

Removing elemType from intrinsic interface, operType and carreirType …

7996288

…are sufficient for inline expansion, fixed selectFromTwoVector fallback

2 participants
