
@jatin-bhateja
Member

@jatin-bhateja jatin-bhateja commented Aug 22, 2025

Hi All,

This patch extends the Vector API inline expanders to infer Float16 vector IR from the newly passed operType argument.
We intend to reuse the existing IR and backend implementation of auto-vectorized Float16 operations.
The HalffloatVector operators ADD, SUB, MUL, DIV, MAX, MIN, and FMA now emit FP16 instructions on x86 targets that support the AVX512-FP16 feature and on AArch64 SVE targets.
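
For readers unfamiliar with FP16 lane semantics, here is a minimal scalar sketch of what a lane-wise ADD or MUL over half-precision values computes. It is not code from this patch and does not use HalffloatVector. The class name `Fp16LaneWiseReference` and its methods are hypothetical, and it assumes JDK 20 or later for `Float.float16ToFloat` and `Float.floatToFloat16`. Each lane is stored as raw binary16 bits in a `short`.

```java
public class Fp16LaneWiseReference {

    // Lane-wise ADD: widen each FP16 lane to float, add, round back to FP16.
    static short[] add(short[] a, short[] b) {
        short[] r = new short[a.length];
        for (int i = 0; i < a.length; i++) {
            float sum = Float.float16ToFloat(a[i]) + Float.float16ToFloat(b[i]);
            r[i] = Float.floatToFloat16(sum);
        }
        return r;
    }

    // Lane-wise MUL, same widen / operate / narrow pattern.
    static short[] mul(short[] a, short[] b) {
        short[] r = new short[a.length];
        for (int i = 0; i < a.length; i++) {
            float product = Float.float16ToFloat(a[i]) * Float.float16ToFloat(b[i]);
            r[i] = Float.floatToFloat16(product);
        }
        return r;
    }

    public static void main(String[] args) {
        short[] a = { Float.floatToFloat16(1.0f), Float.floatToFloat16(0.5f) };
        short[] b = { Float.floatToFloat16(2.0f), Float.floatToFloat16(1.5f) };
        for (short lane : add(a, b)) {
            System.out.println(Float.float16ToFloat(lane));
        }
    }
}
```

A scalar loop of this shape is roughly what the non-intrinsified path has to do per lane, whereas native FP16 instructions operate on all lanes of the vector at once.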

Best Regards,
Jatin


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

  • JDK-8365967: C2 compiler support for HalffloatVector operations supported by auto-vectorization flow (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/panama-vector.git pull/231/head:pull/231
$ git checkout pull/231

Update a local copy of the PR:
$ git checkout pull/231
$ git pull https://git.openjdk.org/panama-vector.git pull/231/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 231

View PR using the GUI difftool:
$ git pr show -t 231

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/panama-vector/pull/231.diff

Using Webrev

Link to Webrev Comment

@jatin-bhateja jatin-bhateja marked this pull request as ready for review August 22, 2025 17:39
@jatin-bhateja jatin-bhateja marked this pull request as draft August 22, 2025 17:40
@jatin-bhateja jatin-bhateja changed the base branch from vectorIntrinsics to vectorIntrinsics+fp16 August 22, 2025 17:42
@jatin-bhateja jatin-bhateja changed the base branch from vectorIntrinsics+fp16 to vectorIntrinsics August 22, 2025 17:43
@bridgekeeper

bridgekeeper bot commented Aug 22, 2025

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into vectorIntrinsics will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 22, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@jatin-bhateja
Member Author

jatin-bhateja commented Aug 25, 2025

Performance of the FMA benchmark on Intel Xeon Emerald Rapids (INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.30GHz):
[image: FMA benchmark results]

@jatin-bhateja jatin-bhateja changed the title 8365967: C2 compiler support for HalffloatVector operations suppored by auto-vectorization flow 8365967: C2 compiler support for HalffloatVector operations supported by auto-vectorization flow Aug 26, 2025
@jatin-bhateja jatin-bhateja mentioned this pull request Aug 28, 2025
2 tasks
@openjdk

openjdk bot commented Aug 29, 2025

⚠️ @jatin-bhateja This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@jatin-bhateja jatin-bhateja changed the base branch from vectorIntrinsics to vectorIntrinsics+fp16 August 29, 2025 12:05
@jatin-bhateja jatin-bhateja marked this pull request as ready for review August 29, 2025 12:06
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 29, 2025
@mlbridge

mlbridge bot commented Aug 29, 2025

Webrevs

@jatin-bhateja
Member Author

What is remaining?

  • Functional validation
  • Performance validation
  • New IR framework-based tests.
  • Microbenchmark for FP16-based dotproduct.

@bridgekeeper

bridgekeeper bot commented Sep 30, 2025

@jatin-bhateja This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks pass without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@openjdk

openjdk bot commented Oct 1, 2025

@jatin-bhateja Unknown command keeplive - for a list of valid commands use /help.

@jatin-bhateja
Member Author

/keepalive

@openjdk

openjdk bot commented Oct 1, 2025

@jatin-bhateja The pull request is being re-evaluated and the inactivity timeout has been reset.

@jatin-bhateja
Member Author

jatin-bhateja commented Oct 2, 2025

Performance of JMH microbenchmarks
System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz

Baseline:
Benchmark                      (size)   Mode  Cnt      Score   Error   Units
Halffloat256Vector.ABS           1024  thrpt    2    366.995          ops/ms
Halffloat256Vector.ABSMasked     1024  thrpt    2    345.584          ops/ms
Halffloat256Vector.ACOS          1024  thrpt    2     61.402          ops/ms
Halffloat256Vector.ADD           1024  thrpt    2    259.029          ops/ms
Halffloat256Vector.ADDMasked     1024  thrpt    2    251.257          ops/ms
Halffloat256Vector.ASIN          1024  thrpt    2     61.191          ops/ms
Halffloat256Vector.ATAN          1024  thrpt    2     40.815          ops/ms
Halffloat256Vector.ATAN2         1024  thrpt    2     28.224          ops/ms
Halffloat256Vector.CBRT          1024  thrpt    2     43.547          ops/ms
Halffloat256Vector.COS           1024  thrpt    2     37.414          ops/ms
Halffloat256Vector.COSH          1024  thrpt    2     46.365          ops/ms
Halffloat256Vector.DIV           1024  thrpt    2    221.924          ops/ms
Halffloat256Vector.DIVMasked     1024  thrpt    2    240.560          ops/ms
Halffloat256Vector.EXP           1024  thrpt    2     52.344          ops/ms
Halffloat256Vector.EXPM1         1024  thrpt    2     48.346          ops/ms
Halffloat256Vector.FMA           1024  thrpt    2    206.324          ops/ms
Halffloat256Vector.FMAMasked     1024  thrpt    2    184.678          ops/ms
Halffloat256Vector.HYPOT         1024  thrpt    2     34.096          ops/ms
Halffloat256Vector.LOG           1024  thrpt    2     40.300          ops/ms
Halffloat256Vector.LOG10         1024  thrpt    2     38.886          ops/ms
Halffloat256Vector.LOG1P         1024  thrpt    2     36.438          ops/ms
Halffloat256Vector.MAX           1024  thrpt    2    266.337          ops/ms
Halffloat256Vector.MAXMasked     1024  thrpt    2    245.518          ops/ms
Halffloat256Vector.MIN           1024  thrpt    2    268.963          ops/ms
Halffloat256Vector.MINMasked     1024  thrpt    2    243.136          ops/ms
Halffloat256Vector.MUL           1024  thrpt    2    264.127          ops/ms
Halffloat256Vector.MULMasked     1024  thrpt    2    251.600          ops/ms
Halffloat256Vector.NEG           1024  thrpt    2    365.486          ops/ms
Halffloat256Vector.NEGMasked     1024  thrpt    2    357.070          ops/ms
Halffloat256Vector.POW           1024  thrpt    2     26.809          ops/ms
Halffloat256Vector.SIN           1024  thrpt    2     34.555          ops/ms
Halffloat256Vector.SINH          1024  thrpt    2     53.779          ops/ms
Halffloat256Vector.SQRT          1024  thrpt    2    130.811          ops/ms
Halffloat256Vector.SQRTMasked    1024  thrpt    2    192.628          ops/ms
Halffloat256Vector.SUB           1024  thrpt    2    262.521          ops/ms
Halffloat256Vector.SUBMasked     1024  thrpt    2    254.578          ops/ms
Halffloat256Vector.TAN           1024  thrpt    2     30.002          ops/ms
Halffloat256Vector.TANH          1024  thrpt    2     55.562          ops/ms
Halffloat256Vector.blend         1024  thrpt    2  28002.356          ops/ms

With optimization:
Benchmark                      (size)   Mode  Cnt      Score   Error   Units
Halffloat256Vector.ABS           1024  thrpt    2  24048.638          ops/ms
Halffloat256Vector.ABSMasked     1024  thrpt    2  45085.707          ops/ms
Halffloat256Vector.ACOS          1024  thrpt    2     56.116          ops/ms
Halffloat256Vector.ADD           1024  thrpt    2  19623.250          ops/ms
Halffloat256Vector.ADDMasked     1024  thrpt    2  27462.171          ops/ms
Halffloat256Vector.ASIN          1024  thrpt    2     62.081          ops/ms
Halffloat256Vector.ATAN          1024  thrpt    2     41.352          ops/ms
Halffloat256Vector.ATAN2         1024  thrpt    2     29.173          ops/ms
Halffloat256Vector.CBRT          1024  thrpt    2     39.926          ops/ms
Halffloat256Vector.COS           1024  thrpt    2     37.151          ops/ms
Halffloat256Vector.COSH          1024  thrpt    2     48.309          ops/ms
Halffloat256Vector.DIV           1024  thrpt    2   2805.701          ops/ms
Halffloat256Vector.DIVMasked     1024  thrpt    2   2795.544          ops/ms
Halffloat256Vector.EXP           1024  thrpt    2     55.055          ops/ms
Halffloat256Vector.EXPM1         1024  thrpt    2     50.483          ops/ms
Halffloat256Vector.FMA           1024  thrpt    2  23280.064          ops/ms
Halffloat256Vector.FMAMasked     1024  thrpt    2  21828.932          ops/ms
Halffloat256Vector.HYPOT         1024  thrpt    2     34.266          ops/ms
Halffloat256Vector.LOG           1024  thrpt    2     42.158          ops/ms
Halffloat256Vector.LOG10         1024  thrpt    2     41.335          ops/ms
Halffloat256Vector.LOG1P         1024  thrpt    2     36.291          ops/ms
Halffloat256Vector.MAX           1024  thrpt    2  14960.348          ops/ms
Halffloat256Vector.MAXMasked     1024  thrpt    2  12585.642          ops/ms
Halffloat256Vector.MIN           1024  thrpt    2  14662.769          ops/ms
Halffloat256Vector.MINMasked     1024  thrpt    2  12327.769          ops/ms
Halffloat256Vector.MUL           1024  thrpt    2  27156.965          ops/ms
Halffloat256Vector.MULMasked     1024  thrpt    2  21349.555          ops/ms
Halffloat256Vector.NEG           1024  thrpt    2  24093.711          ops/ms
Halffloat256Vector.NEGMasked     1024  thrpt    2  26889.264          ops/ms
Halffloat256Vector.POW           1024  thrpt    2     27.028          ops/ms
Halffloat256Vector.SIN           1024  thrpt    2     34.280          ops/ms
Halffloat256Vector.SINH          1024  thrpt    2     55.049          ops/ms
Halffloat256Vector.SQRT          1024  thrpt    2   2491.596          ops/ms
Halffloat256Vector.SQRTMasked    1024  thrpt    2   2493.591          ops/ms
Halffloat256Vector.SUB           1024  thrpt    2  29664.499          ops/ms
Halffloat256Vector.SUBMasked     1024  thrpt    2  25384.305          ops/ms
Halffloat256Vector.TAN           1024  thrpt    2     29.754          ops/ms
Halffloat256Vector.TANH          1024  thrpt    2     55.933          ops/ms
Halffloat256Vector.blend         1024  thrpt    2  22681.727          ops/ms
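
To make the two tables easier to compare, the following sketch computes the with-optimization to baseline throughput ratio for a few rows copied from above. The class name `Fp16Speedup` and the choice of rows are illustrative assumptions. Because each score comes from only two iterations with no error estimate, the ratios are indicative at best.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class Fp16Speedup {

    // Ratio of optimized throughput to baseline throughput (both in ops/ms).
    static double speedup(double baseline, double optimized) {
        return optimized / baseline;
    }

    public static void main(String[] args) {
        // {baseline, with optimization} scores in ops/ms, copied from the tables above.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("ADD",   new double[] {   259.029, 19623.250 });
        rows.put("DIV",   new double[] {   221.924,  2805.701 });
        rows.put("FMA",   new double[] {   206.324, 23280.064 });
        rows.put("SQRT",  new double[] {   130.811,  2491.596 });
        rows.put("blend", new double[] { 28002.356, 22681.727 });

        for (Map.Entry<String, double[]> e : rows.entrySet()) {
            double[] s = e.getValue();
            System.out.println(String.format(Locale.ROOT, "%-6s %8.2fx",
                    e.getKey(), speedup(s[0], s[1])));
        }
    }
}
```

A ratio above 1 means the optimized build is faster for that operation, and a ratio below 1 means it is slower.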

What is remaining?

  • Functional validation
  • Thorough performance validation
  • New IR framework-based tests.
  • Microbenchmark for FP16-based dotproduct.

…are sufficient for inline expansion, fixed selectFromTwoVector fallback
@jatin-bhateja
Member Author

jatin-bhateja commented Nov 7, 2025

Integrating this PR; the remaining work will be part of the JDK mainline PR pull/28002.

@openjdk

openjdk bot commented Nov 7, 2025

@jatin-bhateja This pull request has not yet been marked as ready for integration.
