Skip to content

8353786: Migrate Vector API math library support to FFM API #24462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 29 commits into from

Conversation

iwanowww
Copy link
Contributor

@iwanowww iwanowww commented Apr 4, 2025

Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API.

Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side.

The patch consists of the following parts:

  • on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup;
  • 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations);
  • new internal interface to query supported CPU ISA extensions (jdk.incubator.vector.CPUFeatures) used for CPU dispatching.

java.lang.foreign API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code.

Once java.lang.foreign supports vectors & vector calling conventions, VM intrinsics can go away.

Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64).

Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64)

Thanks!


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8353786: Migrate Vector API math library support to FFM API (Enhancement - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462
$ git checkout pull/24462

Update a local copy of the PR:
$ git checkout pull/24462
$ git pull https://git.openjdk.org/jdk.git pull/24462/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24462

View PR using the GUI difftool:
$ git pr show -t 24462

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24462.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 4, 2025

👋 Welcome back vlivanov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 4, 2025

@iwanowww This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8353786: Migrate Vector API math library support to FFM API

Reviewed-by: jbhateja, kvn, psandoz, xgong, jvernee, mli

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 144 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk
Copy link

openjdk bot commented Apr 4, 2025

@iwanowww
The core-libs label was successfully added.

The hotspot-compiler label was successfully added.

@liach
Copy link
Member

liach commented Apr 4, 2025

Moving vector API library selection to Java code looks like a right step to me.

@iwanowww iwanowww force-pushed the vector.math.01.java branch from 03c5a8c to fc27aee Compare April 5, 2025 01:30
@iwanowww iwanowww changed the title XXXXXXX: ??? 8353786: Migrate Vector API math library support to FFM API Apr 5, 2025
@iwanowww
Copy link
Contributor Author

iwanowww commented Apr 5, 2025

/cc hotspot

@openjdk
Copy link

openjdk bot commented Apr 5, 2025

@iwanowww
The hotspot label was successfully added.

@iwanowww iwanowww marked this pull request as ready for review April 5, 2025 02:38
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 5, 2025
@mlbridge
Copy link

mlbridge bot commented Apr 5, 2025

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions?

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Apr 17, 2025
Copy link

@XiaohongGong XiaohongGong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for your updating!

Copy link
Member

@JornVernee JornVernee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting! Looks mostly good to me. Left a few inline notes.

Comment on lines +376 to +379
<V extends VectorPayload, E>
V libraryBinaryOp(long addr, Class<? extends V> vClass, Class<E> eClass, int length, String debugName,
V v1, V v2,
BinaryOperation<V,?> defaultImpl) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that the bound of V differs between libraryUnaryOp, which uses Vectory<E> and this method, which uses VectorPayload. Not sure if this is intentional?

Comment on lines 621 to 622
ThreadToNativeFromVM ttn(thread);
return env->NewStringUTF(features_string);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a way to do this without the extra transition?

How about:

  oop result = java_lang_String::create_oop_from_str((char*) bytes, CHECK_NULL);
  return (jstring) JNIHandles::make_local(THREAD, result);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.


@ForceInline
/*package-private*/ static
<E, V extends Vector<E>>
Copy link
Member

@JornVernee JornVernee Apr 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you're using Vector instead of VectorPayload for the binary op, so there seems to be a discrepancy with VectorSupport.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference, but I kept it aligned with unaryOp/binaryOp intrinsics.

try {
MemorySegment addr = LOOKUP.findOrThrow(symbol);
debug("%s %s => 0x%016x\n", op, symbol, addr.address());
T impl = implSupplier.apply(opc); // TODO: should call the very same native implementation eventually (once FFM API supports vectors)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, one current barrier I see to implementing the vector calling convention in the linker, is that the FFM linker (currently) transmits register values to the downcall stub use Java primitive types. So, in order to support vector calling conventions, we would need to add some kind of 'primitive' that can hold the entire vector value, and preferably gets passed in the right register.

However, I think in the case of these math libraries in particular, speed of the fallback implementation is not that much of an issue, since there is also an intrinsic. So alternatively, we could split a vector value up into smaller integral types (int, long) -> pass them to the downcall stub in that form -> and then reconstruct the full vector value in its target register. (I used the same trick when I was experimenting with FP80 support, which also requires splitting up the 80 bit value up into 2 longs).

Copy link
Contributor Author

@iwanowww iwanowww Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO an in-memory representation for vectors is preferred when it comes to FFM linker calling conventions. 512-bit vector requires 8 longs, so some of them will end up passed on stack for any non-trivial case. And with in-memory representation, VM can elide vector store/load once FFM linker stub is intrinsified.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 18, 2025
static String getDefaultName() {
return switch (StaticProperty.osArch()) {
case "amd64", "x86_64" -> SVML;
case "aarch64" -> SLEEF;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be supporting SLEEF on riscv64. Was there a specific motivation not to include it here?

Copy link
Contributor Author

@iwanowww iwanowww Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed.

Copy link

@Hamlin-Li Hamlin-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran some basic tests on riscv, seems there are some issues, also have some comments.

public static Set<String> features() {
return features;
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an extra line needed at the end of this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

size_t features_offset) {
assert(features_offset <= cpu_info_string_len, "");
if (features_offset < cpu_info_string_len) {
assert(cpu_info_string[features_offset + 0] == ',', "");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert fails on riscv.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple fix could be:

diff --git a/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp b/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp
index 484a2a645aa..a785dc65c9e 100644
--- a/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp
+++ b/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp
@@ -196,25 +196,12 @@ void VM_Version::setup_cpu_available_features() {
 
   _cpu_info_string = os::strdup(buf);
 
-  _features_string = extract_features_string(_cpu_info_string,
-                                             strnlen(_cpu_info_string, sizeof(buf)),
-                                             features_offset);
+  _features_string = _cpu_info_string;
 }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, it's fine for now to completely drop extract_features_string call on linux-riscv (as on some other platforms) and fix it separately. Then VectorSupport.getCPUFeatures() returns empty string. VectorMathLibrary doesn't rely on CPUFeatures on RISC-V.

Let me know how you prefer to handle it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need this or similar thing on riscv. Please check my new comments below about rvv extension on riscv.
On the other hand, it's also good to have it on riscv for consistency, and there is a log output of "cpu features" in VectorMathLibrary.java

private static Set<String> getCPUFeatures() {
String featuresString = VectorSupport.getCPUFeatures();
debug(featuresString);
String[] features = featuresString.toLowerCase(Locale.ROOT).split(", "); // ", " is used as a delimiter

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On riscv, it's splitted by " ", for the fix please refer to CPUInfo.java in test.

@@ -58,6 +58,8 @@ class Abstract_VM_Version: AllStatic {
static uint64_t _features;
static const char* _features_string;

static const char* _cpu_info_string;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure the reason to introduce _cpu_info_string.
Seems to me you could just use _features_string, and remove _cpu_info_string and its related code, e.g. extract_features_string. Please check the code in test/lib/jdk/test/whitebox/cpuinfo/CPUInfo.java

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mayber in CPUFeatures, could use the similar code as CPUInfo to split the cpu string into cpu features?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is to align _features_string with _features which enumerates well-known CPU capabilities JVM manages. As of now, _features_string contains more information, so I introduced _cpu_info_string to keep it.

Speaking of test/lib/jdk/test/whitebox/cpuinfo/CPUInfo.java, the approach chosen there may be fine for a test library, but we need a more stable API between JVM and JDK.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this.
But it might be better to change the spliting regex of _features_string in CPUFeatures.java to support riscv cpu features format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I pushed an update. Let me know what you think about it.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Apr 23, 2025
Copy link

@Hamlin-Li Hamlin-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found another possible issue on riscv.

V unaryMathOp(Unary op, int opc, VectorSpecies<E> vspecies,
IntFunction<VectorSupport.UnaryOperation<V,?>> implSupplier,
V v) {
var entry = lookup(op, opc, vspecies, implSupplier);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there is another issue for riscv here.
If the rvv extension is not supported on the running machine, it will still generate the code using rvv, this should lead to a crash at runtime?

Copy link

@Hamlin-Li Hamlin-Li Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In previous code, we use UseRVV to detect if rvv extension is supported.

On the other hand, user can choose to disable UseRVV if they want even if rvv extension is supported on the running machine. In this sense, there could be similar issue on other platforms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the following check catch UseRVV == false case on RISC-V?

public boolean isSupported(Operator op, VectorSpecies<?> vspecies) {
...
            int maxLaneCount = VectorSupport.getMaxLaneCount(vspecies.elementType());
            if (vspecies.length() > maxLaneCount) {
                return false; // lacking vector support
            }
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR both VectorSupport.getMaxLaneCount() and CPUFeatures don't rely on raw list of ISA extensions CPU supports, but only those reported by the JVM. So, if some feature support is disabled on JVM side, it won't be reported by VM_Version and, hence, CPUFeatures.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating! Looks good for riscv. I have ran some basic tests for vector API, passed. I did not ran benchmark, as riscv & aarch64 share the same way to bridge from java to sleef.

Does the following check catch UseRVV == false case on RISC-V?

Yes. If you don't mind, an explicit comment might be helpful. As to me "lacking vector support" here means the vector length is not large enough, but it's quite subjective, so you are on the call.

FTR both VectorSupport.getMaxLaneCount() and CPUFeatures don't rely on raw list of ISA extensions CPU supports, but only those reported by the JVM. So, if some feature support is disabled on JVM side, it won't be reported by VM_Version and, hence, CPUFeatures.

I'm fine with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I added some clarifications in the comments.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 24, 2025
char* buf = NEW_ARENA_ARRAY(C->comp_arena(), char, buflen);
debug_name = debug_name_oop->const_oop()->as_instance()->java_lang_String_str(buf, buflen);
}
Node* vcall = make_runtime_call(RC_VECTOR,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By generating an upfront CallLeafVectorNode, we may miss out on performing any GVN-style optimization for trigonometric identities like the following. do you think creating a macro node which can lazily be expanded to call node during macro expansion will help.

arcsin(sin(x)) => x
arccos(cos(x)) => x
sin(arcsin(x) => x
cos(arccos(x) => x

Copy link
Contributor Author

@iwanowww iwanowww Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does look attractive, but macro expansion-based solution requires JVM to internalize such operations and their properties.

IMO a higher-level solution based on more generic JVM primitives would enable libraries to properly annotate their operations in Java bytecodes/class files, so C2 can perform such type of transformations without the need to intrinsify each individual operation first. (Think of JDK-8218414 / JDK-8347901 on steroids.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is a typical graph transform which cannot be applied currently because we are generating CallLeafVectorNode upfront during parsing, If we prevent intrinsification then compiler will attempt inlining, generating a much complex graph shape which may not be reducible.

Copy link
Contributor Author

@iwanowww iwanowww Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any insurmountable problems performing such transformations on chains of CallLeafVector nodes (or any other call nodes). But the missing piece is information about the algebraic properties of native functions JVM can't derive on its own.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Apr 24, 2025
Copy link
Member

@jatin-bhateja jatin-bhateja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Best Regards

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 25, 2025
@iwanowww
Copy link
Contributor Author

Thanks for the reviews!

/integrate

@openjdk
Copy link

openjdk bot commented Apr 25, 2025

Going to push as commit e57fd71.
Since your change was applied there have been 146 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 25, 2025
@openjdk openjdk bot closed this Apr 25, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 25, 2025
@openjdk
Copy link

openjdk bot commented Apr 25, 2025

@iwanowww Pushed as commit e57fd71.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.