-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8353786: Migrate Vector API math library support to FFM API #24462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back vlivanov! A progress list of the required criteria for merging this PR into |
@iwanowww This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 144 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@iwanowww The |
Moving vector API library selection to Java code looks like a right step to me. |
03c5a8c
to
fc27aee
Compare
/cc hotspot |
@iwanowww |
Webrevs
|
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few questions?
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java
Show resolved
Hide resolved
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java
Show resolved
Hide resolved
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for your updating!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting! Looks mostly good to me. Left a few inline notes.
<V extends VectorPayload, E> | ||
V libraryBinaryOp(long addr, Class<? extends V> vClass, Class<E> eClass, int length, String debugName, | ||
V v1, V v2, | ||
BinaryOperation<V,?> defaultImpl) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice that the bound of V
differs between libraryUnaryOp
, which uses Vectory<E>
and this method, which uses VectorPayload
. Not sure if this is intentional?
ThreadToNativeFromVM ttn(thread); | ||
return env->NewStringUTF(features_string); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a way to do this without the extra transition?
How about:
oop result = java_lang_String::create_oop_from_str((char*) bytes, CHECK_NULL);
return (jstring) JNIHandles::make_local(THREAD, result);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough.
|
||
@ForceInline | ||
/*package-private*/ static | ||
<E, V extends Vector<E>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you're using Vector
instead of VectorPayload
for the binary op, so there seems to be a discrepancy with VectorSupport
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong preference, but I kept it aligned with unaryOp
/binaryOp
intrinsics.
try { | ||
MemorySegment addr = LOOKUP.findOrThrow(symbol); | ||
debug("%s %s => 0x%016x\n", op, symbol, addr.address()); | ||
T impl = implSupplier.apply(opc); // TODO: should call the very same native implementation eventually (once FFM API supports vectors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, one current barrier I see to implementing the vector calling convention in the linker, is that the FFM linker (currently) transmits register values to the downcall stub use Java primitive types. So, in order to support vector calling conventions, we would need to add some kind of 'primitive' that can hold the entire vector value, and preferably gets passed in the right register.
However, I think in the case of these math libraries in particular, speed of the fallback implementation is not that much of an issue, since there is also an intrinsic. So alternatively, we could split a vector value up into smaller integral types (int
, long
) -> pass them to the downcall stub in that form -> and then reconstruct the full vector value in its target register. (I used the same trick when I was experimenting with FP80 support, which also requires splitting up the 80 bit value up into 2 long
s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO an in-memory representation for vectors is preferred when it comes to FFM linker calling conventions. 512-bit vector requires 8 longs, so some of them will end up passed on stack for any non-trivial case. And with in-memory representation, VM can elide vector store/load once FFM linker stub is intrinsified.
static String getDefaultName() { | ||
return switch (StaticProperty.osArch()) { | ||
case "amd64", "x86_64" -> SVML; | ||
case "aarch64" -> SLEEF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be supporting SLEEF on riscv64
. Was there a specific motivation not to include it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just ran some basic tests on riscv, seems there are some issues, also have some comments.
public static Set<String> features() { | ||
return features; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an extra line needed at the end of this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
size_t features_offset) { | ||
assert(features_offset <= cpu_info_string_len, ""); | ||
if (features_offset < cpu_info_string_len) { | ||
assert(cpu_info_string[features_offset + 0] == ',', ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert fails on riscv.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simple fix could be:
diff --git a/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp b/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp
index 484a2a645aa..a785dc65c9e 100644
--- a/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp
+++ b/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp
@@ -196,25 +196,12 @@ void VM_Version::setup_cpu_available_features() {
_cpu_info_string = os::strdup(buf);
- _features_string = extract_features_string(_cpu_info_string,
- strnlen(_cpu_info_string, sizeof(buf)),
- features_offset);
+ _features_string = _cpu_info_string;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, it's fine for now to completely drop extract_features_string
call on linux-riscv (as on some other platforms) and fix it separately. Then VectorSupport.getCPUFeatures()
returns empty string. VectorMathLibrary
doesn't rely on CPUFeatures
on RISC-V.
Let me know how you prefer to handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need this or similar thing on riscv. Please check my new comments below about rvv extension on riscv.
On the other hand, it's also good to have it on riscv for consistency, and there is a log output of "cpu features" in VectorMathLibrary.java
private static Set<String> getCPUFeatures() { | ||
String featuresString = VectorSupport.getCPUFeatures(); | ||
debug(featuresString); | ||
String[] features = featuresString.toLowerCase(Locale.ROOT).split(", "); // ", " is used as a delimiter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On riscv, it's splitted by " ", for the fix please refer to CPUInfo.java in test.
@@ -58,6 +58,8 @@ class Abstract_VM_Version: AllStatic { | |||
static uint64_t _features; | |||
static const char* _features_string; | |||
|
|||
static const char* _cpu_info_string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure the reason to introduce _cpu_info_string
.
Seems to me you could just use _features_string, and remove _cpu_info_string and its related code, e.g. extract_features_string
. Please check the code in test/lib/jdk/test/whitebox/cpuinfo/CPUInfo.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mayber in CPUFeatures
, could use the similar code as CPUInfo
to split the cpu string into cpu features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention is to align _features_string
with _features
which enumerates well-known CPU capabilities JVM manages. As of now, _features_string
contains more information, so I introduced _cpu_info_string
to keep it.
Speaking of test/lib/jdk/test/whitebox/cpuinfo/CPUInfo.java
, the approach chosen there may be fine for a test library, but we need a more stable API between JVM and JDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this.
But it might be better to change the spliting regex of _features_string
in CPUFeatures.java to support riscv cpu features format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I pushed an update. Let me know what you think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found another possible issue on riscv.
V unaryMathOp(Unary op, int opc, VectorSpecies<E> vspecies, | ||
IntFunction<VectorSupport.UnaryOperation<V,?>> implSupplier, | ||
V v) { | ||
var entry = lookup(op, opc, vspecies, implSupplier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems there is another issue for riscv here.
If the rvv extension is not supported on the running machine, it will still generate the code using rvv, this should lead to a crash at runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In previous code, we use UseRVV
to detect if rvv extension is supported.
On the other hand, user can choose to disable UseRVV if they want even if rvv extension is supported on the running machine. In this sense, there could be similar issue on other platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the following check catch UseRVV == false
case on RISC-V?
public boolean isSupported(Operator op, VectorSpecies<?> vspecies) {
...
int maxLaneCount = VectorSupport.getMaxLaneCount(vspecies.elementType());
if (vspecies.length() > maxLaneCount) {
return false; // lacking vector support
}
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR both VectorSupport.getMaxLaneCount()
and CPUFeatures
don't rely on raw list of ISA extensions CPU supports, but only those reported by the JVM. So, if some feature support is disabled on JVM side, it won't be reported by VM_Version
and, hence, CPUFeatures
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for updating! Looks good for riscv. I have ran some basic tests for vector API, passed. I did not ran benchmark, as riscv & aarch64 share the same way to bridge from java to sleef.
Does the following check catch UseRVV == false case on RISC-V?
Yes. If you don't mind, an explicit comment might be helpful. As to me "lacking vector support" here means the vector length is not large enough, but it's quite subjective, so you are on the call.
FTR both VectorSupport.getMaxLaneCount() and CPUFeatures don't rely on raw list of ISA extensions CPU supports, but only those reported by the JVM. So, if some feature support is disabled on JVM side, it won't be reported by VM_Version and, hence, CPUFeatures.
I'm fine with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I added some clarifications in the comments.
char* buf = NEW_ARENA_ARRAY(C->comp_arena(), char, buflen); | ||
debug_name = debug_name_oop->const_oop()->as_instance()->java_lang_String_str(buf, buflen); | ||
} | ||
Node* vcall = make_runtime_call(RC_VECTOR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By generating an upfront CallLeafVectorNode, we may miss out on performing any GVN-style optimization for trigonometric identities like the following. do you think creating a macro node which can lazily be expanded to call node during macro expansion will help.
arcsin(sin(x)) => x
arccos(cos(x)) => x
sin(arcsin(x) => x
cos(arccos(x) => x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does look attractive, but macro expansion-based solution requires JVM to internalize such operations and their properties.
IMO a higher-level solution based on more generic JVM primitives would enable libraries to properly annotate their operations in Java bytecodes/class files, so C2 can perform such type of transformations without the need to intrinsify each individual operation first. (Think of JDK-8218414 / JDK-8347901 on steroids.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is a typical graph transform which cannot be applied currently because we are generating CallLeafVectorNode upfront during parsing, If we prevent intrinsification then compiler will attempt inlining, generating a much complex graph shape which may not be reducible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any insurmountable problems performing such transformations on chains of CallLeafVector
nodes (or any other call nodes). But the missing piece is information about the algebraic properties of native functions JVM can't derive on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Best Regards
Thanks for the reviews! /integrate |
Going to push as commit e57fd71.
Your commit was automatically rebased without conflicts. |
Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API.
Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side.
The patch consists of the following parts:
jdk.incubator.vector.CPUFeatures
) used for CPU dispatching.java.lang.foreign
API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code.Once
java.lang.foreign
supports vectors & vector calling conventions, VM intrinsics can go away.Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64).
Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64)
Thanks!
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462
$ git checkout pull/24462
Update a local copy of the PR:
$ git checkout pull/24462
$ git pull https://git.openjdk.org/jdk.git pull/24462/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 24462
View PR using the GUI difftool:
$ git pr show -t 24462
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24462.diff
Using Webrev
Link to Webrev Comment