Use Vector API in the Java Extension #824

Open
wants to merge 11 commits into
base: master
from

Conversation

samyron
Contributor

@samyron samyron commented Jul 8, 2025

PLEASE DO NOT MERGE

Overview

This PR uses the jdk.incubator.vector module as mentioned in issue #739 to accelerate generating JSON with the same algorithm as the C extension.

As it stands, the PR attempts to build the json.ext.VectorizedEscapeScanner class with a target release of 16, the first version of Java to support the jdk.incubator.vector module. The remaining code is built for Java 1.8. The code attempts to load json.ext.VectorizedEscapeScanner only if the json.enableVectorizedEscapeScanner system property is set to true (or 1).
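
The gated loading described above could look roughly like the sketch below. It is not the code in this PR: only the class name json.ext.VectorizedEscapeScanner and the property json.enableVectorizedEscapeScanner come from the description, while `ScannerLoaderSketch`, the `Scanner` interface, `ScalarScanner`, and `load` are hypothetical names chosen for illustration. The class is resolved reflectively so that code compiled for Java 1.8 has no static reference to a class compiled for release 16.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch: every name here is invented for illustration,
// except the class name and property name passed in from main().
final class ScannerLoaderSketch {
    /** Assumed minimal shape of a scanner; the PR's real interface may differ. */
    interface Scanner {
        /** Returns the index of the first byte needing a JSON escape, or -1. */
        int firstEscape(byte[] buf, int from, int to);
    }

    /** Portable fallback with no dependency on jdk.incubator.vector. */
    static final class ScalarScanner implements Scanner {
        @Override
        public int firstEscape(byte[] buf, int from, int to) {
            for (int i = from; i < to; i++) {
                int c = buf[i] & 0xFF;
                if (c < 0x20 || c == '"' || c == '\\') return i;
            }
            return -1;
        }
    }

    /** The description says the property must be "true" (or "1"). */
    static boolean enabled(String value) {
        return "true".equals(value) || "1".equals(value);
    }

    static Scanner load(String propertyValue, String className) {
        if (!enabled(propertyValue)) return new ScalarScanner();
        try {
            // Looked up by name so older JVMs never try to link the class.
            Class<?> cls = Class.forName(className);
            Constructor<?> ctor = cls.getDeclaredConstructor();
            return (Scanner) ctor.newInstance();
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // Class missing, module not added, or wrong type: use the fallback.
            return new ScalarScanner();
        }
    }

    public static void main(String[] args) {
        Scanner s = load(System.getProperty("json.enableVectorizedEscapeScanner"),
                         "json.ext.VectorizedEscapeScanner");
        System.out.println(s.getClass().getSimpleName());
    }
}
```

Catching `LinkageError` as well as the reflective exceptions is a design choice in this sketch: it covers the case where the class file is present but the JVM was started without `--add-modules jdk.incubator.vector`.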

I'm not entirely sure how this is packaged / included with JRuby, so I'd love @byroot's and @headius's (and others'?) thoughts on how to potentially package and/or structure the JARs. I did consider adding json.ext.VectorizedEscapeScanner to a separate generator-vectorized.jar, but I thought I'd solicit feedback before spending any more time on the build / package process.

Benchmarks

Machine: M1 MacBook Air

Note: I've had trouble modifying the compare.rb I was using for the C extension to work reliably with the Java extension. I'll probably spend more time trying to get it to work, but as of right now these are pretty raw benchmarks.

Below are two sample runs of the real-world benchmarks. The benchmarks are much more variable than with the C extension for some reason. I'm not sure if HotSpot is doing something slightly different per execution.

Vector API Enabled

scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=true' ruby -I"lib" benchmark/encoder-realworld.rb
WARNING: Using incubator modules: jdk.incubator.vector
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.384k i/100ms
Calculating -------------------------------------
                json     15.289k (± 0.8%) i/s   (65.41 μs/i) -    153.624k in  10.048481s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json    76.000 i/100ms
Calculating -------------------------------------
                json    753.787 (± 3.6%) i/s    (1.33 ms/i) -      7.524k in   9.997059s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   173.000 i/100ms
Calculating -------------------------------------
                json      1.751k (± 1.1%) i/s  (571.24 μs/i) -     17.646k in  10.081260s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.390k i/100ms
Calculating -------------------------------------
                json     23.829k (± 0.8%) i/s   (41.97 μs/i) -    239.000k in  10.030503s

Vector API Disabled

scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=false' ruby -I"lib" benchmark/encoder-realworld.rb
WARNING: Using incubator modules: jdk.incubator.vector
VectorizedEscapeScanner disabled.
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.204k i/100ms
Calculating -------------------------------------
                json     12.937k (± 1.1%) i/s   (77.30 μs/i) -    130.032k in  10.052234s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json    80.000 i/100ms
Calculating -------------------------------------
                json    817.378 (± 1.0%) i/s    (1.22 ms/i) -      8.240k in  10.082058s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   147.000 i/100ms
Calculating -------------------------------------
                json      1.499k (± 1.3%) i/s  (667.08 μs/i) -     14.994k in  10.004181s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.269k i/100ms
Calculating -------------------------------------
                json     22.366k (± 5.7%) i/s   (44.71 μs/i) -    224.631k in  10.097069s

master as of commit c5af1b68c582335c2a82bbc4bfa5b3e41ead1eba

scott@Scotts-MacBook-Air json % ONLY=json ruby -I"lib" benchmark/encoder-realworld.rb
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   886.000 i/100ms
Calculating -------------------------------------
                json^C%                                                                                                                   
scott@Scotts-MacBook-Air json % ONLY=json ruby -I"lib" benchmark/encoder-realworld.rb
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.031k i/100ms
Calculating -------------------------------------
                json     10.812k (± 1.3%) i/s   (92.49 μs/i) -    108.255k in  10.014260s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json    82.000 i/100ms
Calculating -------------------------------------
                json    824.921 (± 1.0%) i/s    (1.21 ms/i) -      8.282k in  10.040787s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   141.000 i/100ms
Calculating -------------------------------------
                json      1.421k (± 0.7%) i/s  (703.85 μs/i) -     14.241k in  10.023979s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.274k i/100ms
Calculating -------------------------------------
                json     22.612k (± 0.9%) i/s   (44.22 μs/i) -    227.400k in  10.057516s

Observations

activitypub.json and twitter.json seem to be consistently faster with the Vector API enabled. citm_catalog.json seems consistently a bit slower and ohai.json is fairly close to even.
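
To put rough numbers on these observations, the small sketch below divides the iterations-per-second figures from the runs above. The values are copied from the "json" result lines. The class and method names are made up for this illustration. Each figure comes from a single run, so the ratios are only indicative given the variability noted earlier.

```java
// Illustrative arithmetic only; names are hypothetical, data is from the runs above.
final class ThroughputRatios {
    static final String[] FILES = {
        "activitypub.json", "citm_catalog.json", "twitter.json", "ohai.json"
    };
    // Iterations per second for each file, in the order listed in FILES.
    static final double[] ENABLED  = {15289.0, 753.787, 1751.0, 23829.0};
    static final double[] DISABLED = {12937.0, 817.378, 1499.0, 22366.0};
    static final double[] MASTER   = {10812.0, 824.921, 1421.0, 22612.0};

    /** Relative throughput of candidate over baseline for file i (above 1.0 means faster). */
    static double ratio(double[] candidate, double[] baseline, int i) {
        return candidate[i] / baseline[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < FILES.length; i++) {
            System.out.printf("%-18s vs disabled: %.2fx  vs master: %.2fx%n",
                    FILES[i], ratio(ENABLED, DISABLED, i), ratio(ENABLED, MASTER, i));
        }
    }
}
```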

@samyron force-pushed the sm/java-vector-simd branch from 194ba01 to 15c7187 on July 15, 2025 03:12
@samyron
Contributor Author

samyron commented Jul 15, 2025

Using hsdis to examine the generated assembly, I can verify that on my MacBook Air the HotSpot C2 compiler does indeed emit NEON instructions.

ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=true -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintIntrinsics -XX:CompileCommand=print,*VectorizedEscapeScanner.*' ruby -I"lib" benchmark/encoder-realworld.rb > output.txt 2>output.txt
Compiled method (c2)   22086 5801       4       json.ext.VectorizedEscapeScanner::scan (391 bytes)
<snip>

[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]

--------------------------------------------------------------------------------

[Entry Point]
  # {method} {0x0000000133c3a0d8} 'scan' '(Ljson/ext/EscapeScanner$State;)Z' in 'json/ext/VectorizedEscapeScanner'
  # this:     c_rarg1:c_rarg1 
                        = 'json/ext/VectorizedEscapeScanner'
  # parm0:    c_rarg2:c_rarg2 
                        = 'json/ext/EscapeScanner$State'
  #           [sp+0x30]  (sp of caller)
  0x000000011b28d0c0:   ldr		w8, [x1, #8]
  0x000000011b28d0c4:   cmp		w9, w8
  0x000000011b28d0c8:   b.eq		#0x11b28d0d0
  0x000000011b28d0cc:   b		#0x11aa5fe80        ;   {runtime_call ic_miss_stub}
[Verified Entry Point]
  0x000000011b28d0d0:   nop		
  0x000000011b28d0d4:   sub		x9, sp, #0x14, lsl #12
  0x000000011b28d0d8:   str		xzr, [x9]
  0x000000011b28d0dc:   sub		sp, sp, #0x30
 <snip>
  0x000000011b28d194:   add		x12, x5, w14, sxtw
  0x000000011b28d198:   ldr		q20, [x12, #0x10]
  0x000000011b28d19c:   eor		v21.16b, v20.16b, v17.16b
  0x000000011b28d1a0:   cmgt		v22.16b, v19.16b, v20.16b
  0x000000011b28d1a4:   cmgt		v21.16b, v18.16b, v21.16b
  0x000000011b28d1a8:   cmeq		v20.16b, v20.16b, v16.16b
  0x000000011b28d1ac:   bic		v21.16b, v21.16b, v22.16b
  0x000000011b28d1b0:   orr		v20.16b, v20.16b, v21.16b
  0x000000011b28d1b4:   str		w1, [x2, #0x30]
  0x000000011b28d1b8:   addv		b21, v20.16b
  0x000000011b28d1bc:   umov		w8, v21.b[0]
  0x000000011b28d1c0:   cmp		w8, wzr
  0x000000011b28d1c4:   b.ne		#0x11b28d40c
  0x000000011b28d1c8:   add		w14, w7, #0x10
  0x000000011b28d1cc:   ldr		x12, [x28, #0x450]
  0x000000011b28d1d0:   str		w14, [x2, #0x14]    ; ImmutableOopMap {c_rarg2=Oop c_rarg5=Oop }
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) json.ext.VectorizedEscapeScanner::scan@308 (line 59)
<snip>
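
For readers less familiar with NEON, the eor / cmgt / cmeq / bic / orr sequence in the listing can be read as a per-byte predicate. The sketch below writes that predicate in scalar Java. The constants are an assumption: the excerpt does not show what v16 to v19 hold, so this sketch assumes they are a backslash, 2, 33, and 0, which matches the "c ^ 2 < 33" trick for detecting control characters and double quotes in one comparison. The class and method names are hypothetical.

```java
// Scalar reading of the vector sequence, under the assumed constants
// v16 = '\\', v17 = 2, v18 = 33, v19 = 0. Names are hypothetical.
final class EscapePredicateSketch {
    /** Reference rule: control characters, double quote, and backslash. */
    static boolean naive(byte b) {
        int c = b & 0xFF;
        return c < 0x20 || c == '"' || c == '\\';
    }

    /** The same rule using only the operations visible in the listing. */
    static boolean likeTheListing(byte b) {
        byte flipped = (byte) (b ^ 2);      // eor: maps 0x22 ('"') to 0x20
        boolean low = flipped < 33;         // cmgt: signed 33 > (b ^ 2)
        boolean negative = b < 0;           // cmgt: signed 0 > b, i.e. bytes >= 0x80
        boolean backslash = b == '\\';      // cmeq
        return backslash || (low && !negative); // bic, then orr
    }

    public static void main(String[] args) {
        int mismatches = 0;
        for (int v = 0; v < 256; v++) {
            byte b = (byte) v;
            if (naive(b) != likeTheListing(b)) mismatches++;
        }
        System.out.println("mismatches: " + mismatches);
    }
}
```

The extra `negative` test exists because Java bytes are signed: without it, bytes at or above 0x80 (the lead and continuation bytes of multi-byte UTF-8 sequences) would compare as less than 33 and be flagged incorrectly.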

@headius
Contributor

headius commented Jul 16, 2025

@samyron OMG I look away for a few days and you just go and do it! Bravo!

I'll have a look at these changes soon and see if I can offer any suggestions. This API is still a bit of a moving target, but I think we can work around that with a little Ruby magic here and there.

I will also point the Vector API folks at this PR so they can see what we're doing and provide additional input.

Amazing work!

@headius
Contributor

headius commented Jul 16, 2025

I've posted a thread to the panama-dev list here: https://mail.openjdk.org/pipermail/panama-dev/2025-July/021080.html

@samyron
Contributor Author

samyron commented Jul 28, 2025

I decided to try a different approach after looking at the HotSpot C2 output. Unlike in the C extension, where we mostly control method inlining, HotSpot isn't so easily influenced.

I merged VectorizedStringEncoder which wraps the escape logic in the vectorized scanning. This reduces method calls back to the search code.
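
One way to picture this restructuring is a single loop that owns both the search and the escape output, instead of an encoder that repeatedly calls into a separate scanner and returns for every match. The sketch below is an interpretation, not the code in this PR: it is scalar, where the `continue` check stands in for the 16-byte vector comparison, and `FusedEncoderSketch`, `encode`, and `writeEscape` are hypothetical names. It applies plain JSON string escaping and ignores any generator options.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, scalar illustration of fusing the search and escape steps.
final class FusedEncoderSketch {
    private static final byte[] HEX = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    /** Returns the input wrapped in quotes with JSON escapes applied. */
    static byte[] encode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 16);
        out.write('"');
        int runStart = 0;
        for (int i = 0; i < in.length; i++) {
            int c = in[i] & 0xFF;
            // Search step, inlined. A vectorized version would test a whole chunk here.
            if (c >= 0x20 && c != '"' && c != '\\') continue;
            out.write(in, runStart, i - runStart); // flush the clean run before the match
            writeEscape(out, c);
            runStart = i + 1;
        }
        out.write(in, runStart, in.length - runStart); // flush the trailing clean run
        out.write('"');
        return out.toByteArray();
    }

    private static void writeEscape(ByteArrayOutputStream out, int c) {
        out.write('\\');
        switch (c) {
            case '"':  out.write('"');  break;
            case '\\': out.write('\\'); break;
            case '\n': out.write('n');  break;
            case '\r': out.write('r');  break;
            case '\t': out.write('t');  break;
            case '\b': out.write('b');  break;
            case '\f': out.write('f');  break;
            default: // remaining control characters become \u00XX
                out.write('u');
                out.write('0');
                out.write('0');
                out.write(HEX[c >> 4]);
                out.write(HEX[c & 0xF]);
        }
    }

    public static void main(String[] args) {
        byte[] input = "say \"hi\"\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(encode(input), StandardCharsets.UTF_8));
    }
}
```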

Performance of VectorizedStringEncoder

scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=false -Djson.enableVectorizedStringEncoder=true' ruby -I"lib" benchmark/encoder-realworld.rb
WARNING: Using incubator modules: jdk.incubator.vector
VectorizedEscapeScanner disabled.
json.ext.VectorizedStringEncoder loaded successfully.
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.537k i/100ms
Calculating -------------------------------------
                json     15.382k (± 0.6%) i/s   (65.01 μs/i) -    155.237k in  10.092376s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json    81.000 i/100ms
Calculating -------------------------------------
                json    818.347 (± 0.7%) i/s    (1.22 ms/i) -      8.181k in   9.997474s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   176.000 i/100ms
Calculating -------------------------------------
                json      1.766k (± 1.9%) i/s  (566.28 μs/i) -     17.776k in  10.070684s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.426k i/100ms
Calculating -------------------------------------
                json     23.958k (± 0.6%) i/s   (41.74 μs/i) -    240.174k in  10.025043s

Additionally, here is a screenshot of VisualVM showing the result of running the activitypub.json benchmark for 30 seconds.

[Image: VisualVM screenshot of the activitypub.json benchmark run]

@headius
Contributor

headius commented Jul 28, 2025

@samyron This is interesting progress! I am looking forward to trying it myself now that I'm back in the office.

Yes, HotSpot can be a tricky beast to manipulate. We will want to look at some deeper logging of the JIT and inlining decisions to see whether everything that should be inlined is getting inlined. There are potentially other parts of json unrelated to your changes that are also interfering with inlining (such as the double-dispatching logic to find an appropriate formatter for output text).

Have you tried running on a newer JDK? There are continuous improvements in this area.

@headius
Contributor

headius commented Jul 28, 2025

It's also possible that we are losing too much performance to excessive allocation. I'll try to do some profiling once I get your code up and running.

@headius
Contributor

headius commented Jul 28, 2025

Oh, BTW, I got one response to my email about your work, pointing me to a Java library that already uses the Vector API to speed up JSON processing. It may provide some interesting pointers: https://github.com/simdjson/simdjson-java
