JVM segfault when using openjdk@11 11.0.26 on Mac OS 14 #206097

Closed
4 tasks done
dasm-tmlt opened this issue Jan 31, 2025 · 4 comments
Labels
bug Reproducible Homebrew/homebrew-core bug

Comments

@dasm-tmlt

brew gist-logs <formula> link OR brew config AND brew doctor output

~ % brew config
HOMEBREW_VERSION: 4.4.19
ORIGIN: https://github.com/Homebrew/brew
HEAD: 23facb219df8615680636ef12103bb3313f44e2a
Last commit: 3 days ago
Branch: stable
Core tap JSON: 31 Jan 02:40 UTC
Core cask tap JSON: 31 Jan 02:40 UTC
HOMEBREW_PREFIX: /opt/homebrew
HOMEBREW_CASK_OPTS: []
HOMEBREW_MAKE_JOBS: 10
HOMEBREW_NO_ANALYTICS: set
Homebrew Ruby: 3.3.7 => /opt/homebrew/Library/Homebrew/vendor/portable-ruby/3.3.7/bin/ruby
CPU: deca-core 64-bit arm_firestorm_icestorm
Clang: 16.0.0 build 1600
Git: 2.39.5 => /Library/Developer/CommandLineTools/usr/bin/git
Curl: 8.7.1 => /usr/bin/curl
macOS: 14.7.3-arm64
CLT: 16.2.0.0.1.1733547573
Xcode: N/A
Rosetta 2: false
~ % brew doctor
Your system is ready to brew.

Verification

  • My brew doctor output says Your system is ready to brew. and am still able to reproduce my issue.
  • I ran brew update and am still able to reproduce my issue.
  • I have resolved all warnings from brew doctor and that did not fix my problem.
  • I searched for recent similar issues at https://github.com/Homebrew/homebrew-core/issues?q=is%3Aissue and found no duplicates.

What were you trying to do (and why)?

I work on a couple of libraries that make heavy use of pyspark. On the 23rd, our Mac OS 14 CI jobs started failing due to a segfault in the JVM. After some detective work, I've determined that the first failure happened on the first day that brew install openjdk@11 pulled in version 11.0.26. The CI jobs have been failing consistently since then, and I've been able to reproduce the crash on my dev laptop with a very simple pyspark script (below). Downgrading the JDK to the previous Homebrew version (which took some effort) fixes the problem, as does switching to the Microsoft build of OpenJDK. Our Mac OS 13 CI jobs are fine.

Repro script:

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

pdf = pd.DataFrame({"A": [1, 1, 3], "B": [2, 2, 4]})
spark_df = spark.createDataFrame(pdf)
# The crash is common, but seemingly not deterministic
for i in range(1000):
    spark_df.count()

One observation: OpenJDK lists their latest version 11 stable release number as 11.0.26+4, which matches the Microsoft build. Homebrew's build is 11.0.26+0.

What happened (include all command output)?

% python crash_test.py
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/01/30 18:42:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000106245150, pid=29597, tid=69123
#
# JRE version: OpenJDK Runtime Environment Homebrew (11.0.26) (build 11.0.26+0)
# Java VM: OpenJDK 64-Bit Server VM Homebrew (11.0.26+0, mixed mode, tiered, compressed oops, g1 gc, bsd-aarch64)
# Problematic frame:
# V  [libjvm.dylib+0x695150]  ObjectSynchronizer::inflate(Thread*, oopDesc*, ObjectSynchronizer::InflateCause)+0x18c
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/dasm/core/hs_err_pid29597.log
[thread 83203 also had an error]
[thread 68611 also had an error]
[thread 80899 also had an error]
[thread 84227 also had an error]
Compiled method (c1)    4883 2223       3       scala.util.hashing.MurmurHash3::productHash (70 bytes)
 total in heap  [0x000000010f95e110,0x000000010f95f408] = 4856
 relocation     [0x000000010f95e280,0x000000010f95e3d8] = 344
 main code      [0x000000010f95e400,0x000000010f95eec0] = 2752
 stub code      [0x000000010f95eec0,0x000000010f95f0a0] = 480
 oops           [0x000000010f95f0a0,0x000000010f95f0a8] = 8
 metadata       [0x000000010f95f0a8,0x000000010f95f148] = 160
 scopes data    [0x000000010f95f148,0x000000010f95f280] = 312
 scopes pcs     [0x000000010f95f280,0x000000010f95f3e0] = 352
 dependencies   [0x000000010f95f3e0,0x000000010f95f3e8] = 8
 nul chk table  [0x000000010f95f3e8,0x000000010f95f408] = 32
#
# If you would like to submit a bug report, please visit:
#   https://github.com/Homebrew/homebrew-core/issues
#
----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 49804)
Traceback (most recent call last):
  File "/Users/dasm/.pyenv/versions/3.9.19/lib/python3.9/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/Users/dasm/.pyenv/versions/3.9.19/lib/python3.9/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/Users/dasm/.pyenv/versions/3.9.19/lib/python3.9/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/Users/dasm/.pyenv/versions/3.9.19/lib/python3.9/socketserver.py", line 747, in __init__
    self.handle()
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/pyspark/accumulators.py", line 295, in handle
    poll(accum_updates)
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/pyspark/accumulators.py", line 267, in poll
    if self.rfile in r and func():
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/pyspark/accumulators.py", line 271, in accum_updates
    num_updates = read_int(self.rfile)
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/pyspark/serializers.py", line 596, in read_int
    raise EOFError
EOFError
----------------------------------------
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/py4j/clientserver.py", line 516, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/py4j/clientserver.py", line 539, in send_command
    raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
Traceback (most recent call last):
  File "/Users/dasm/core/crash_test.py", line 9, in <module>
    spark_df.count()
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/pyspark/sql/dataframe.py", line 1240, in count
    return int(self._jdf.count())
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/pyspark/errors/exceptions/captured.py", line 179, in deco
    return f(*a, **kw)
  File "/Users/dasm/core/.venv/lib/python3.9/site-packages/py4j/protocol.py", line 334, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling o45.count
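
When comparing crashes across CI runs, it can help to pull the signal and the problematic frame out of the JVM's fatal-error banner. The following is a minimal sketch, not part of the original report: `summarize_crash` and both regular expressions are hypothetical names, and the patterns are only assumed to fit banners shaped like the one above.

```python
import re

# Hypothetical patterns, written against the banner lines shown in this issue.
SIGNAL_RE = re.compile(
    r"^#\s+(?P<signal>SIG[A-Z0-9]+) \((?P<code>0x[0-9a-f]+)\) "
    r"at pc=(?P<pc>0x[0-9a-f]+), pid=(?P<pid>\d+), tid=(?P<tid>\d+)"
)
FRAME_RE = re.compile(
    r"^# (?P<kind>[A-Za-z])\s+\[(?P<library>[^+\]]+)\+(?P<offset>0x[0-9a-f]+)\]"
    r"\s+(?P<symbol>.+)$"
)


def summarize_crash(report_text):
    """Collect the signal, pid, tid, library, and symbol from a crash banner."""
    summary = {}
    for line in report_text.splitlines():
        signal = SIGNAL_RE.match(line)
        if signal:
            summary["signal"] = signal.group("signal")
            summary["pid"] = int(signal.group("pid"))
            summary["tid"] = int(signal.group("tid"))
        frame = FRAME_RE.match(line)
        if frame:
            summary["library"] = frame.group("library")
            summary["symbol"] = frame.group("symbol")
    return summary


# Lines copied from the output above.
SAMPLE = """\
#  SIGSEGV (0xb) at pc=0x0000000106245150, pid=29597, tid=69123
# Problematic frame:
# V  [libjvm.dylib+0x695150]  ObjectSynchronizer::inflate(Thread*, oopDesc*, ObjectSynchronizer::InflateCause)+0x18c
"""

print(summarize_crash(SAMPLE))
```

Two runs that report the same library and symbol are likely hitting the same fault, though that is a heuristic rather than proof.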

What did you expect to happen?

The repro script should finish without crashing.

Step-by-step reproduction instructions (by running brew commands)

1. Use a Mac running OS 14 (I'm currently on 14.7.3, but I've seen it on 14.6 as well). We're seeing the crash on Apple silicon Macs, both running natively and running under Rosetta.
2. `brew install openjdk@11`
After the install, `java -version` should give:

openjdk version "11.0.26" 2025-01-21
OpenJDK Runtime Environment Homebrew (build 11.0.26+0)
OpenJDK 64-Bit Server VM Homebrew (build 11.0.26+0, mixed mode)

3. Make sure Python is installed. I've seen the crash on both 3.12 and 3.9, so the Python version does not appear to be load-bearing.
4. `pip install pyspark[sql]==3.5.4`
5. `pip install pandas==2.2.3`
6. Copy the above repro script into a file named `crash_test.py`.
7. `python crash_test.py`
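
Because the crash is not deterministic, a single clean run of step 7 proves little. Below is a minimal sketch of a harness that repeats a command and counts failures. `count_failures` is a hypothetical helper, and it assumes that a JVM crash surfaces as a non-zero exit status of the Python process (as it does via the uncaught `Py4JError` above).

```python
import subprocess
import sys


def count_failures(command, attempts=5, timeout=600):
    """Run `command` `attempts` times; return how many runs failed or timed out."""
    failures = 0
    for _ in range(attempts):
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # A hung run is counted as a failure too.
            failures += 1
            continue
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, `count_failures([sys.executable, "crash_test.py"], attempts=10)` run from the directory holding the script from step 6 would report how many of ten runs failed. The attempt count and timeout are arbitrary choices here.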
@dasm-tmlt dasm-tmlt added the bug Reproducible Homebrew/homebrew-core bug label Jan 31, 2025
@cho-m
Member

cho-m commented Jan 31, 2025

We can try rebuilding the bottle with a different workaround for Xcode 16 in #206139.

If it keeps failing after that, we will probably need a deeper dive into the crash report (and maybe someone on the Java side to help debug further).


EDIT:

> OpenJDK lists their latest version 11 stable release number as 11.0.26+4, which matches the Microsoft build. Homebrew's build is 11.0.26+0.

This doesn't matter, as it is only a cosmetic difference. Homebrew just doesn't embed the build number into the binaries (via --with-version-build).

The source code is the same, i.e. we use the official jdk-11.0.26-ga tag, which is just 11.0.26+4.
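
To illustrate the point about build numbers, here is a minimal sketch that treats everything after the `+` as a build suffix and compares only the version proper. `split_build` and `same_release` are hypothetical helpers, and the pattern is only assumed to cover strings shaped like the ones quoted in this thread.

```python
import re

# Hypothetical pattern for strings such as "11.0.26+4" or "11.0.26".
VERSION_RE = re.compile(r"^(?P<version>\d+(?:\.\d+)*)(?:\+(?P<build>\d+))?$")


def split_build(version_string):
    """Split "11.0.26+4" into ("11.0.26", 4); the build is None when absent."""
    match = VERSION_RE.match(version_string.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    build = match.group("build")
    return match.group("version"), (int(build) if build is not None else None)


def same_release(a, b):
    """True when two version strings differ at most in their build number."""
    return split_build(a)[0] == split_build(b)[0]


print(same_release("11.0.26+0", "11.0.26+4"))
```

A matching version only says the two builds share a release number. It says nothing about how each binary was compiled.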

@TronPaul

TronPaul commented Jan 31, 2025

The update in #206139 (pulled via brew upgrade) seems to have fixed the SIGSEGV I was seeing yesterday. As an additional data point, I also saw a SIGSEGV after editing the formula to use the 11.0.25+ga version (changing the url and sha256 back to the 11.0.25 values) and then reinstalling from source.

@cho-m
Member

cho-m commented Feb 1, 2025

Based on the above data point and my own observation (I tried re-running the reproduction multiple times), I will close this as resolved by the new bottle.

Feel free to comment or open a new issue if it is still occurring after brew update && brew reinstall openjdk@11.

@cho-m cho-m closed this as completed Feb 1, 2025
@dasm-tmlt
Author

I'm seeing it fixed as well. Thanks for the speedy resolution!
