Description
I'm trying to build JAX v0.6.2 with ROCm 6.4.1.
Build command:
python build/build.py build --wheels=jaxlib,rocm-plugin,rocm-pjrt \
--clang_path="${ROCM_PATH}/lib/llvm/bin/clang" \
--target_cpu_features=release \
--rocm_path="${ROCM_PATH}"
This results in the following error:
[...]
INFO: Analyzed target //jaxlib/tools:build_wheel (296 packages loaded, 40933 targets configured).
ERROR: /home/user/.cache/bazel/_bazel_user/8eb645d2a1da7caf1a5eb11d9777bbb1/external/gloo/BUILD.bazel:40:11: Compiling gloo/types.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing CppCompile command (from target @@gloo//:gloo)
(cd /home/user/.cache/bazel/_bazel_user/8eb645d2a1da7caf1a5eb11d9777bbb1/execroot/__main__ && \
exec env - \
CLANG_COMPILER_PATH=/opt/rocm/lib/llvm/bin/clang \
PATH=/home/user/.local/bin:/home/user/bin:/home/user/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/home/user/.local/share/flatpak/exports/bin:/var/lib/flatpak/exports/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/opt/rocm/bin:/usr/lib/rustup/bin \
PWD=/proc/self/cwd \
ROCM_PATH=/opt/rocm \
TF_HIPCC_CLANG=1 \
TF_ROCM_AMDGPU_TARGETS=gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1030,gfx1100,gfx1101,gfx1200,gfx1201 \
TF_ROCM_CLANG=1 \
external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++17' -MD -MF bazel-out/k8-opt/bin/external/gloo/_objs/gloo/types.pic.d '-frandom-seed=bazel-out/k8-opt/bin/external/gloo/_objs/gloo/types.pic.o' -fPIC -iquote external/gloo -iquote bazel-out/k8-opt/bin/external/gloo -isystem external/gloo -isystem bazel-out/k8-opt/bin/external/gloo '-fvisibility=hidden' -Wno-sign-compare -Wno-unknown-warning-option -Wno-stringop-truncation -Wno-array-parameter '-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir.' '-DNB_DOMAIN=jax' -Wno-gnu-offsetof-extensions -Qunused-arguments '-Werror=mismatched-tags' '-Wno-error=c23-extensions' -mavx -Wno-gnu-offsetof-extensions -Qunused-arguments -Wl,--enable-new-dtags '--rocm-path=/opt/rocm' -frtlib-add-rpath -Wno-gnu-offsetof-extensions -Qunused-arguments '-Werror=mismatched-tags' '-Wno-error=c23-extensions' -mavx -Wno-gnu-offsetof-extensions -Qunused-arguments -Wl,--enable-new-dtags '--rocm-path=/opt/rocm' -frtlib-add-rpath '-std=c++17' -fexceptions -Wno-unused-variable -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-DTENSORFLOW_USE_ROCM=1' -D__HIP_PLATFORM_AMD__ -DEIGEN_USE_HIP -DUSE_ROCM -no-canonical-prefixes -c external/gloo/gloo/types.cc -o bazel-out/k8-opt/bin/external/gloo/_objs/gloo/types.pic.o)
# Configuration: 33fc69229a7839c643a01674a3dee801a1c20e2d9ae5ad4742563610c2e95572
# Execution platform: @@local_execution_config_platform//:platform
In file included from external/gloo/gloo/types.cc:9:
external/gloo/gloo/types.h:66:11: error: unknown type name 'uint8_t'
66 | constexpr uint8_t kGatherSlotPrefix = 0x01;
| ^
external/gloo/gloo/types.h:67:11: error: unknown type name 'uint8_t'
67 | constexpr uint8_t kAllgatherSlotPrefix = 0x02;
| ^
external/gloo/gloo/types.h:68:11: error: unknown type name 'uint8_t'
68 | constexpr uint8_t kReduceSlotPrefix = 0x03;
| ^
external/gloo/gloo/types.h:69:11: error: unknown type name 'uint8_t'
69 | constexpr uint8_t kAllreduceSlotPrefix = 0x04;
| ^
external/gloo/gloo/types.h:70:11: error: unknown type name 'uint8_t'
70 | constexpr uint8_t kScatterSlotPrefix = 0x05;
| ^
external/gloo/gloo/types.h:71:11: error: unknown type name 'uint8_t'
71 | constexpr uint8_t kBroadcastSlotPrefix = 0x06;
| ^
external/gloo/gloo/types.h:72:11: error: unknown type name 'uint8_t'
72 | constexpr uint8_t kBarrierSlotPrefix = 0x07;
| ^
external/gloo/gloo/types.h:73:11: error: unknown type name 'uint8_t'
73 | constexpr uint8_t kAlltoallSlotPrefix = 0x08;
| ^
external/gloo/gloo/types.h:77:21: error: unknown type name 'uint8_t'
77 | static Slot build(uint8_t prefix, uint32_t tag);
| ^
external/gloo/gloo/types.h:77:37: error: unknown type name 'uint32_t'
77 | static Slot build(uint8_t prefix, uint32_t tag);
| ^
external/gloo/gloo/types.h:79:12: error: unknown type name 'uint64_t'
79 | operator uint64_t() const {
| ^
external/gloo/gloo/types.h:86:17: error: unknown type name 'uint64_t'
86 | explicit Slot(uint64_t base, uint64_t delta) : base_(base), delta_(delta) {}
| ^
external/gloo/gloo/types.h:86:32: error: unknown type name 'uint64_t'
86 | explicit Slot(uint64_t base, uint64_t delta) : base_(base), delta_(delta) {}
| ^
external/gloo/gloo/types.h:88:9: error: unknown type name 'uint64_t'
88 | const uint64_t base_;
| ^
external/gloo/gloo/types.h:89:9: error: unknown type name 'uint64_t'
89 | const uint64_t delta_;
| ^
external/gloo/gloo/types.h:97:3: error: unknown type name 'uint16_t'
97 | uint16_t x;
| ^
external/gloo/gloo/types.cc:16:18: error: unknown type name 'uint8_t'
16 | Slot Slot::build(uint8_t prefix, uint32_t tag) {
| ^
external/gloo/gloo/types.cc:16:34: error: unknown type name 'uint32_t'
16 | Slot Slot::build(uint8_t prefix, uint32_t tag) {
| ^
external/gloo/gloo/types.cc:17:3: error: unknown type name 'uint64_t'
17 | uint64_t u64prefix = ((uint64_t)prefix) << 56;
| ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Target //jaxlib/tools:build_wheel failed to build
INFO: Elapsed time: 23.931s, Critical Path: 10.31s
INFO: 163 processes: 78 internal, 85 local.
ERROR: Build did NOT complete successfully
ERROR: Build failed. Not running target
Traceback (most recent call last):
File "/home/user/src/PKBUILDS/python-jax-rocm-aur/src/jax-rocm-jax-v0.6.0/build/build.py", line 778, in <module>
asyncio.run(main())
~~~~~~~~~~~^^^^^^^^
File "/usr/lib/python3.13/asyncio/runners.py", line 195, in run
return runner.run(main)
~~~~~~~~~~^^^^^^
File "/usr/lib/python3.13/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
File "/usr/lib/python3.13/asyncio/base_events.py", line 725, in run_until_complete
return future.result()
~~~~~~~~~~~~~^^
File "/home/user/src/PKBUILDS/python-jax-rocm-aur/src/jax-rocm-jax-v0.6.0/build/build.py", line 723, in main
raise RuntimeError(f"Command failed with return code {result.return_code}")
RuntimeError: Command failed with return code 1
==> ERROR: A failure occurred in build().
Aborting...
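The cause seems to be a missing include in gloo: gloo/types.h uses the fixed-width integer types (uint8_t, uint16_t, uint32_t, uint64_t) without including <cstdint>, so the build only works when another header happens to pull it in. I opened a PR upstream: pytorch/gloo#452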
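Until the upstream fix is available, a local workaround could be to patch the header text before compiling. The following is only a rough sketch of that idea: add_cstdint_include is a hypothetical helper (not part of gloo, JAX, or Bazel), and it assumes the only change needed is an added #include <cstdint>.

```python
import re

# Hypothetical helper, not part of gloo or JAX: add `#include <cstdint>` to a
# C++ header that uses fixed-width integer types without including it.
FIXED_WIDTH = re.compile(r"\bu?int(8|16|32|64)_t\b")
HAS_CSTDINT = re.compile(
    r"^[ \t]*#[ \t]*include[ \t]*<(cstdint|stdint\.h)>", re.MULTILINE
)
FIRST_INCLUDE = re.compile(r"^[ \t]*#[ \t]*include\b", re.MULTILINE)


def add_cstdint_include(source: str) -> str:
    """Return `source` with `#include <cstdint>` added if it looks necessary."""
    if not FIXED_WIDTH.search(source) or HAS_CSTDINT.search(source):
        return source  # no fixed-width types, or the include is already there
    match = FIRST_INCLUDE.search(source)
    # Insert before the first existing include, or at the very top if none.
    pos = match.start() if match else 0
    return source[:pos] + "#include <cstdint>\n" + source[pos:]
```

The function works on the header text only and is idempotent, so running it twice does not add a second include. How to hook it into the Bazel build, where gloo is fetched as an external repository, is left open here.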
System info (python version, jaxlib version, accelerator, etc.)
ERROR:2025-06-27 10:30:10,358:jax._src.xla_bridge:647: Jax plugin configuration error: Exception when calling jax_plugins.xla_rocm60.initialize()
Traceback (most recent call last):
File "/home/sepp/src/PKBUILDS/python-jax-rocm-aur/venv-jax/lib/python3.12/site-packages/jax/_src/xla_bridge.py", line 645, in discover_pjrt_plugins
plugin_module.initialize()
File "/home/sepp/src/PKBUILDS/python-jax-rocm-aur/venv-jax/lib/python3.12/site-packages/jax_plugins/xla_rocm60/__init__.py", line 137, in initialize
c_api = xb.register_plugin(
^^^^^^^^^^^^^^^^^^^
File "/home/sepp/src/PKBUILDS/python-jax-rocm-aur/venv-jax/lib/python3.12/site-packages/jax/_src/xla_bridge.py", line 744, in register_plugin
c_api = xla_client.load_pjrt_plugin_dynamically(plugin_name, library_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sepp/src/PKBUILDS/python-jax-rocm-aur/venv-jax/lib/python3.12/site-packages/jaxlib/xla_client.py", line 165, in load_pjrt_plugin_dynamically
return _xla.load_pjrt_plugin(plugin_name, library_path, c_api=None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to open /home/sepp/src/PKBUILDS/python-jax-rocm-aur/venv-jax/lib/python3.12/site-packages/jax_plugins/xla_rocm60/xla_rocm_plugin.so: libamd_comgr.so.2: cannot open shared object file: No such file or directory
jax: 0.5.0
jaxlib: 0.5.0
numpy: 2.3.1
python: 3.12.11 (main, Jun 24 2025, 14:35:57) [GCC 15.1.1 20250425]
device info: cpu-1, 1 local devices"
process_count: 1
platform: uname_result(system='Linux', node='dosa', release='6.15.3-arch1-1', version='#1 SMP PREEMPT_DYNAMIC Thu, 19 Jun 2025 14:41:19 +0000', machine='x86_64')
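This points to another issue with the JAX packages from PyPI (installed via pip install jax[rocm]): the prebuilt ROCm plugin tries to load libamd_comgr.so.2, but ROCm 6.4.1 ships libamd_comgr.so.3, so the plugin fails to load and only the CPU device is available.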
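To check which libamd_comgr versions a given ROCm install actually provides, something like the following could be used. It is a minimal sketch: comgr_major_versions is a hypothetical helper, and it assumes the libraries live directly in the lib directory under ROCM_PATH (falling back to /opt/rocm, as in the build log).

```python
import os
import re
from pathlib import Path

# Hypothetical diagnostic helper, not part of JAX or ROCm.
# Matches only the major-version soname, e.g. libamd_comgr.so.3
SONAME = re.compile(r"^libamd_comgr\.so\.(\d+)$")


def comgr_major_versions(rocm_path: str) -> list[int]:
    """List the libamd_comgr major versions found in <rocm_path>/lib."""
    lib_dir = Path(rocm_path) / "lib"  # assumed library location
    if not lib_dir.is_dir():
        return []
    versions = set()
    for entry in lib_dir.iterdir():
        match = SONAME.match(entry.name)
        if match:
            versions.add(int(match.group(1)))
    return sorted(versions)


if __name__ == "__main__":
    rocm_path = os.environ.get("ROCM_PATH", "/opt/rocm")
    found = comgr_major_versions(rocm_path)
    wanted = 2  # major version the PyPI plugin asked for in the traceback
    print(f"libamd_comgr major versions in {rocm_path}/lib: {found}")
    if wanted not in found:
        print(f"libamd_comgr.so.{wanted} is missing, so the plugin cannot load")
```

On a system where only libamd_comgr.so.3 is installed, the check reports that version 2 is missing, which matches the load error in the traceback.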