Agent Diagnostic
Investigated by: Claude Code (debug-openshell-cluster skill)
Cluster: openshell · https://127.0.0.1:8080 · v0.0.7
Sandbox: strong-treefrog · image ghcr.io/nvidia/openshell-community/sandboxes/base@sha256:97a6a668...
Cluster is fully healthy — this is not a cluster startup issue.
openshell status → Connected (v0.0.7)
node status → Ready (k3s v1.35.2+k3s1)
openshell-0 → Running, 0 restarts
Root cause: the Claude Code binary aborts on ARM64 kernels with 64K page size.
| Property | Value |
|---|---|
| Kernel | 6.14.0-1013-nvidia-64k |
| Architecture | aarch64 |
| getconf PAGE_SIZE | 65536 (64K) |
| /usr/local/bin/claude | ELF ARM64 standalone binary, 222MB |
| Bundled runtime | Bun v1.3.1 (confirmed via grep -boa "Bun v") |
| Crash signal | SIGABRT (exit 134, not SIGSEGV) |
| Seccomp | Disabled (not the cause) |
| Node.js v22.22.1 | Works fine in same sandbox |
Bun's JavaScript engine (JavaScriptCore) performs memory layout operations assuming ≤16K page alignment. On a 64K page kernel, these assumptions are violated and JSC calls abort() before any user code executes. node itself is unaffected — the NPM-based install path uses Node.js and avoids this entirely.
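As a rough illustration of the page-size condition described above, the following sketch picks an install path from the kernel page size. The 16384-byte threshold is taken from the "≤16K" statement in this diagnostic, not from Bun's source, and `choose_install_path` is a hypothetical helper name, not an existing OpenShell function.

```shell
#!/bin/sh
# Largest page size (bytes) the diagnostic says the bundled runtime tolerates.
# Assumption: 16384 comes from the "<=16K" claim above, not from Bun's source.
MAX_SAFE_PAGE_SIZE=16384

# choose_install_path PAGE_SIZE_BYTES
# Prints "npm" when the page size exceeds the threshold, otherwise "native".
choose_install_path() {
    if [ "$1" -gt "$MAX_SAFE_PAGE_SIZE" ]; then
        echo npm
    else
        echo native
    fi
}

# Example: decide for the current machine, if getconf is available.
if command -v getconf >/dev/null 2>&1; then
    page_size=$(getconf PAGE_SIZE)
    echo "page size ${page_size}: use $(choose_install_path "$page_size") install"
fi
```

With the 65536-byte page size reported for this sandbox, the helper selects the NPM path; on a 4K-page machine it selects the native binary.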
Commands run:
openshell status
openshell doctor exec -- kubectl get pods -A -o wide
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- uname -a
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- getconf PAGE_SIZE
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- /usr/local/bin/claude --version
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- sh -c 'cat /proc/self/status | grep -E "Seccomp|NoNewPrivs|CapEff"'
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- ldd /usr/local/bin/claude
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- sh -c 'ls -lh /usr/local/bin/claude'
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- sh -c 'grep -boa "Bun v" /usr/local/bin/claude'
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- node -e "process.exit(0)" # ✓ works
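The diagnostic reads exit status 134 as SIGABRT rather than SIGSEGV. A minimal sketch of that arithmetic, relying on the shell convention that a process killed by signal N is reported as status 128 + N (`signal_from_exit` is a hypothetical helper used only for illustration):

```shell
#!/bin/sh
# signal_from_exit STATUS
# Prints the signal number when STATUS follows the 128 + N convention for
# death by signal; prints nothing for ordinary exit statuses.
signal_from_exit() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    fi
}

# Decode the status observed for /usr/local/bin/claude in this report.
sig=$(signal_from_exit 134)
echo "exit 134 -> signal ${sig} (SIG$(kill -l "$sig"))"
```

Signal 6 is SIGABRT on Linux, which matches a deliberate `abort()` call; a segmentation fault (signal 11) would surface as exit status 139 instead.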
Description
The default Claude Code installer produces a binary that aborts immediately at startup (SIGABRT, exit 134) on GB200 nodes.
$ openshell sandbox create -- claude
...
Created sandbox: strong-treefrog
$ openshell sandbox connect strong-treefrog
sandbox@strong-treefrog:~$ claude
Aborted (core dumped)
This is a known issue with the Claude Code installer on these machines. The workaround is to install via NPM instead:
npm install -g @anthropic-ai/claude-code
OpenShell should consider using the NPM-based install path for Claude on
affected systems. It is unclear whether this is specific to Grace-based
systems (GB200) or affects ARM64 in general — worth testing on other
aarch64 machines to narrow it down.
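To compare results across other aarch64 machines, a small probe along these lines could collect the relevant facts in one line per host. This is only a sketch: `format_probe_line` and `probe_claude` are hypothetical names, and the probe assumes `claude`, `uname`, and `getconf` are available on the machine under test.

```shell
#!/bin/sh
# format_probe_line ARCH PAGE_SIZE EXIT_STATUS
# Prints one report line that is easy to compare across hosts.
format_probe_line() {
    printf 'arch=%s page_size=%s claude_exit=%s\n' "$1" "$2" "$3"
}

# probe_claude
# Run on the machine under test. Reports "n/a" when the claude binary is
# not on PATH instead of failing.
probe_claude() {
    if command -v claude >/dev/null 2>&1; then
        claude --version >/dev/null 2>&1
        status=$?
    else
        status=n/a
    fi
    format_probe_line "$(uname -m)" "$(getconf PAGE_SIZE)" "$status"
}
```

Calling `probe_claude` inside each sandbox gives one line per machine. Going by the values in the diagnostic, the affected GB200 sandbox would be expected to report aarch64, a page size of 65536, and exit status 134; an aarch64 host with 4K pages that reports exit status 0 would point to the page size rather than the architecture.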
Reproduction Steps
- On a GB200 node, create a Claude sandbox: `openshell sandbox create -- claude`
- Connect to the sandbox: `openshell sandbox connect <name>`
- Run `claude`
- Observe: `Aborted (core dumped)`
Environment
- Host: GB200 (Grace Blackwell), aarch64
- OpenShell sandbox with Claude provider
Logs
Agent-First Checklist
debug-openshell-cluster, debug-inference, openshell-cli)