WIP — This page is actively being updated as we work through Spark installs. Expect changes.
- Docker (pre-installed, v28.x)
- Node.js 22 (installed by install.sh)
- OpenShell CLI (installed via the Quick Start steps below)
- NVIDIA API Key from build.nvidia.com — prompted on first run
# Install OpenShell:
curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh
# Clone NemoClaw:
git clone https://github.com/NVIDIA/NemoClaw.git
cd NemoClaw
# Spark-specific setup
sudo ./scripts/setup-spark.sh
# Install NemoClaw using the NemoClaw/install.sh:
./install.sh
# Alternatively, you can use the hosted install script:
curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash

DGX Spark ships Ubuntu 24.04 + Docker 28.x but no k8s/k3s. OpenShell embeds k3s inside a Docker container, which hits two problems on Spark:
Error in the hyper legacy client: client error (Connect)
Permission denied (os error 13)
Cause: Your user isn't in the docker group.
Fix: setup-spark runs usermod -aG docker $USER. You may need to log out and back in (or newgrp docker) for it to take effect.
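
The "log out and back in" step matters because group membership is read when a session starts. A minimal sketch of how that could be checked, assuming the group is named docker; the function names `docker_group_status` and `current_status` are hypothetical helpers, not part of NemoClaw or OpenShell:

```python
import grp
import os
import pwd


def docker_group_status(user, group_members, group_gid, session_gids):
    """Classify a user's membership in the docker group.

    "active": the current session already carries the group.
    "pending-relogin": the user is listed in the group database, but this
        session started before the change (log out/in or run newgrp docker).
    "missing": the user is not in the group at all.
    """
    if group_gid in session_gids:
        return "active"
    if user in group_members:
        return "pending-relogin"
    return "missing"


def current_status(group_name="docker"):
    """Gather live values for the calling user; None if the group does not exist."""
    try:
        entry = grp.getgrnam(group_name)
    except KeyError:
        return None
    user = pwd.getpwuid(os.getuid()).pw_name
    # Supplementary groups of this process plus its effective primary group
    session_gids = set(os.getgroups()) | {os.getegid()}
    return docker_group_status(user, entry.gr_mem, entry.gr_gid, session_gids)


if __name__ == "__main__":
    print(current_status())
```

A result of "pending-relogin" after running setup-spark suggests the usermod change is in place and only a new login session (or newgrp docker) is still needed.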
K8s namespace not ready
openat2 /sys/fs/cgroup/kubepods/pids.max: no
Failed to start ContainerManager: failed to initialize top level QOS containers
Cause: Spark runs cgroup v2 (Ubuntu 24.04 default). OpenShell's gateway container starts k3s, which tries to create cgroup v1-style paths that don't exist. The fix is --cgroupns=host on the container, but OpenShell doesn't expose that flag.
Fix: setup-spark sets "default-cgroupns-mode": "host" in /etc/docker/daemon.json and restarts Docker. This makes all containers use the host cgroup namespace, which is what k3s needs.
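
The condition described above (cgroup v2 host, Docker not yet configured for the host cgroup namespace) can be sketched as a small preflight check. This is an illustration under stated assumptions, not what setup-spark actually runs: the function names are hypothetical, and cgroup v2 is detected by the presence of cgroup.controllers at the cgroup root, which is how the unified hierarchy is usually recognized.

```python
import json
import os


def is_cgroup_v2(cgroup_root="/sys/fs/cgroup"):
    """Assumption: on the unified (v2) hierarchy the root holds cgroup.controllers."""
    return os.path.exists(os.path.join(cgroup_root, "cgroup.controllers"))


def has_host_cgroupns(daemon_json_text):
    """True if the daemon.json text sets default-cgroupns-mode to host."""
    if not daemon_json_text.strip():
        return False  # empty or missing file: nothing configured
    config = json.loads(daemon_json_text)
    return isinstance(config, dict) and config.get("default-cgroupns-mode") == "host"


def needs_fix(cgroup_v2, daemon_json_text):
    """The fix applies only on cgroup v2 hosts that lack the host setting."""
    return cgroup_v2 and not has_host_cgroupns(daemon_json_text)


if __name__ == "__main__":
    path = "/etc/docker/daemon.json"
    text = ""
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    print("cgroupns fix needed:", needs_fix(is_cgroup_v2(), text))
```

The check only reads the config file; it cannot tell whether Docker has been restarted since the file was changed.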
# Check if you're on cgroup v2
stat -fc %T /sys/fs/cgroup/
# Expected: cgroup2fs
# Add cgroupns=host to Docker daemon config
sudo python3 -c "
import json, os
path = '/etc/docker/daemon.json'
d = json.load(open(path)) if os.path.exists(path) else {}
d['default-cgroupns-mode'] = 'host'
json.dump(d, open(path, 'w'), indent=2)
"
# Restart Docker
sudo systemctl restart docker

sudo usermod -aG docker $USER
newgrp docker # or log out and back in

nemoclaw onboard

| Issue | Status | Workaround |
|---|---|---|
| cgroup v2 kills k3s in Docker | Fixed in setup-spark | daemon.json cgroupns=host |
| Docker permission denied | Fixed in setup-spark | usermod -aG docker |
| CoreDNS CrashLoop after setup | Fixed in fix-coredns.sh | Uses container gateway IP, not 127.0.0.11 |
| Image pull failure (k3s can't find built image) | OpenShell bug | openshell gateway destroy && openshell gateway start, re-run setup |
| GPU passthrough | Untested on Spark | Should work with --gpu flag if NVIDIA Container Toolkit is configured |
# Check sandbox is running
openshell sandbox list
# Should show: nemoclaw Ready
# Test the agent
openshell sandbox connect nemoclaw
# Inside sandbox:
nemoclaw-start openclaw agent --agent main --local -m 'hello' --session-id test
# Monitor network egress (separate terminal)
openshell term

DGX Spark (Ubuntu 24.04, cgroup v2)
└── Docker (28.x, cgroupns=host)
    └── OpenShell gateway container
        └── k3s (embedded)
            └── nemoclaw sandbox pod
                └── OpenClaw agent + NemoClaw plugin