OpenShell produces two container images, both published for linux/amd64 and linux/arm64.
The gateway runs the control plane API server. It is deployed as a StatefulSet inside the cluster container via a bundled Helm chart.
- Dockerfile: `deploy/docker/Dockerfile.gateway`
- Registry: `ghcr.io/nvidia/openshell/gateway:latest`
- Pulled when: cluster startup (the Helm chart triggers the pull)
- Entrypoint: `openshell-server --port 8080` (gRPC + HTTP, mTLS)
The cluster image is a single-container Kubernetes distribution that bundles the Helm charts, Kubernetes manifests, and the `openshell-sandbox` supervisor binary needed to bootstrap the control plane.
- Dockerfile: `deploy/docker/Dockerfile.cluster`
- Registry: `ghcr.io/nvidia/openshell/cluster:latest`
- Pulled when: `openshell gateway start`
The supervisor binary (`openshell-sandbox`) is cross-compiled in a build stage and placed at `/opt/openshell/bin/openshell-sandbox`. It is exposed to sandbox pods at runtime via a read-only hostPath volume mount; it is not baked into sandbox images.
OpenShell's runtime bundle publication contract is tarball-first. The canonical artifact is a per-architecture release tarball whose single top-level bundle directory contains the install-root payload plus manifest.json. If OCI publication is added later, it is only a mirror transport for that same bundle contract.
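Producing a tarball that satisfies this contract can be sketched as follows. This is a minimal illustration, not the release pipeline's actual code: the bundle-directory name, the output filename, and the manifest content beyond the `architecture` field are all assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative values; the real release pipeline defines its own names.
ARCH="amd64"
STAGE="$(mktemp -d)"
BUNDLE="openshell-runtime-${ARCH}"   # the single top-level bundle directory

# Install-root payload plus manifest.json at the bundle root.
mkdir -p "${STAGE}/${BUNDLE}/usr/bin"
printf '{"architecture":"%s"}\n' "${ARCH}" > "${STAGE}/${BUNDLE}/manifest.json"

# Tar from inside the staging dir so the archive has exactly one top-level entry.
tar -C "${STAGE}" -czf "${STAGE}/${BUNDLE}.tar.gz" "${BUNDLE}"
tar -tzf "${STAGE}/${BUNDLE}.tar.gz" | cut -d/ -f1 | sort -u
```

The final listing collapses to a single name because every archive member sits under the one bundle directory, which is the shape the consuming build script verifies.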
The current cluster build now consumes that published tarball through the local staged bundle path. `tasks/scripts/docker-build-cluster.sh` requires `OPENSHELL_RUNTIME_BUNDLE_TARBALL`, fails before any Helm packaging or Docker build when the bundle is missing or invalid, and stages the verified install-root payload under `deploy/docker/.build/runtime-bundle/<arch>/`. `deploy/docker/Dockerfile.cluster` then copies the runtime binaries, config, and shared libraries from that staged local tree into the final cluster image.
That requirement now flows through all cluster-image entrypoints instead of only the direct script call:
- Local bootstrap via `tasks/scripts/cluster-bootstrap.sh` requires `OPENSHELL_RUNTIME_BUNDLE_TARBALL` whenever it is going to build the cluster image; prebuilt-image flows can still set `SKIP_CLUSTER_IMAGE_BUILD=1`.
- Remote gateway deploy via `scripts/remote-deploy.sh` requires either `--runtime-bundle-tarball` (or a local `OPENSHELL_RUNTIME_BUNDLE_TARBALL`) for sync-and-build flows, or `--remote-runtime-bundle-tarball` when `--skip-sync` should reuse a tarball already staged on the remote host; the script exports the resolved remote path before invoking the remote cluster build.
- Multi-arch publishing via `tasks/scripts/docker-publish-multiarch.sh` requires `OPENSHELL_RUNTIME_BUNDLE_TARBALL_AMD64` and `OPENSHELL_RUNTIME_BUNDLE_TARBALL_ARM64`, builds one verified per-arch cluster image at a time, then assembles the final multi-arch manifest from those architecture-specific tags.
- GitHub workflow cluster builds now consume release-asset URLs rather than local tarball paths directly: `tasks/scripts/download-runtime-bundle.sh` downloads per-arch tarballs into `deploy/docker/.build/runtime-bundles/`, `tasks/scripts/ci-build-cluster-image.sh` maps single-arch builds to `docker:build:cluster` and multi-arch builds to `docker:build:cluster:multiarch`, and `.github/workflows/docker-build.yml` passes explicit bundle URLs from workflow inputs or repo variables into that helper path.
The intended first OpenShell tarball consumption path is the `tasks/scripts/docker-build-cluster.sh` -> `deploy/docker/Dockerfile.cluster` flow:

- `tasks/scripts/docker-build-cluster.sh` receives the per-architecture runtime bundle tarball path through `OPENSHELL_RUNTIME_BUNDLE_TARBALL` before `docker buildx build`.
- The script verifies the single top-level bundle-directory shape, requires valid JSON `manifest.json` content inside that bundle directory with a matching `architecture`, validates manifest-declared checksums and sizes, and checks the required runtime payload paths before staging.
- The script stages the tarball payload into `deploy/docker/.build/runtime-bundle/<arch>/`, preserving the bundle directory and install-root layout expected by OpenShell.
- `deploy/docker/Dockerfile.cluster` loads the staged local bundle tree in a dedicated build stage and copies the verified runtime files into the same final image paths OpenShell already expects.
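The architecture-match step of that verification can be sketched roughly as below. This is a simplified stand-in for the script's real checks (which also cover checksums, sizes, and payload paths); the staged-bundle fixture and variable names are fabricated for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fabricated staged bundle, standing in for deploy/docker/.build/runtime-bundle/<arch>/.
target_arch="arm64"
bundle="$(mktemp -d)/bundle"
mkdir -p "${bundle}"
printf '{"architecture":"%s"}\n' "${target_arch}" > "${bundle}/manifest.json"

# Pull the declared architecture out of manifest.json (sed keeps this jq-free).
declared="$(sed -n 's/.*"architecture"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
  "${bundle}/manifest.json")"

if [ "${declared}" = "${target_arch}" ]; then
  echo "architecture ok: ${declared}"
else
  echo "architecture mismatch: ${declared} != ${target_arch}" >&2
  exit 1
fi
```

Failing before any Helm packaging or Docker build, as the real script does, keeps a wrong-architecture bundle from ever reaching an image layer.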
The tarball payload must contain the exact runtime assets the cluster image expects today:
- `/usr/bin/nvidia-cdi-hook`
- `/usr/bin/nvidia-container-runtime`
- `/usr/bin/nvidia-container-runtime-hook`
- `/usr/bin/nvidia-container-cli`
- `/usr/bin/nvidia-ctk`
- `/etc/nvidia-container-runtime/`
- `/usr/lib/*-linux-gnu/libnvidia-container*.so*`
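A minimal payload-path check along these lines could look like the following sketch. The fixture tarball is created inline (with only a subset of the required paths) so the example is self-contained; in the real flow the input is the per-arch bundle tarball passed via `OPENSHELL_RUNTIME_BUNDLE_TARBALL`, and the patterns would cover the full list above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Inline fixture tarball so the sketch is runnable; real input is the
# per-arch bundle tarball from OPENSHELL_RUNTIME_BUNDLE_TARBALL.
stage="$(mktemp -d)"
mkdir -p "${stage}/bundle/usr/bin" \
         "${stage}/bundle/etc/nvidia-container-runtime" \
         "${stage}/bundle/usr/lib/x86_64-linux-gnu"
touch "${stage}/bundle/usr/bin/nvidia-ctk" \
      "${stage}/bundle/usr/bin/nvidia-container-cli" \
      "${stage}/bundle/usr/lib/x86_64-linux-gnu/libnvidia-container.so.1"
tar -C "${stage}" -czf "${stage}/bundle.tar.gz" bundle

# Each required payload path must appear in the archive listing.
listing="$(tar -tzf "${stage}/bundle.tar.gz")"
status="ok"
for pattern in 'usr/bin/nvidia-ctk' \
               'usr/bin/nvidia-container-cli' \
               'etc/nvidia-container-runtime/' \
               'usr/lib/[^/]*-linux-gnu/libnvidia-container.*\.so'; do
  printf '%s\n' "${listing}" | grep -Eq "${pattern}" || status="missing: ${pattern}"
done
echo "${status}"
```

Checking the archive listing before extraction means a truncated or mispackaged bundle is rejected without writing anything into the build tree.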
This handoff keeps the OpenShell build package-manager-free for the runtime dependency itself. Standard OS image layers can remain upstream inputs, but the GPU runtime contents enter the build as a verified tarball payload rather than through a distro package repository. OCI, if later added, mirrors this same tarball-defined payload instead of changing the OpenShell consumption contract.
Sandbox images are not built in this repository. They are maintained in the openshell-community repository and pulled from `ghcr.io/nvidia/openshell-community/sandboxes/` at runtime.

The default sandbox image is `ghcr.io/nvidia/openshell-community/sandboxes/base:latest`. To use a named community sandbox:

```sh
openshell sandbox create --from <name>
```

This pulls `ghcr.io/nvidia/openshell-community/sandboxes/<name>:latest`.
`mise run cluster` is the primary development command. It bootstraps a cluster if one doesn't exist, then performs incremental deploys for subsequent runs.
The incremental deploy (`cluster-deploy-fast.sh`) fingerprints local Git changes and only rebuilds components whose files have changed:

| Changed files | Rebuild triggered |
|---|---|
| Cargo manifests, proto definitions, cross-build script | Gateway + supervisor |
| `crates/openshell-server/*`, `Dockerfile.gateway` | Gateway |
| `crates/openshell-sandbox/*`, `crates/openshell-policy/*` | Supervisor |
| `deploy/helm/openshell/*` | Helm upgrade |
When no local changes are detected, the command is a no-op.
Gateway updates are pushed to a local registry and the StatefulSet is restarted. Supervisor updates are copied directly into the running cluster container via `docker cp`; new sandbox pods pick up the updated binary immediately through the hostPath mount, with no image rebuild or cluster restart required.
Fingerprints are stored in `.cache/cluster-deploy-fast.state`. You can also target specific components explicitly:
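The fingerprint-and-compare idea behind the state file can be sketched as below. The directory layout, hashing scheme, and state-file handling here are illustrative guesses, not `cluster-deploy-fast.sh`'s actual implementation (which fingerprints Git-tracked changes per component).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative component tree and state file, created fresh for the sketch.
work="$(mktemp -d)"
state="${work}/cluster-deploy-fast.state"
mkdir -p "${work}/crates/openshell-server"
echo 'fn main() {}' > "${work}/crates/openshell-server/main.rs"

fingerprint() {
  # Stable hash over sorted file paths and contents under a component dir.
  (cd "$1" && find . -type f -print0 | sort -z | xargs -0 sha256sum) \
    | sha256sum | cut -d' ' -f1
}

new="$(fingerprint "${work}/crates/openshell-server")"
old="$(cat "${state}" 2>/dev/null || true)"
if [ "${new}" = "${old}" ]; then
  echo "no-op"                      # nothing changed since the last deploy
else
  echo "rebuild gateway"            # component changed: trigger its rebuild
  printf '%s\n' "${new}" > "${state}"
fi
```

On the first run the state file is empty, so the comparison fails and a rebuild is triggered; running the same comparison again with no file changes reproduces the cached fingerprint and becomes the no-op described above.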
```sh
mise run cluster -- gateway     # rebuild gateway only
mise run cluster -- supervisor  # rebuild supervisor only
mise run cluster -- chart       # helm upgrade only
mise run cluster -- all         # rebuild everything
```