Commit e012e09

fix(vm): fix gateway readiness timeout with correct port mapping and aligned pre-bake
Two issues caused the gateway service readiness check to time out:

1. Port mapping mismatch: gvproxy mapped host:30051 → VM:8080, but with bridge CNI the pod listens on 8080 inside its network namespace, not on the VM's root namespace. Changed to 30051:30051 so traffic flows through the NodePort service (kube-proxy nftables → pod:8080).

2. Pod cycling from helm upgrade: build-rootfs.sh pre-baked with hostNetwork=true and automountServiceAccountToken=false, but gateway-init.sh changed these at boot, triggering a HelmChart reconcile that killed the pre-baked pod ~90s in. Aligned pre-bake values (hostNetwork=false, automountServiceAccountToken=true) to match runtime, eliminating the manifest delta.
1 parent 070bcca commit e012e09

2 files changed: 17 additions, 17 deletions

crates/openshell-vm/scripts/build-rootfs.sh

Lines changed: 11 additions & 11 deletions
@@ -243,17 +243,17 @@ if [ -f "$HELMCHART" ]; then
 sed -i '' "s|server:[[:space:]]*sandboxImage: ghcr.io/nvidia/openshell-community/sandboxes/base:latest|server:\n sandboxImage: ${SANDBOX_IMAGE}|g" "$HELMCHART" 2>/dev/null || true
 sed -i '' "s|sandboxImage: ghcr.io/nvidia/openshell-community/sandboxes/base:latest|sandboxImage: ${SANDBOX_IMAGE}|g" "$HELMCHART" 2>/dev/null \
 || sed -i "s|sandboxImage: ghcr.io/nvidia/openshell-community/sandboxes/base:latest|sandboxImage: ${SANDBOX_IMAGE}|g" "$HELMCHART"
-# Enable hostNetwork for VM (no kube-proxy / iptables).
-sed -i '' 's|__HOST_NETWORK__|true|g' "$HELMCHART" 2>/dev/null \
-|| sed -i 's|__HOST_NETWORK__|true|g' "$HELMCHART"
-# Disable SA token automount. The projected volume at
-# /var/run/secrets/kubernetes.io/serviceaccount fails on sandbox
-# re-creation because /var/run is a symlink to /run in the container
-# image and the native snapshotter + virtiofs combination can't
-# resolve it correctly on the second mount.
-sed -i '' 's|__AUTOMOUNT_SA_TOKEN__|false|g' "$HELMCHART" 2>/dev/null \
-|| sed -i 's|__AUTOMOUNT_SA_TOKEN__|false|g' "$HELMCHART"
-# Mount the k3s kubeconfig into the pod since SA token isn't mounted.
+# Bridge CNI: pods use normal pod networking, not hostNetwork.
+# This must match what gateway-init.sh applies at runtime so the
+# HelmChart manifest is unchanged at boot — preventing a helm
+# upgrade job that would cycle the pre-baked pod.
+sed -i '' 's|__HOST_NETWORK__|false|g' "$HELMCHART" 2>/dev/null \
+|| sed -i 's|__HOST_NETWORK__|false|g' "$HELMCHART"
+# Enable SA token automount for bridge CNI mode. Must match
+# gateway-init.sh runtime value to avoid manifest delta.
+sed -i '' 's|__AUTOMOUNT_SA_TOKEN__|true|g' "$HELMCHART" 2>/dev/null \
+|| sed -i 's|__AUTOMOUNT_SA_TOKEN__|true|g' "$HELMCHART"
+# Mount the k3s kubeconfig into the pod for VM mode.
 sed -i '' 's|__KUBECONFIG_HOST_PATH__|"/etc/rancher/k3s"|g' "$HELMCHART" 2>/dev/null \
 || sed -i 's|__KUBECONFIG_HOST_PATH__|"/etc/rancher/k3s"|g' "$HELMCHART"
 # Disable persistence — use /tmp for the SQLite database. PVC mounts
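
The second fix in this commit relies on the pre-baked placeholder values being identical to the ones gateway-init.sh applies at boot. A minimal Rust sketch of that invariant follows. It is not part of this commit: `render` and `manifest_delta` are hypothetical helpers, and the value maps are hand-written from the commit message rather than read from either script.

```rust
use std::collections::BTreeMap;

/// Substitute `__KEY__`-style placeholders in a manifest template,
/// mirroring what the `sed` calls in build-rootfs.sh do.
/// (Hypothetical helper, not part of the repository.)
fn render(template: &str, values: &BTreeMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (placeholder, value) in values {
        out = out.replace(*placeholder, value);
    }
    out
}

/// Return the placeholders whose pre-bake value differs from (or is
/// missing in) the runtime set. A non-empty result means the rendered
/// manifest would change at boot. Keys present only in `runtime` are
/// ignored in this sketch.
fn manifest_delta<'a>(
    prebake: &BTreeMap<&'a str, &'a str>,
    runtime: &BTreeMap<&'a str, &'a str>,
) -> Vec<&'a str> {
    prebake
        .iter()
        .filter(|(key, value)| runtime.get(*key) != Some(*value))
        .map(|(key, _)| *key)
        .collect()
}
```

With the old pre-bake values (`__HOST_NETWORK__` = `true`, `__AUTOMOUNT_SA_TOKEN__` = `false`) compared against the runtime values named in the commit message, `manifest_delta` reports both placeholders. With the aligned values it returns an empty list, which is the condition under which no manifest change is expected at boot.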

crates/openshell-vm/src/lib.rs

Lines changed: 6 additions & 6 deletions
@@ -161,12 +161,12 @@ impl VmConfig {
 ],
 workdir: "/".to_string(),
 port_map: vec![
-// Navigator server — with hostNetwork the server binds
-// directly to port 8080 on the VM's interface, bypassing
-// NodePort (which requires kube-proxy / iptables).
-// Map host 30051 -> guest 8080 so the external-facing
-// port stays the same for CLI clients.
-"30051:8080".to_string(),
+// Navigator server — with bridge CNI the pod listens on
+// 8080 inside its own network namespace (10.42.0.x), not
+// on the VM's root namespace. The NodePort service
+// (kube-proxy nftables) forwards VM:30051 → pod:8080.
+// gvproxy maps host:30051 → VM:30051 to complete the path.
+"30051:30051".to_string(),
 ],
 vsock_ports: vec![],
 log_level: 3, // Info — for debugging
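
The `port_map` entries above are `host:guest` strings, and the fix depends on the guest side being the NodePort (30051) rather than the pod's container port (8080). The sketch below shows one way such an entry could be parsed and checked. It assumes the `host:guest` reading taken from the removed comment; `parse_port_mapping` and `targets_node_port` are hypothetical names and do not reflect how openshell-vm or gvproxy actually consume the string.

```rust
/// Parse a "host:guest" port mapping into a pair of port numbers.
/// (Hypothetical helper; the real crate may parse this differently.)
fn parse_port_mapping(entry: &str) -> Result<(u16, u16), String> {
    let (host, guest) = entry
        .split_once(':')
        .ok_or_else(|| format!("expected HOST:GUEST, got {entry:?}"))?;
    let host = host
        .parse::<u16>()
        .map_err(|e| format!("bad host port {host:?}: {e}"))?;
    let guest = guest
        .parse::<u16>()
        .map_err(|e| format!("bad guest port {guest:?}: {e}"))?;
    Ok((host, guest))
}

/// True when the mapping's guest side is the given NodePort, i.e. the
/// traffic lands where the NodePort service can forward it to the pod.
fn targets_node_port(entry: &str, node_port: u16) -> bool {
    matches!(parse_port_mapping(entry), Ok((_, guest)) if guest == node_port)
}
```

Under these assumptions, `targets_node_port("30051:30051", 30051)` holds for the new mapping and fails for the old `"30051:8080"` entry, which pointed at a port that nothing listens on in the VM's root namespace once the pod uses bridge CNI.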
