WASM shell commands spawn a new Node.js process + JIT compile per command #1440

@atsushi-ishibashi

Description

Summary

Every WASM shell command (ls, cat, pwd, etc.) spawns a new Node.js child process and runs WebAssembly.compile() on the full module binary from scratch. On environments with low single-thread performance or high I/O latency (e.g., ECS Fargate with EFS mounts), this per-command overhead reaches ~5 seconds — causing a 15-command agent session to hit the 120-second ACP timeout.

The existing openShell() API already provides a persistent shell mechanism, but exec() does not use it. Routing exec through a persistent shell would reduce overhead to a single spawn at session start.

Per-command execution flow

Each call to exec("ls /workspace") follows this path:

  1. NativeKernelProxy.exec() → resolveExecCommand() wraps as sh -c 'ls /workspace' (native-kernel-proxy.ts:357-406)
  2. NativeKernelProxy.spawn() → sidecar RPC (native-kernel-proxy.ts:408)
  3. WasmExecutionEngine::start_execution() (wasm.rs:355-450)
  4. create_node_child() → Command::new(node_binary()).spawn() → new OS process (wasm.rs:486-524)
  5. fs.readFile(modulePath) — reads multi-MB WASM binary from disk (node_import_cache.rs:7886)
  6. WebAssembly.compile(moduleBytes) — full JIT compile (node_import_cache.rs:7889)
  7. WASI init → execute → process exits — compiled module discarded

Steps 4–7 repeat identically for every ls, cat, pwd. There is no process or module reuse.

Why existing caches don't help

| Mechanism | What it caches | Helps with WASM compile? | Helps with process spawn? |
|---|---|---|---|
| Prewarm (wasm.rs:527) | Marker file to skip re-prewarm | No — compiled module discarded on process.exit(0) (node_import_cache.rs:7891) | No |
| Node.js compile cache (runtime_support.rs:33) | JS source → V8 bytecode | No — JS only, not WASM | No |
| Import cache (NodeImportCache) | Runner/loader files on disk | No | No |
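The first row is the key constraint: a compiled WebAssembly.Module cannot outlive the process that compiled it, so no marker-file cache can skip the JIT step as long as each command gets a fresh process. A persistent process avoids it structurally — compile once, instantiate per command. A minimal self-contained Node.js illustration (not the project's code):

```typescript
// Illustration only: in a long-lived process, the expensive compile happens
// once and every subsequent "command" reuses the compiled module. A
// short-lived child process must redo the compile each time.

// Smallest valid WASM binary: magic "\0asm" + version 1, no sections.
const wasmBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Expensive step — paid once per process lifetime.
const compiled = new WebAssembly.Module(wasmBytes);

// Cheap step — paid once per command; reuses the already-compiled code.
function runCommand(): WebAssembly.Instance {
  return new WebAssembly.Instance(compiled);
}

for (let i = 0; i < 3; i++) runCommand(); // no recompilation across calls
```

This is exactly the property the persistent-shell proposal below exploits: keep the process (and thus the compiled module) alive across commands.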

Impact in resource-constrained environments

On macOS (Apple Silicon), the per-command overhead is ~19ms — invisible in practice. But on environments with weaker single-thread CPU or network-attached storage, the fixed costs scale dramatically.

Example: ECS Fargate with EFS workspace (from production logs):

| Operation | Time |
|---|---|
| ls /workspace | ~5.3s |
| mkdir -p /workspace | ~5.5s |
| pwd | ~4.3s |
| 15 commands total | ~75s → ACP timeout at 120s |

Contributing factors on such environments:

  • WASM binary read from network filesystem: ~200-500ms (vs ~3ms local SSD)
  • WebAssembly.compile() on lower-clock vCPU: ~500-2000ms (vs ~5-10ms Apple Silicon)
  • Node.js process spawn overhead: ~100-500ms (vs ~5ms)

Proposed fix: route exec through a persistent shell

openShell() (native-kernel-proxy.ts:499-544) already spawns a persistent sh process with stdin write and stdout callbacks. AgentOs exposes it publicly (agent-os.ts:1814-1849). The coreutils WASM binary is a multicall (BusyBox-style) binary, so ls, cat, etc. execute as builtins within the persistent sh without additional WASM spawns.

Current:  [Node spawn → WASM read → JIT compile → WASI init → sh -c "ls"] × N
Proposed: [Node spawn → WASM read → JIT compile → WASI init → sh]          × 1
          [stdin "ls\n" → stdout]                                           × N

Implementation options (broadest-impact first):

  1. NativeKernelProxy.exec() (native-kernel-proxy.ts:357) — intercept sh -c commands and route through a lazily-initialized persistent shell
  2. child_process polyfill bridge (node_import_cache.rs:3693) — benefits guest code calling child_process.spawn("sh", ...) (agent SDKs like Claude CLI use this path)
  3. AgentOs.exec() (agent-os.ts) — simplest, benefits direct callers only

Design considerations:

  • Output boundary detection: delimiter pattern (e.g., printf "___DELIM_%d___\n" "$?") to mark command end and capture exit code
  • Fallback: commands needing streaming onStdout/onStderr callbacks bypass the persistent shell
  • Initialization sync: echo a known token after spawn, wait for it before accepting commands
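The delimiter idea from the first bullet can be sketched as a standalone parser (hypothetical helper, not from the codebase; the marker format here assumes a per-command id appended to the pattern shown above). After writing the command to the shell's stdin, the caller also writes `printf "___DELIM_<id>_%d___\n" "$?"` and scans the accumulated output for the marker:

```typescript
// Scan the shell's accumulated stdout for the end-of-command marker.
// Returns null while the command is still running; otherwise splits off
// the captured output and the exit code embedded in the marker.
function parseShellOutput(
  buffer: string,
  delimId: string,
): { stdout: string; exitCode: number } | null {
  const m = buffer.match(new RegExp(`___DELIM_${delimId}_(\\d+)___`));
  if (!m || m.index === undefined) return null; // marker not seen yet
  return {
    stdout: buffer.slice(0, m.index).replace(/\n$/, ""),
    exitCode: parseInt(m[1], 10),
  };
}
```

A unique id per command guards against a stale marker from a previous command matching the current scan; the PoC patch below uses the same approach with a counter-based id.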

Verification

Note: The benchmarks below were run on a local macOS (Apple Silicon) machine — not on Fargate or similar resource-constrained environments. To validate the persistent shell approach without access to such environments, the simulated-latency benchmark injects an artificial delay into kernel.exec() to approximate the per-spawn overhead observed in production logs. The absolute numbers differ from real Fargate, but the relative improvement (spawn bypass) accurately reflects the mechanism.

Benchmark: baseline overhead per layer

The following script measures per-layer overhead of WASM command execution. Install @rivet-dev/agent-os-core and @rivet-dev/agent-os-common, then run with node --import tsx bench-wasm-exec.ts.

bench-wasm-exec.ts
import { AgentOs } from "@rivet-dev/agent-os-core";
import type { SoftwareInput } from "@rivet-dev/agent-os-core";
import common from "@rivet-dev/agent-os-common";

const software: SoftwareInput[] = [common];

interface BenchResult {
  label: string;
  durationMs: number;
  stdout?: string;
  stderr?: string;
}

async function bench(
  label: string,
  fn: () => Promise<{ stdout?: string; stderr?: string } | void>,
): Promise<BenchResult> {
  const start = performance.now();
  const result = await fn();
  const durationMs = performance.now() - start;
  return { label, durationMs, stdout: result?.stdout, stderr: result?.stderr };
}

function printResult(r: BenchResult) {
  const ms = r.durationMs.toFixed(1);
  console.log(`  ${r.label.padEnd(45)} ${ms.padStart(8)} ms`);
}

async function main() {
  console.log("=== WASM command execution benchmark ===\n");
  console.log(`Platform: ${process.platform} ${process.arch}`);
  console.log(`Node.js:  ${process.version}\n`);

  // Phase 1: VM creation
  console.log("--- Phase 1: VM creation ---");
  const vmStart = performance.now();
  const vm = await AgentOs.create({ software });
  console.log(`  AgentOs.create() ${(performance.now() - vmStart).toFixed(1).padStart(30)} ms`);
  await vm.mkdir("/workspace", { recursive: true });

  // Phase 2: Kernel API baseline (non-WASM)
  console.log("\n--- Phase 2: Kernel API baseline ---");
  const kernelResults: BenchResult[] = [];
  await vm.writeFile("/workspace/test.txt", "Hello, World!\n".repeat(7));
  kernelResults.push(await bench("kernel.readFile() (100B)", () => vm.readFile("/workspace/test.txt").then(() => {})));
  kernelResults.push(await bench("kernel.stat()", () => vm.stat("/workspace/test.txt").then(() => {})));
  kernelResults.push(await bench("kernel.readdir()", () => vm.readdir("/workspace").then(() => {})));
  kernelResults.push(await bench("kernel.exists()", () => vm.exists("/workspace/test.txt").then(() => {})));
  for (const r of kernelResults) printResult(r);

  // Phase 3: WASM cold start
  console.log("\n--- Phase 3: WASM cold start ---");
  const cold = await bench("exec('echo hello') — cold", () => vm.exec("echo hello"));
  printResult(cold);

  // Phase 4: WASM warm (repeated)
  console.log("\n--- Phase 4: WASM warm (×5) ---");
  const warmResults: BenchResult[] = [];
  for (let i = 0; i < 5; i++) {
    warmResults.push(await bench(`exec('echo hello') — warm #${i + 1}`, () => vm.exec("echo hello")));
  }
  for (const r of warmResults) printResult(r);

  // Phase 5: Various commands
  console.log("\n--- Phase 5: Various commands (warm) ---");
  for (const cmd of ["ls /workspace", "cat /workspace/test.txt", "pwd", "echo test", "wc -c /workspace/test.txt"]) {
    printResult(await bench(`exec('${cmd}')`, () => vm.exec(cmd)));
  }

  // Phase 6: Kernel vs WASM comparison
  console.log("\n--- Phase 6: Kernel API vs WASM ---");
  const kr = await bench("kernel.readFile()", () => vm.readFile("/workspace/test.txt").then(() => {}));
  const wr = await bench("exec('cat ...')", () => vm.exec("cat /workspace/test.txt"));
  printResult(kr);
  printResult(wr);
  console.log(`  → WASM overhead: ${(wr.durationMs / kr.durationMs).toFixed(0)}x\n`);

  // Phase 7: Batch throughput (15 commands)
  console.log("--- Phase 7: 15-command batch ---");
  const times: number[] = [];
  for (let i = 0; i < 15; i++) {
    const s = performance.now();
    await vm.exec(`echo iteration_${i}`);
    times.push(performance.now() - s);
  }
  const total = times.reduce((a, b) => a + b, 0);
  console.log(`  Total:   ${total.toFixed(1)} ms`);
  console.log(`  Average: ${(total / times.length).toFixed(1)} ms/cmd`);
  console.log(`  Min:     ${Math.min(...times).toFixed(1)} ms`);
  console.log(`  Max:     ${Math.max(...times).toFixed(1)} ms`);
  console.log(`  120s timeout reached: ${total > 120000 ? "YES" : "NO"}`);

  await vm.dispose();
}

main().catch((err) => { console.error(err); process.exit(1); });

Results (macOS Apple Silicon, Node.js v24.14.1):

Kernel API (readFile, stat):  ~0.1 ms
WASM exec warm (echo):        ~19 ms
WASM exec warm (ls):          ~48 ms
15× echo batch:               283 ms (18.9 ms/cmd)

Benchmark: simulated spawn latency (before/after persistent shell)

To validate the persistent shell approach without a Fargate environment, this benchmark injects an artificial 500ms delay into kernel.exec() to simulate the per-spawn overhead. The persistent shell patch bypasses kernel.exec() after the first shell initialization, so only the initial spawn incurs the delay.

Run without patch for "before", apply patch then re-run for "after".

bench-simulated-latency.ts
import { AgentOs } from "@rivet-dev/agent-os-core";
import type { AgentOs as AgentOsType, SoftwareInput } from "@rivet-dev/agent-os-core";
import common from "@rivet-dev/agent-os-common";

const software: SoftwareInput[] = [common];
const SPAWN_DELAY_MS = parseInt(process.env.SIMULATED_SPAWN_DELAY_MS ?? "500", 10);

function injectSpawnDelay(vm: AgentOsType, delayMs: number) {
  const kernel = (vm as any).kernel;
  if (!kernel) { console.warn("[warn] kernel not accessible"); return; }
  const originalKernelExec = kernel.exec.bind(kernel);
  kernel.exec = async (command: string, options?: any) => {
    await new Promise((r) => setTimeout(r, delayMs));
    return originalKernelExec(command, options);
  };
}

async function main() {
  console.log("=== Simulated spawn-latency before/after benchmark ===\n");
  console.log(`Platform:        ${process.platform} ${process.arch}`);
  console.log(`Node.js:         ${process.version}`);
  console.log(`Simulated delay: ${SPAWN_DELAY_MS} ms/spawn\n`);

  const vm = await AgentOs.create({ software });
  await vm.mkdir("/workspace", { recursive: true });
  await vm.writeFile("/workspace/test.txt", "Hello World\n".repeat(10));

  // Detect persistent shell patch
  await vm.exec("echo detect");
  const isPatched = !!(vm as any).__persistentShell;
  const mode = isPatched ? "AFTER (persistent shell)" : "BEFORE (original exec)";
  console.log(`Mode: ${mode}\n`);

  // Inject delay AFTER patch detection (so init shell spawn is unaffected)
  injectSpawnDelay(vm, SPAWN_DELAY_MS);

  // echo ×15
  console.log("--- echo ×15 ---");
  const warmupStart = performance.now();
  await vm.exec("echo warmup");
  console.log(`  Warmup:  ${(performance.now() - warmupStart).toFixed(1)} ms`);

  const times: number[] = [];
  for (let i = 0; i < 15; i++) {
    const s = performance.now();
    await vm.exec(`echo iteration_${i}`);
    times.push(performance.now() - s);
  }
  const total = times.reduce((a, b) => a + b, 0);
  console.log(`  Total:   ${total.toFixed(1)} ms`);
  console.log(`  Average: ${(total / times.length).toFixed(1)} ms/cmd`);

  // Various commands
  console.log("\n--- Various commands ---");
  for (const cmd of ["ls /workspace", "cat /workspace/test.txt", "pwd", "echo hello", "wc -c /workspace/test.txt"]) {
    const s = performance.now();
    await vm.exec(cmd);
    console.log(`  ${cmd.padEnd(35)} ${(performance.now() - s).toFixed(1).padStart(8)} ms`);
  }

  // Summary
  console.log(`\n=== Summary ===`);
  console.log(`Mode:    ${mode}`);
  console.log(`echo×15: ${total.toFixed(1)} ms (${(total / 15).toFixed(1)} ms/cmd)`);
  if (!isPatched) {
    console.log(`→ ${((SPAWN_DELAY_MS * 15 / total) * 100).toFixed(0)}% of time is spawn overhead`);
  }

  await vm.dispose();
}

main().catch((err) => { console.error(err); process.exit(1); });

PoC patch: persistent shell for AgentOs.exec()

This monkey-patch replaces AgentOs.exec() to lazily initialize a persistent WASM shell via openShell() and route subsequent commands through stdin with delimiter-based output boundary detection.

apply-patch.cjs — patches the compiled agent-os-core dist
#!/usr/bin/env node
const fs = require("fs");

const file = process.argv[2];
if (!file) { console.error("Usage: node apply-patch.cjs <agent-os.js path>"); process.exit(1); }

let src = fs.readFileSync(file, "utf8");
if (src.includes("__persistentShell")) { console.log("[patch] already patched"); process.exit(0); }

const originalExec = `    async exec(command, options) {
        return this.kernel.exec(command, options);
    }`;

const patchedExec = `    async exec(command, options) {
        // --- Persistent shell patch ---
        if (!this.__persistentShell) {
            this.__persistentShell = this._initPersistentShell();
        }
        const shell = await this.__persistentShell;
        if (shell) {
            return shell.exec(command, options);
        }
        return this.kernel.exec(command, options);
    }
    _initPersistentShell() {
        const self = this;
        return new Promise((resolveInit) => {
            try {
                const { shellId } = self.openShell();
                let initialized = false;
                let buffer = '';
                let cmdCounter = 0;
                let pendingResolve = null;
                let pendingDelimId = '';
                const decoder = new TextDecoder();

                const initDelim = '__AOSINIT_' + Date.now() + '__';
                let initBuffer = '';

                self.onShellData(shellId, (data) => {
                    const chunk = decoder.decode(data);

                    if (!initialized) {
                        initBuffer += chunk;
                        if (initBuffer.includes(initDelim)) {
                            initialized = true;
                            initBuffer = '';
                            resolveInit(shellApi);
                        }
                        return;
                    }

                    if (!pendingResolve) return;
                    buffer += chunk;

                    const delimRe = new RegExp('___AOSDELIM_' + pendingDelimId + '_(\\\\d+)___');
                    const m = buffer.match(delimRe);
                    if (m) {
                        const exitCode = parseInt(m[1], 10);
                        const raw = buffer.substring(0, m.index);
                        const lines = raw.split('\\n');
                        const cleaned = lines.filter(l =>
                            !l.includes('___AOSDELIM_') &&
                            !l.includes('printf ') &&
                            !l.includes('\\\\x00')
                        );
                        if (cleaned.length > 0 && cleaned[0].trim().startsWith('{')) {
                            cleaned.shift();
                        }
                        const stdout = cleaned.join('\\n')
                            .replace(/^\\n+/, '')
                            .replace(/\\n+$/, '');

                        buffer = buffer.substring(m.index + m[0].length);
                        const resolve = pendingResolve;
                        pendingResolve = null;
                        pendingDelimId = '';
                        resolve({
                            exitCode,
                            stdout: stdout ? stdout + '\\n' : '',
                            stderr: '',
                        });
                    }
                });

                const shellApi = {
                    exec: (cmd, opts) => {
                        if (opts && (opts.onStdout || opts.onStderr)) {
                            return self.kernel.exec(cmd, opts);
                        }
                        return new Promise((resolve) => {
                            cmdCounter++;
                            const delimId = 'c' + cmdCounter;
                            buffer = '';
                            pendingResolve = resolve;
                            pendingDelimId = delimId;
                            const payload = cmd + '\\nprintf "___AOSDELIM_' + delimId + '_%d___\\\\n" "$?"\\n';
                            self.writeShell(shellId, payload);
                        });
                    },
                };

                setTimeout(() => {
                    self.writeShell(shellId, 'echo ' + initDelim + '\\n');
                }, 50);

                setTimeout(() => {
                    if (!initialized) { initialized = true; resolveInit(null); }
                }, 15000);

            } catch (_e) { resolveInit(null); }
        });
    }`;

if (src.includes(originalExec)) {
  src = src.replace(originalExec, patchedExec);
} else {
  const pat = /    async exec\(command, options\) \{\n        return this\.kernel\.exec\(command, options\);\n    \}/;
  if (pat.test(src)) { src = src.replace(pat, patchedExec); }
  else { console.error("[patch] Could not find exec() to patch"); process.exit(1); }
}

fs.writeFileSync(file, src);
console.log("[patch] Patched exec() → persistent shell");
patch-exec-persistent-shell.sh — applies the patch to installed node_modules
#!/usr/bin/env bash
set -euo pipefail

DIST_FILE="$(find node_modules -path '*/@rivet-dev/agent-os-core/dist/agent-os.js' -type f | head -1)"

if [ -z "$DIST_FILE" ]; then
  echo "[patch] agent-os-core dist not found, skipping"; exit 0
fi
if grep -q '__persistentShell' "$DIST_FILE"; then
  echo "[patch] already patched, skipping"; exit 0
fi

echo "[patch] Patching $DIST_FILE ..."
node scripts/apply-patch.cjs "$DIST_FILE"

Results with 500ms simulated spawn delay:

| Metric | Before (original) | After (persistent shell) | Speedup |
|---|---|---|---|
| echo ×15 | 8,194 ms | 269 ms | 30× |
| ls /workspace | 581 ms | 45 ms | 13× |
| pwd | 548 ms | 18 ms | 30× |

The persistent shell bypasses kernel.exec() after initialization, so the injected delay only affects the first spawn. This models environments where the bottleneck is in the spawn path (process creation + WASM binary I/O + JIT compile).

Note on the PoC patch

This patch is a proof-of-concept to validate the approach. Known limitations:

  • Output parsing heuristics (filtering lines containing printf, stripping lines starting with {) are fragile and could mishandle legitimate output
  • No concurrent command support (single pendingResolve)
  • stderr is always empty (not captured separately from the persistent shell)
  • A production implementation should live in the kernel layer, not as a monkey-patch
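On the stderr limitation specifically: one possible direction (a sketch only — it assumes the WASM image ships sed and that the shell supports standard POSIX fd duplication, neither of which is verified here) is to tag stderr lines inside the shell with the idiom `{ cmd 2>&1 1>&3 | sed 's/^/__ERR__ /'; } 3>&1`, then demultiplex the merged stream on the host side:

```typescript
// Hypothetical helper: split a merged stream back into stdout/stderr,
// assuming the shell prefixed every stderr line with "__ERR__ ".
function demuxOutput(raw: string): { stdout: string; stderr: string } {
  const TAG = "__ERR__ ";
  const out: string[] = [];
  const err: string[] = [];
  for (const line of raw.split("\n")) {
    if (line.startsWith(TAG)) err.push(line.slice(TAG.length));
    else out.push(line);
  }
  return { stdout: out.join("\n"), stderr: err.join("\n") };
}
```

Note the trade-off: relative ordering between stdout and stderr lines is not preserved once the streams are merged and re-split, so this only addresses capture, not interleaving.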
