Summary
Every WASM shell command (`ls`, `cat`, `pwd`, etc.) spawns a new Node.js child process and runs `WebAssembly.compile()` on the full module binary from scratch. In environments with low single-thread performance or high I/O latency (e.g., ECS Fargate with EFS mounts), this per-command overhead reaches ~5 seconds, causing a 15-command agent session to hit the 120-second ACP timeout.
The existing openShell() API already provides a persistent shell mechanism, but exec() does not use it. Routing exec through a persistent shell would reduce overhead to a single spawn at session start.
Per-command execution flow
Each call to exec("ls /workspace") follows this path:
1. `NativeKernelProxy.exec()` → `resolveExecCommand()` wraps as `sh -c 'ls /workspace'` (native-kernel-proxy.ts:357-406)
2. `NativeKernelProxy.spawn()` → sidecar RPC (native-kernel-proxy.ts:408)
3. `WasmExecutionEngine::start_execution()` (wasm.rs:355-450)
4. `create_node_child()` — `Command::new(node_binary()).spawn()` — new OS process (wasm.rs:486-524)
5. `fs.readFile(modulePath)` — reads multi-MB WASM binary from disk (node_import_cache.rs:7886)
6. `WebAssembly.compile(moduleBytes)` — full JIT compile (node_import_cache.rs:7889)
7. WASI init → execute → process exits — compiled module discarded
Steps 4–7 repeat identically for every ls, cat, pwd. There is no process or module reuse.
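The cost asymmetry behind this can be sketched in isolation (a hypothetical, standalone Node.js snippet, not the project's code): a fresh `WebAssembly.compile()` per command versus reusing a cached `WebAssembly.Module`, which is what a persistent process would make possible.

```typescript
// The smallest valid WASM module is the 8-byte header (magic + version).
// Compiling it stands in for the multi-MB coreutils binary in step 6 above.
const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

async function coldExec(): Promise<WebAssembly.Module> {
  // Per-command path: compile from scratch every time (steps 5-6).
  return WebAssembly.compile(emptyModule);
}

let cached: WebAssembly.Module | undefined;
async function warmExec(): Promise<WebAssembly.Module> {
  // Persistent-process path: compile once, reuse the Module thereafter.
  cached ??= await WebAssembly.compile(emptyModule);
  return cached;
}

async function main() {
  const a = await coldExec();
  const b = await coldExec();
  console.log(a === b); // distinct Module objects: compile ran twice
  const c = await warmExec();
  const d = await warmExec();
  console.log(c === d); // same Module object: compile ran once
}
main();
```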
Why existing caches don't help
| Mechanism | What it caches | Helps with WASM compile? | Helps with process spawn? |
|---|---|---|---|
| Prewarm (wasm.rs:527) | Marker file to skip re-prewarm | No — compiled module discarded on `process.exit(0)` (node_import_cache.rs:7891) | No |
| Node.js compile cache (runtime_support.rs:33) | JS source → V8 bytecode | No — JS only, not WASM | No |
| Import cache (`NodeImportCache`) | Runner/loader files on disk | No | No |
Impact in resource-constrained environments
On macOS (Apple Silicon), the per-command overhead is ~19ms — invisible in practice. But in environments with a weaker single-thread CPU or network-attached storage, the fixed costs grow dramatically.
Example: ECS Fargate with EFS workspace (from production logs):
| Operation | Time |
|---|---|
| `ls /workspace` | ~5.3s |
| `mkdir -p /workspace` | ~5.5s |
| `pwd` | ~4.3s |
| 15 commands total | ~75s → ACP timeout at 120s |
Contributing factors in such environments:
- WASM binary read from network filesystem: ~200-500ms (vs ~3ms local SSD)
- `WebAssembly.compile()` on lower-clock vCPU: ~500-2000ms (vs ~5-10ms Apple Silicon)
- Node.js process spawn overhead: ~100-500ms (vs ~5ms)
Proposed fix: route exec through a persistent shell
`openShell()` (native-kernel-proxy.ts:499-544) already spawns a persistent `sh` process with stdin write and stdout callbacks, and `AgentOs` exposes it publicly (agent-os.ts:1814-1849). The coreutils WASM module is a multicall (BusyBox-style) binary, so `ls`, `cat`, etc. execute as builtins within the persistent `sh` without additional WASM spawns.
Current: [Node spawn → WASM read → JIT compile → WASI init → sh -c "ls"] × N
Proposed: [Node spawn → WASM read → JIT compile → WASI init → sh] × 1
[stdin "ls\n" → stdout] × N
Implementation options (broadest-impact first):
1. `NativeKernelProxy.exec()` (native-kernel-proxy.ts:357) — intercept `sh -c` commands and route them through a lazily-initialized persistent shell
2. `child_process` polyfill bridge (node_import_cache.rs:3693) — benefits guest code calling `child_process.spawn("sh", ...)` (agent SDKs like Claude CLI use this path)
3. `AgentOs.exec()` (agent-os.ts) — simplest, but benefits direct callers only
Design considerations:
- Output boundary detection: a delimiter pattern (e.g., `printf "___DELIM_%d___\n" "$?"`) to mark command end and capture the exit code
- Fallback: commands needing streaming `onStdout`/`onStderr` callbacks bypass the persistent shell
- Initialization sync: echo a known token after spawn, and wait for it before accepting commands
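The delimiter approach can be sketched as a small stream parser (a hypothetical helper, not the project's API): it buffers stdout chunks and resolves once the delimiter line arrives, even when the delimiter itself is split across chunks.

```typescript
// Hypothetical sketch of delimiter-based output framing. The shell runs:
//   <command>; printf "___DELIM_%d___\n" "$?"
// and the parser accumulates chunks until the delimiter appears.
class DelimiterParser {
  private buffer = "";
  private pending: ((r: { stdout: string; exitCode: number }) => void) | null = null;

  // Call once per command, before writing it to the shell's stdin.
  expect(): Promise<{ stdout: string; exitCode: number }> {
    return new Promise((resolve) => {
      this.buffer = "";
      this.pending = resolve;
    });
  }

  // Feed every stdout chunk from the shell here.
  feed(chunk: string): void {
    if (!this.pending) return;
    this.buffer += chunk;
    const m = this.buffer.match(/___DELIM_(\d+)___/);
    if (m && m.index !== undefined) {
      const resolve = this.pending;
      this.pending = null;
      resolve({
        stdout: this.buffer.slice(0, m.index),
        exitCode: parseInt(m[1], 10),
      });
    }
  }
}

// Usage: chunks may split anywhere, including inside the delimiter itself.
async function demo() {
  const parser = new DelimiterParser();
  const done = parser.expect();
  parser.feed("file1\nfile2\n___DEL"); // delimiter split across chunks
  parser.feed("IM_0___\n");
  const { stdout, exitCode } = await done;
  console.log(JSON.stringify(stdout));
  console.log(exitCode);
}
demo();
```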
Verification
Note: The benchmarks below were run on a local macOS (Apple Silicon) machine — not on Fargate or similar resource-constrained environments. To validate the persistent shell approach without access to such environments, the simulated-latency benchmark injects an artificial delay into kernel.exec() to approximate the per-spawn overhead observed in production logs. The absolute numbers differ from real Fargate, but the relative improvement (spawn bypass) accurately reflects the mechanism.
Benchmark: baseline overhead per layer
The following script measures per-layer overhead of WASM command execution. Install @rivet-dev/agent-os-core and @rivet-dev/agent-os-common, then run with node --import tsx bench-wasm-exec.ts.
bench-wasm-exec.ts
import { AgentOs } from "@rivet-dev/agent-os-core";
import type { SoftwareInput } from "@rivet-dev/agent-os-core";
import common from "@rivet-dev/agent-os-common";
const software: SoftwareInput[] = [common];
interface BenchResult {
label: string;
durationMs: number;
stdout?: string;
stderr?: string;
}
async function bench(
label: string,
fn: () => Promise<{ stdout?: string; stderr?: string } | void>,
): Promise<BenchResult> {
const start = performance.now();
const result = await fn();
const durationMs = performance.now() - start;
return { label, durationMs, stdout: result?.stdout, stderr: result?.stderr };
}
function printResult(r: BenchResult) {
const ms = r.durationMs.toFixed(1);
console.log(` ${r.label.padEnd(45)} ${ms.padStart(8)} ms`);
}
async function main() {
console.log("=== WASM command execution benchmark ===\n");
console.log(`Platform: ${process.platform} ${process.arch}`);
console.log(`Node.js: ${process.version}\n`);
// Phase 1: VM creation
console.log("--- Phase 1: VM creation ---");
const vmStart = performance.now();
const vm = await AgentOs.create({ software });
console.log(` AgentOs.create() ${(performance.now() - vmStart).toFixed(1).padStart(30)} ms`);
await vm.mkdir("/workspace", { recursive: true });
// Phase 2: Kernel API baseline (non-WASM)
console.log("\n--- Phase 2: Kernel API baseline ---");
const kernelResults: BenchResult[] = [];
await vm.writeFile("/workspace/test.txt", "Hello, World!\n".repeat(7));
kernelResults.push(await bench("kernel.readFile() (100B)", () => vm.readFile("/workspace/test.txt").then(() => {})));
kernelResults.push(await bench("kernel.stat()", () => vm.stat("/workspace/test.txt").then(() => {})));
kernelResults.push(await bench("kernel.readdir()", () => vm.readdir("/workspace").then(() => {})));
kernelResults.push(await bench("kernel.exists()", () => vm.exists("/workspace/test.txt").then(() => {})));
for (const r of kernelResults) printResult(r);
// Phase 3: WASM cold start
console.log("\n--- Phase 3: WASM cold start ---");
const cold = await bench("exec('echo hello') — cold", () => vm.exec("echo hello"));
printResult(cold);
// Phase 4: WASM warm (repeated)
console.log("\n--- Phase 4: WASM warm (×5) ---");
const warmResults: BenchResult[] = [];
for (let i = 0; i < 5; i++) {
warmResults.push(await bench(`exec('echo hello') — warm #${i + 1}`, () => vm.exec("echo hello")));
}
for (const r of warmResults) printResult(r);
// Phase 5: Various commands
console.log("\n--- Phase 5: Various commands (warm) ---");
for (const cmd of ["ls /workspace", "cat /workspace/test.txt", "pwd", "echo test", "wc -c /workspace/test.txt"]) {
printResult(await bench(`exec('${cmd}')`, () => vm.exec(cmd)));
}
// Phase 6: Kernel vs WASM comparison
console.log("\n--- Phase 6: Kernel API vs WASM ---");
const kr = await bench("kernel.readFile()", () => vm.readFile("/workspace/test.txt").then(() => {}));
const wr = await bench("exec('cat ...')", () => vm.exec("cat /workspace/test.txt"));
printResult(kr);
printResult(wr);
console.log(` → WASM overhead: ${(wr.durationMs / kr.durationMs).toFixed(0)}x\n`);
// Phase 7: Batch throughput (15 commands)
console.log("--- Phase 7: 15-command batch ---");
const times: number[] = [];
for (let i = 0; i < 15; i++) {
const s = performance.now();
await vm.exec(`echo iteration_${i}`);
times.push(performance.now() - s);
}
const total = times.reduce((a, b) => a + b, 0);
console.log(` Total: ${total.toFixed(1)} ms`);
console.log(` Average: ${(total / times.length).toFixed(1)} ms/cmd`);
console.log(` Min: ${Math.min(...times).toFixed(1)} ms`);
console.log(` Max: ${Math.max(...times).toFixed(1)} ms`);
console.log(` 120s timeout reached: ${total > 120000 ? "YES" : "NO"}`);
await vm.dispose();
}
main().catch((err) => { console.error(err); process.exit(1); });
Results (macOS Apple Silicon, Node.js v24.14.1):
- Kernel API (readFile, stat): ~0.1 ms
- WASM exec warm (echo): ~19 ms
- WASM exec warm (ls): ~48 ms
- 15× echo batch: 283 ms (18.9 ms/cmd)
Benchmark: simulated spawn latency (before/after persistent shell)
To validate the persistent shell approach without a Fargate environment, this benchmark injects an artificial 500ms delay into kernel.exec() to simulate the per-spawn overhead. The persistent shell patch bypasses kernel.exec() after the first shell initialization, so only the initial spawn incurs the delay.
Run without patch for "before", apply patch then re-run for "after".
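Concretely, the before/after runs could look like this (file and script names as used in this document; the `tsx` loader is assumed to be installed, and paths may need adjusting to your checkout):

```shell
# BEFORE: unpatched exec()
node --import tsx bench-simulated-latency.ts

# apply the PoC patch to the installed dist, then re-run
bash patch-exec-persistent-shell.sh

# AFTER: persistent shell
node --import tsx bench-simulated-latency.ts
```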
bench-simulated-latency.ts
import { AgentOs } from "@rivet-dev/agent-os-core";
import type { AgentOs as AgentOsType, SoftwareInput } from "@rivet-dev/agent-os-core";
import common from "@rivet-dev/agent-os-common";
const software: SoftwareInput[] = [common];
const SPAWN_DELAY_MS = parseInt(process.env.SIMULATED_SPAWN_DELAY_MS ?? "500", 10);
function injectSpawnDelay(vm: AgentOsType, delayMs: number) {
const kernel = (vm as any).kernel;
if (!kernel) { console.warn("[warn] kernel not accessible"); return; }
const originalKernelExec = kernel.exec.bind(kernel);
kernel.exec = async (command: string, options?: any) => {
await new Promise((r) => setTimeout(r, delayMs));
return originalKernelExec(command, options);
};
}
async function main() {
console.log("=== Simulated spawn-latency before/after benchmark ===\n");
console.log(`Platform: ${process.platform} ${process.arch}`);
console.log(`Node.js: ${process.version}`);
console.log(`Simulated delay: ${SPAWN_DELAY_MS} ms/spawn\n`);
const vm = await AgentOs.create({ software });
await vm.mkdir("/workspace", { recursive: true });
await vm.writeFile("/workspace/test.txt", "Hello World\n".repeat(10));
// Detect persistent shell patch
await vm.exec("echo detect");
const isPatched = !!(vm as any).__persistentShell;
const mode = isPatched ? "AFTER (persistent shell)" : "BEFORE (original exec)";
console.log(`Mode: ${mode}\n`);
// Inject delay AFTER patch detection (so init shell spawn is unaffected)
injectSpawnDelay(vm, SPAWN_DELAY_MS);
// echo ×15
console.log("--- echo ×15 ---");
const warmupStart = performance.now();
await vm.exec("echo warmup");
console.log(` Warmup: ${(performance.now() - warmupStart).toFixed(1)} ms`);
const times: number[] = [];
for (let i = 0; i < 15; i++) {
const s = performance.now();
await vm.exec(`echo iteration_${i}`);
times.push(performance.now() - s);
}
const total = times.reduce((a, b) => a + b, 0);
console.log(` Total: ${total.toFixed(1)} ms`);
console.log(` Average: ${(total / times.length).toFixed(1)} ms/cmd`);
// Various commands
console.log("\n--- Various commands ---");
for (const cmd of ["ls /workspace", "cat /workspace/test.txt", "pwd", "echo hello", "wc -c /workspace/test.txt"]) {
const s = performance.now();
await vm.exec(cmd);
console.log(` ${cmd.padEnd(35)} ${(performance.now() - s).toFixed(1).padStart(8)} ms`);
}
// Summary
console.log(`\n=== Summary ===`);
console.log(`Mode: ${mode}`);
console.log(`echo×15: ${total.toFixed(1)} ms (${(total / 15).toFixed(1)} ms/cmd)`);
if (!isPatched) {
console.log(`→ ${((SPAWN_DELAY_MS * 15 / total) * 100).toFixed(0)}% of time is spawn overhead`);
}
await vm.dispose();
}
main().catch((err) => { console.error(err); process.exit(1); });
PoC patch: persistent shell for AgentOs.exec()
This monkey-patch replaces AgentOs.exec() to lazily initialize a persistent WASM shell via openShell() and route subsequent commands through stdin with delimiter-based output boundary detection.
apply-patch.cjs — patches the compiled agent-os-core dist
#!/usr/bin/env node
const fs = require("fs");
const file = process.argv[2];
if (!file) { console.error("Usage: node apply-patch.cjs <agent-os.js path>"); process.exit(1); }
let src = fs.readFileSync(file, "utf8");
if (src.includes("__persistentShell")) { console.log("[patch] already patched"); process.exit(0); }
const originalExec = ` async exec(command, options) {
return this.kernel.exec(command, options);
}`;
const patchedExec = ` async exec(command, options) {
// --- Persistent shell patch ---
if (!this.__persistentShell) {
this.__persistentShell = this._initPersistentShell();
}
const shell = await this.__persistentShell;
if (shell) {
return shell.exec(command, options);
}
return this.kernel.exec(command, options);
}
_initPersistentShell() {
const self = this;
return new Promise((resolveInit) => {
try {
const { shellId } = self.openShell();
let initialized = false;
let buffer = '';
let cmdCounter = 0;
let pendingResolve = null;
let pendingDelimId = '';
const decoder = new TextDecoder();
const initDelim = '__AOSINIT_' + Date.now() + '__';
let initBuffer = '';
self.onShellData(shellId, (data) => {
const chunk = decoder.decode(data);
if (!initialized) {
initBuffer += chunk;
if (initBuffer.includes(initDelim)) {
initialized = true;
initBuffer = '';
resolveInit(shellApi);
}
return;
}
if (!pendingResolve) return;
buffer += chunk;
const delimRe = new RegExp('___AOSDELIM_' + pendingDelimId + '_(\\\\d+)___');
const m = buffer.match(delimRe);
if (m) {
const exitCode = parseInt(m[1], 10);
const raw = buffer.substring(0, m.index);
const lines = raw.split('\\n');
const cleaned = lines.filter(l =>
!l.includes('___AOSDELIM_') &&
!l.includes('printf ') &&
!l.includes('\\\\x00')
);
if (cleaned.length > 0 && cleaned[0].trim().startsWith('{')) {
cleaned.shift();
}
const stdout = cleaned.join('\\n')
.replace(/^\\n+/, '')
.replace(/\\n+$/, '');
buffer = buffer.substring(m.index + m[0].length);
const resolve = pendingResolve;
pendingResolve = null;
pendingDelimId = '';
resolve({
exitCode,
stdout: stdout ? stdout + '\\n' : '',
stderr: '',
});
}
});
const shellApi = {
exec: (cmd, opts) => {
if (opts && (opts.onStdout || opts.onStderr)) {
return self.kernel.exec(cmd, opts);
}
return new Promise((resolve) => {
cmdCounter++;
const delimId = 'c' + cmdCounter;
buffer = '';
pendingResolve = resolve;
pendingDelimId = delimId;
const payload = cmd + '\\nprintf "___AOSDELIM_' + delimId + '_%d___\\\\n" "$?"\\n';
self.writeShell(shellId, payload);
});
},
};
setTimeout(() => {
self.writeShell(shellId, 'echo ' + initDelim + '\\n');
}, 50);
setTimeout(() => {
if (!initialized) { initialized = true; resolveInit(null); }
}, 15000);
} catch (_e) { resolveInit(null); }
});
}`;
if (src.includes(originalExec)) {
src = src.replace(originalExec, patchedExec);
} else {
const pat = / async exec\(command, options\) \{\n return this\.kernel\.exec\(command, options\);\n \}/;
if (pat.test(src)) { src = src.replace(pat, patchedExec); }
else { console.error("[patch] Could not find exec() to patch"); process.exit(1); }
}
fs.writeFileSync(file, src);
console.log("[patch] Patched exec() → persistent shell");
patch-exec-persistent-shell.sh — applies the patch to installed node_modules
#!/usr/bin/env bash
set -euo pipefail
DIST_FILE="$(find node_modules -path '*/@rivet-dev/agent-os-core/dist/agent-os.js' -type f | head -1)"
if [ -z "$DIST_FILE" ]; then
echo "[patch] agent-os-core dist not found, skipping"; exit 0
fi
if grep -q '__persistentShell' "$DIST_FILE"; then
echo "[patch] already patched, skipping"; exit 0
fi
echo "[patch] Patching $DIST_FILE ..."
node scripts/apply-patch.cjs "$DIST_FILE"
Results with 500ms simulated spawn delay:
| Metric | Before (original) | After (persistent shell) | Speedup |
|---|---|---|---|
| echo ×15 | 8,194 ms | 269 ms | 30× |
| `ls /workspace` | 581 ms | 45 ms | 13× |
| `pwd` | 548 ms | 18 ms | 30× |
The persistent shell bypasses kernel.exec() after initialization, so the injected delay only affects the first spawn. This models environments where the bottleneck is in the spawn path (process creation + WASM binary I/O + JIT compile).
Note on the PoC patch
This patch is a proof-of-concept to validate the approach. Known limitations:
- Output parsing heuristics (filtering lines containing `printf`, stripping lines starting with `{`) are fragile and could mishandle legitimate output
- No concurrent command support (single `pendingResolve`)
- stderr is always empty (not captured separately from the persistent shell)
- A production implementation should live in the kernel layer, not as a monkey-patch