feat(container-runner): persist agent container stdout+stderr to disk #2727

Open
manojp99 wants to merge 1 commit into nanocoai:main from manojp99:feat/persist-container-logs-nanoclaw

Overview

This is a sibling PR to microsoft/amplifier-app-nanoclaw#7 — the same container log persistence feature, now proposed for the original nanoclaw repo.

Problem

Agent containers' stdout and stderr are currently discarded:

  • stdout is dropped entirely (the container's stdout is opened but then ignored)
  • stderr is piped into log.debug(), which only emits output when LOG_LEVEL=debug, and even then only to the host process's stdout/stderr (transient, not persisted)

Once a container is removed (via docker rm), all evidence is gone. This makes post-incident debugging impossible.

Solution

Per-instance log file persistence via inherited file descriptor: the container writes directly to the file at the kernel level after spawn() returns, so the host process is uninvolved. Files accumulate at:

logs/containers/<group>/<containerName>.log

containerName already includes Date.now(), so files are unique per spawn.
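As a rough illustration of the path layout above, here is a minimal sketch. The helper name `containerLogPath` and the `rootDir` parameter are assumptions made for this example; the PR description does not show how the path is actually computed.

```typescript
import * as path from "node:path";

// Hypothetical helper: builds logs/containers/<group>/<containerName>.log
// under a given root directory. Not taken from the PR's diff.
function containerLogPath(
  rootDir: string,
  group: string,
  containerName: string,
): string {
  return path.join(rootDir, "logs", "containers", group, `${containerName}.log`);
}

// A container name that embeds Date.now() yields a distinct file per spawn.
const examplePath = containerLogPath(".", "my-group", `agent-${Date.now()}`);
console.log(examplePath);
```

Because the timestamp is part of the name, a restart of the same agent gets a new file rather than appending to an old one.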

Configuration

New env var: NANOCLAW_CONTAINER_LOGS. The default is DISABLED (no on-disk container logs). Opt in:

# Permanent (in .env):
echo "NANOCLAW_CONTAINER_LOGS=enabled" >> .env

# One-off:
NANOCLAW_CONTAINER_LOGS=enabled pnpm start

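When disabled: spawns with stdio=['ignore','ignore','ignore'], which is effectively the same behavior as before this change. The one visible difference is that container stderr no longer shows up in log.debug() output under LOG_LEVEL=debug, because that handler is removed. Otherwise, zero surprise for upgraders.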

When enabled but file open fails: falls back to stdio=['ignore','ignore','ignore'] — persistence is best-effort and never blocks container spawn.
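The disabled and best-effort rules above could be sketched roughly as follows. The `chooseStdio` helper and the `StdioChoice` type are hypothetical names for this example, and the real code in src/container-runner.ts may be structured differently.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type StdioChoice = {
  stdio: ["ignore", "ignore" | number, "ignore" | number];
  logFd: number | null;
};

// Hypothetical helper: picks the stdio configuration for a spawn.
// Disabled, or any failure to open the log file, means output is discarded.
function chooseStdio(enabled: boolean, logPath: string): StdioChoice {
  const ignoreAll: StdioChoice = {
    stdio: ["ignore", "ignore", "ignore"],
    logFd: null,
  };
  if (!enabled) return ignoreAll;
  try {
    fs.mkdirSync(path.dirname(logPath), { recursive: true });
    const logFd = fs.openSync(logPath, "a"); // append mode
    return { stdio: ["ignore", logFd, logFd], logFd };
  } catch {
    // Best-effort: persistence must never block the container spawn.
    return ignoreAll;
  }
}

// Demo against a throwaway temp directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "container-logs-"));
const choice = chooseStdio(true, path.join(demoDir, "demo-group", "agent-1.log"));
console.log(choice.stdio);
if (choice.logFd !== null) fs.closeSync(choice.logFd);
```

Returning the fd alongside the stdio array lets the caller close the parent's copy once the child has been spawned.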

What changed

src/config.ts

  • Added NANOCLAW_CONTAINER_LOGS to the envConfig array
  • Exported CONTAINER_LOGS_ENABLED (true only if var is exactly "enabled")
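A minimal sketch of the strict flag check described above. The function name `parseContainerLogsEnabled` is an assumption for illustration, and reading straight from `process.env` is a simplification: the actual src/config.ts goes through its envConfig array, which is not shown here.

```typescript
// Hypothetical stand-in for the exported flag in src/config.ts.
// Only the exact string "enabled" turns persistence on; any other
// value (including "true", "1", or different casing) leaves it off.
function parseContainerLogsEnabled(value: string | undefined): boolean {
  return value === "enabled";
}

const CONTAINER_LOGS_ENABLED = parseContainerLogsEnabled(
  process.env.NANOCLAW_CONTAINER_LOGS,
);
console.log(`container logs enabled: ${CONTAINER_LOGS_ENABLED}`);
```

An exact match keeps the opt-in unambiguous: a typo or an unexpected value leaves persistence off rather than silently enabling it.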

src/container-runner.ts

  • Imported CONTAINER_LOGS_ENABLED
  • Before spawn: when enabled, open an append-mode log file at the calculated path and get its fd
  • Spawn with stdio: ['ignore', logFd, logFd] (or all 'ignore' when disabled)
  • Close fd in parent right after spawn — the child inherited it
  • Removed the old container.stderr → log.debug handler and the stdout no-op handler
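The open, spawn, and close sequence listed above might look roughly like this. It is a sketch under assumptions, not the PR's diff: `spawnWithLogFile` is a hypothetical wrapper, and the demo spawns the Node binary itself in place of the real container command.

```typescript
import { spawn, ChildProcess } from "node:child_process";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical wrapper: name and signature are not from the PR.
function spawnWithLogFile(
  command: string,
  args: string[],
  logPath: string,
): ChildProcess {
  fs.mkdirSync(path.dirname(logPath), { recursive: true });
  const logFd = fs.openSync(logPath, "a"); // append mode
  try {
    // stdin is ignored; stdout and stderr both point at the log file.
    return spawn(command, args, { stdio: ["ignore", logFd, logFd] });
  } finally {
    // The child has its own duplicate of the fd once spawn() returns,
    // so the parent closes its copy immediately.
    fs.closeSync(logFd);
  }
}

// Demo: a short-lived child that writes to both streams.
const demoLog = path.join(
  fs.mkdtempSync(path.join(os.tmpdir(), "container-logs-")),
  "demo-group",
  `demo-${Date.now()}.log`,
);
const child = spawnWithLogFile(
  process.execPath,
  ["-e", "console.log('from stdout'); console.error('from stderr')"],
  demoLog,
);
child.on("error", (err) => console.error(`spawn failed: ${err.message}`));
child.on("exit", () => console.log(fs.readFileSync(demoLog, "utf8")));
```

Since no pipes are created, `child.stdout` and `child.stderr` are `null` in the parent, which is why the old stream handlers have nothing left to attach to.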

Inspection & cleanup

# Tail a live container:
tail -f logs/containers/<group>/<containerName>.log

# See what's accumulated:
ls -lh logs/containers/*/

# Clear all:
rm -rf logs/containers/

# Clear one agent's logs:
rm -rf logs/containers/<group>/

# Prune everything older than 7 days:
find logs/containers -name "*.log" -mtime +7 -delete

What's NOT included (and why)

  • Log rotation: Operators manage retention manually. If manual pruning becomes tedious, that's the signal to add a sweep in host-sweep.ts.
  • JSON envelope / structured format: The logs are plain text, one container per file. Machines can grep; humans can tail -f.
  • Selective disabling per-agent: Feature is global on/off. If some agents need persistence and others don't, environment layering (different .env per checkout) is the tool.
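For reference, a sweep of the kind the log rotation bullet alludes to could be sketched as below. This is not part of the PR: `pruneOldLogs` is a hypothetical function, and its cutoff only approximates `find -mtime +7`, which rounds file age down to whole days.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sweep: deletes *.log files whose mtime is older than
// maxAgeMs, recursing into subdirectories. Returns the removed paths.
function pruneOldLogs(
  rootDir: string,
  maxAgeMs: number,
  now: number = Date.now(),
): string[] {
  const removed: string[] = [];
  if (!fs.existsSync(rootDir)) return removed;
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    const fullPath = path.join(rootDir, entry.name);
    if (entry.isDirectory()) {
      removed.push(...pruneOldLogs(fullPath, maxAgeMs, now));
    } else if (entry.isFile() && entry.name.endsWith(".log")) {
      if (now - fs.statSync(fullPath).mtimeMs > maxAgeMs) {
        fs.unlinkSync(fullPath);
        removed.push(fullPath);
      }
    }
  }
  return removed;
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Demo on a throwaway directory so nothing real is deleted.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "container-logs-"));
const demoGroup = path.join(demoRoot, "demo-group");
fs.mkdirSync(demoGroup);
const oldLog = path.join(demoGroup, "agent-old.log");
const newLog = path.join(demoGroup, "agent-new.log");
fs.writeFileSync(oldLog, "old output\n");
fs.writeFileSync(newLog, "new output\n");
const tenDaysAgo = new Date(Date.now() - 10 * 24 * 60 * 60 * 1000);
fs.utimesSync(oldLog, tenDaysAgo, tenDaysAgo);

const removedLogs = pruneOldLogs(demoRoot, SEVEN_DAYS_MS);
console.log(removedLogs);
```

In a real deployment the root would be logs/containers; the sketch leaves empty group directories in place.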

Verification

  • Typecheck: ✅ Clean
  • Tests: ✅ 32/32 passing (container-runner.test, container-restart.test, host-sweep.test)
  • ESLint: The one new no-catch-all warning at the log file open matches the pattern already used 5× elsewhere in the file (intentional best-effort design).

This is a minimal, isolated change to the container spawning path. No new dependencies, no retention logic, no complexity.

Agent containers' stdout and stderr were discarded — stdout dropped on
the floor, stderr folded into log.debug which is invisible unless
LOG_LEVEL=debug. Once a container was removed, all evidence was gone.

This adds per-instance log file persistence via inherited file descriptor:
the container writes directly to the file at the kernel level, so the
host process is uninvolved after spawn(). Files accumulate at
logs/containers/<group>/<containerName>.log and can be inspected with
tail/less/grep or pruned with rm/find/cron.

- New env var: NANOCLAW_CONTAINER_LOGS — default is DISABLED (no
  on-disk container logs). Set NANOCLAW_CONTAINER_LOGS=enabled in
  .env or the environment to turn persistence on.
- When disabled, spawns with stdio=[ignore,ignore,ignore] — exactly
  the same effective behavior as before this change.
- When enabled, file open failures fall through to stdio=ignore —
  persistence is best-effort and never blocks container spawn.
- Removed the old container.stderr → log.debug handler (gated behind
  LOG_LEVEL=debug anyway) and the stdout no-op handler.
- No rotation built in by design — operators manage retention manually
  via rm/find/cron. If manual pruning becomes tedious, that's the signal
  to add a sweep in host-sweep.ts.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>