fix(setup): reap orphaned agent containers on service stop #2708

Open
danilomendonca wants to merge 1 commit into nanocoai:main from danilomendonca:fix/systemd-reap-orphan-containers

@danilomendonca

Type of Change

  • Feature skill - adds a channel or integration (source code changes + SKILL.md)
  • Utility skill - adds a standalone tool (code files in .claude/skills/<name>/, no source changes)
  • Operational/container skill - adds a workflow or agent skill (SKILL.md only, no source changes)
  • Fix - bug fix or security fix to source code
  • Simplification - reduces or simplifies source code
  • Documentation - docs, README, or CONTRIBUTING changes only

Description

What: The generated Linux systemd unit now reaps this install's agent containers when the host service stops, via an ExecStopPost hook in setup/service.ts.

Why: The unit runs with KillMode=process, which deliberately spares the agent containers from systemd's cgroup kill so in-flight work can survive a host restart. The side effect is that on stop/restart those containers are orphaned until the next startup's cleanupOrphans() reaps them. In the meantime, systemd logs a "Found left-over process <pid> (docker) in control group while starting unit ... unclean termination" warning on every restart, and a container lingers during the gap.

How it works: ExecStopPost runs docker stop -t 1 against containers carrying this install's nanoclaw-install=<slug> label (slug interpolated via getInstallSlug). This reuses the exact label scoping and stop command the host already uses in cleanupOrphans()/stopContainer, so:

  • peer installs sharing the Docker daemon are never touched,
  • KillMode=process is kept, so the host still gets a clean SIGTERM shutdown,
  • startup cleanupOrphans() stays in place as a fallback for crash (non-graceful) exits.

Only the systemd path is affected; macOS/launchd and the nohup fallback are unchanged.
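
For reviewers who want to picture the generated unit, here is a minimal sketch of how such a unit could be rendered. It is not the code in setup/service.ts: `buildReapCommand`, `buildUnit`, the `docker ps | xargs` pipeline, the slug character check, the `Description` text, and the example paths are all assumptions made for illustration. Only `KillMode=process`, `ExecStopPost`, `docker stop -t 1`, and the `nanoclaw-install=<slug>` label come from this PR.

```typescript
// Illustrative sketch only; helper names and the exact pipeline are assumptions.

/** Build a shell pipeline that stops every container carrying this install's label. */
function buildReapCommand(slug: string): string {
  // Assumption: slugs are simple tokens. Rejecting anything else keeps the
  // interpolated value from breaking out of the single-quoted sh -c string.
  if (!/^[A-Za-z0-9][A-Za-z0-9_.-]*$/.test(slug)) {
    throw new Error(`unsafe install slug: ${slug}`);
  }
  // `docker ps -q --filter label=k=v` prints matching container IDs;
  // `xargs -r` (GNU xargs) skips `docker stop` when nothing matches.
  return `docker ps -q --filter label=nanoclaw-install=${slug} | xargs -r docker stop -t 1`;
}

/** Render a user-level systemd unit with the stop-time reap hook. */
function buildUnit(slug: string, execStart: string): string {
  return [
    "[Unit]",
    `Description=NanoClaw host (${slug})`, // hypothetical description text
    "",
    "[Service]",
    `ExecStart=${execStart}`,
    // Only the main process is signalled on stop; containers are spared.
    "KillMode=process",
    // The leading "-" tells systemd to ignore a non-zero exit from this hook.
    `ExecStopPost=-/bin/sh -c '${buildReapCommand(slug)}'`,
    "",
    "[Install]",
    "WantedBy=default.target",
    "",
  ].join("\n");
}

// Usage with placeholder values:
console.log(buildUnit("demo-install", "/usr/bin/node /opt/nanoclaw/dist/index.js"));
```

Wrapping the pipeline in `/bin/sh -c` is needed in this sketch because systemd does not interpret pipes in `Exec*` lines itself.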

How it was tested

  • Planted a container labeled nanoclaw-install=<slug>, ran systemctl --user stop, and confirmed ExecStopPost ran (code=exited status=0) and reaped it — container list empty afterward.
  • Confirmed a container labeled for a different install on the same daemon was left running (label scoping holds).
  • Restarted the service and verified a clean start with no left-over process / unclean termination warning in the journal.
  • pnpm run build (tsc) passes.
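
The label-scoping check in the second test can also be modeled as a pure filter, which makes the expected behavior easy to state. This is a sketch under assumptions: `ContainerInfo` and `selectContainersToReap` are hypothetical names, and in the actual hook the filtering is done by Docker itself rather than in TypeScript.

```typescript
// Hypothetical model of the label scoping; not code from this PR.
interface ContainerInfo {
  id: string;
  labels: Record<string, string>;
}

const INSTALL_LABEL = "nanoclaw-install";

/** Return the IDs of containers whose install label matches this install's slug. */
function selectContainersToReap(containers: ContainerInfo[], slug: string): string[] {
  return containers
    .filter((c) => c.labels[INSTALL_LABEL] === slug)
    .map((c) => c.id);
}

// Example: one container for this install, one for a peer install, one unlabeled.
const running: ContainerInfo[] = [
  { id: "a1", labels: { [INSTALL_LABEL]: "alpha" } },
  { id: "b2", labels: { [INSTALL_LABEL]: "beta" } },
  { id: "c3", labels: {} },
];

console.log(selectContainersToReap(running, "alpha")); // → ["a1"]
```

Containers belonging to the peer install ("beta") and unlabeled containers are left out of the result, which mirrors what the manual test observed.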

KillMode=process spares agent containers from systemd's cgroup kill so
in-flight work survives a host restart, but it left them orphaned until
the next startup's cleanupOrphans() reaped them — emitting a "left-over
process / unclean termination" warning in between.

Add an ExecStopPost that reaps this install's containers when the host
stops, scoped by the nanoclaw-install=<slug> label so peer installs are
untouched, matching the host's `docker stop -t 1`. Startup
cleanupOrphans() stays as a fallback for crash cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
github-actions bot added the follows-guidelines (PR was created using the current contributing template) and PR: Fix (Bug fix) labels on Jun 7, 2026