setup: recommended path leaves cli/local unwired but advertises pnpm run chat hi (hangs 120s) #2703

@bigintersmind

Summary

A fresh install that follows the recommended setup path ends up with cli/local unwired, yet setup unconditionally advertises pnpm run chat hi in its final "Try these" panel and outro. The command then hangs for 120s and exits with timeout: no reply — with no hint that the cause is a missing wiring.

This is the same symptom as #2186 and #2401 (CLI chat times out), but with a distinct root cause in the setup flow. The destinations half of the problem overlaps with #2389.

Environment

  • NanoClaw v2.0.76, macOS (launchd), Docker runtime
  • Channel chosen during setup: Telegram (via pairing onboarding)
  • Path taken: ping-test succeeded → chose "Continue with setup" (the option labeled recommended)

Root cause

The terminal-chat agent (the wiring cli/local → an agent, plus its return destination) is only ever created on one non-default branch of setup. But pnpm run chat hi is advertised to everyone.

In setup/auto.ts:

  1. cli-agent step (setup/auto.ts:332) creates a throwaway _ping-test agent, wires cli/local to it, and runs a ping/pong to verify the sandbox responds.
  2. On success it deletes that test agent (setup/auto.ts:363-365 → scripts/delete-cli-agent.ts), which cascades the wiring and destination away but deliberately leaves the cli/local messaging group behind, now orphaned and unwired.
  3. A "What next?" prompt (setup/auto.ts:382-397) offers:
    • "Continue with setup" — labeled "recommended" (:389) — creates no terminal agent.
    • "Pause here and chat…" (:399) — only this branch creates a real <name>'s Terminal agent via scripts/init-cli-agent.ts (:404), which wires cli/local.
  4. The channel step then creates a separate agent for Telegram/Discord/etc. (scripts/init-first-agent.ts, whose docstring at line 10 explicitly states "CLI channel wiring is handled separately").
  5. The final panel (setup/auto.ts:543) and outro (:573) tell every user to run pnpm run chat hi.

So the recommended path (continue → wire a messaging channel) leaves pnpm run chat advertised but non-functional.
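
The branch logic above can be condensed into a small in-memory model. This is only a sketch for reasoning about the end state: the `SetupState` shape, the `runSetup` function, and the agent names `terminal` and `channel-agent` are hypothetical, and the assumption that the channel step runs on both branches is a simplification, not something taken from setup/auto.ts.

```typescript
// Hypothetical model of the wiring state; not the real schema or setup code.
type Choice = "continue" | "pause-and-chat";

interface SetupState {
  agents: Set<string>;
  // messaging group -> agent wired to it (a missing key means unwired)
  wirings: Map<string, string>;
}

function runSetup(choice: Choice): SetupState {
  const state: SetupState = { agents: new Set(), wirings: new Map() };

  // Step 1: the throwaway ping-test agent is wired to cli/local.
  state.agents.add("_ping-test");
  state.wirings.set("cli/local", "_ping-test");

  // Step 2: deleting the test agent cascades its wiring away.
  state.agents.delete("_ping-test");
  state.wirings.delete("cli/local");

  // Step 3: only the non-default branch creates a terminal agent.
  if (choice === "pause-and-chat") {
    state.agents.add("terminal");
    state.wirings.set("cli/local", "terminal");
  }

  // Step 4 (assumed to run on both branches): the channel step wires
  // only its own messaging group, never cli/local.
  state.agents.add("channel-agent");
  state.wirings.set("telegram", "channel-agent");

  return state;
}

const recommended = runSetup("continue");
console.log(recommended.wirings.has("cli/local")); // false
```

In this model, nothing after step 2 touches cli/local on the recommended branch, so the final state cannot contain a wiring for it.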

Reproduction

  1. Run pnpm run setup.
  2. Let the assistant ping-test succeed.
  3. At "What next?", choose "Continue with setup" (the recommended option).
  4. Wire a messaging channel (e.g. Telegram) and finish setup.
  5. Run the command setup just suggested: pnpm run chat hi.
  6. Observed: the command hangs for 120s, then prints timeout: no reply in 120000ms and exits with code 3.

What's actually happening under the hood

  • The CLI message does route and wake a container, but the agent has no destination pointing back at cli/local, so its reply is addressed to its only known destination (the messaging channel, e.g. Telegram). The terminal never gets a reply.
  • data/v2.db: the cli|local messaging group exists but has no row in messaging_group_agents, and the channel agent has no agent_destinations row targeting it.
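
A toy model of the reply addressing described above may make the failure easier to see. The real routing code is not shown in this issue, so `pickReplyTarget`, the `AgentDestination` shape, and the ids `mg-cli` and `mg-telegram` are all hypothetical; the fallback to the agent's first known destination is an assumption based on the observed behavior.

```typescript
// Toy model only: the real routing code is not shown in this issue.
interface AgentDestination {
  agentId: string;
  localName: string;
  targetId: string; // messaging group this destination points at
}

function pickReplyTarget(
  agentId: string,
  originGroupId: string,
  destinations: AgentDestination[],
): string | undefined {
  const known = destinations.filter((d) => d.agentId === agentId);
  // Preferred: a destination pointing back at the group the message came from.
  const back = known.find((d) => d.targetId === originGroupId);
  if (back) return back.targetId;
  // Assumed fallback: the agent can only address a destination it knows about.
  return known[0]?.targetId;
}

const destinations: AgentDestination[] = [
  { agentId: "channel-agent", localName: "telegram", targetId: "mg-telegram" },
];

// A message arriving from the CLI group is answered on Telegram instead.
console.log(pickReplyTarget("channel-agent", "mg-cli", destinations)); // mg-telegram
```

Adding a destination whose `targetId` is the CLI group makes the same call return `mg-cli`, which is what the destinations step in the workaround achieves.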

Expected behavior

A user who completes setup and is told to run pnpm run chat hi should get a working terminal chat — i.e. one of:

  • setup wires cli/local to an agent on all completion paths (e.g. wire it to the channel agent, or always create the terminal agent), or
  • the pnpm run chat hi suggestion (and outro line) is only shown when a terminal agent was actually wired.
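
The second option could look roughly like the sketch below, which builds the "Try these" lines from what setup actually wired. `SetupOutcome`, `buildTryThese`, and the channel hint wording are hypothetical; the real panel code in setup/auto.ts may be structured quite differently.

```typescript
// Hypothetical summary of what setup ended up wiring.
interface SetupOutcome {
  terminalWired: boolean; // true only if cli/local is wired to an agent
  channelName?: string;   // e.g. "Telegram", when a channel was set up
}

// Build the suggestion lines from the outcome instead of hard-coding them.
function buildTryThese(outcome: SetupOutcome): string[] {
  const lines: string[] = [];
  if (outcome.terminalWired) {
    lines.push("pnpm run chat hi");
  }
  if (outcome.channelName) {
    lines.push(`Message your assistant on ${outcome.channelName}`);
  }
  return lines;
}

// Recommended path from this report: channel wired, terminal not.
console.log(buildTryThese({ terminalWired: false, channelName: "Telegram" }));
```

Passing the outcome explicitly keeps the panel and the outro in agreement, since both can be derived from the same value.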

Independently, scripts/chat.ts should fail gracefully when cli/local is unwired — detect the missing wiring and print a one-line hint (how to wire it) instead of a 120s opaque timeout.
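
A minimal sketch of such a preflight check follows, assuming the caller has already read the relevant rows from messaging_group_agents and agent_destinations. `WiringSnapshot` and `preflightCliChat` are hypothetical names, and the hint text simply points at the ncl commands listed in the workaround.

```typescript
// Hypothetical snapshot, filled in by the caller from data/v2.db.
interface WiringSnapshot {
  // agents wired to cli/local (rows in messaging_group_agents)
  wiredAgentIds: string[];
  // agents with a destination targeting cli/local (rows in agent_destinations)
  destinationAgentIds: string[];
}

type Preflight = { ok: true } | { ok: false; hint: string };

function preflightCliChat(snapshot: WiringSnapshot): Preflight {
  if (snapshot.wiredAgentIds.length === 0) {
    return {
      ok: false,
      hint: "cli/local is not wired to any agent; see `pnpm run ncl wirings create`",
    };
  }
  const canReply = snapshot.wiredAgentIds.some((id) =>
    snapshot.destinationAgentIds.includes(id),
  );
  if (!canReply) {
    return {
      ok: false,
      hint: "no wired agent has a destination for cli/local; see `pnpm run ncl destinations add`",
    };
  }
  return { ok: true };
}

// State after the recommended setup path described in this report.
const result = preflightCliChat({ wiredAgentIds: [], destinationAgentIds: [] });
if (!result.ok) console.error(result.hint);
```

Running this before sending the message would let scripts/chat.ts exit immediately with the hint rather than waiting out the timeout.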

Secondary bug surfaced: delete-cli-agent.ts cleanup is not crash-safe

During my install the cleanup step (step 2 above) errored, so I want to flag why it's fragile:

In scripts/delete-cli-agent.ts the DB deletion commits first (:48-61), then the filesystem removal runs (:64-73) with no try/catch. If the _ping-test container hasn't fully released its bind-mount yet (it's deleted immediately after the ping succeeds, likely still in its SIGTERM grace period), fs.rmSync throws — which:

  • leaves an orphaned groups/_ping-test/ directory on disk (mine still contains a .claude-shared.md -> /app/CLAUDE.md symlink that only resolves inside the container — a tell it was a live mount), and
  • reports the entire cleanup step as failed, even though the DB is fully consistent.

Note this does not change the primary bug: the DB-level deletion (which removes the cli/local wiring) succeeds regardless, so cli/local ends up orphaned whether or not cleanup errors.

Suggested fix

  • Make the filesystem cleanup best-effort (wrap in try/catch — the DB is the source of truth), and/or sequence it after confirming the _ping-test container is actually down.
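
The best-effort variant could be sketched as below. The two callbacks stand in for the existing DB deletion and directory removal in scripts/delete-cli-agent.ts; `deleteAgentBestEffort` and `CleanupResult` are hypothetical names, not the script's actual structure.

```typescript
interface CleanupResult {
  dbDeleted: boolean;
  warnings: string[]; // non-fatal problems, e.g. a directory left on disk
}

// deleteRows: the authoritative DB deletion; its errors still propagate.
// removeDir: the filesystem removal; its errors are downgraded to warnings.
function deleteAgentBestEffort(
  deleteRows: () => void,
  removeDir: () => void,
): CleanupResult {
  deleteRows();
  const warnings: string[] = [];
  try {
    removeDir();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warnings.push(`left agent directory on disk: ${reason}`);
  }
  return { dbDeleted: true, warnings };
}

// Simulate a bind-mount that has not been released yet.
const outcome = deleteAgentBestEffort(
  () => {},
  () => {
    throw new Error("EBUSY: resource busy or locked");
  },
);
console.log(outcome.dbDeleted, outcome.warnings.length); // true 1
```

In the real script `removeDir` would wrap the existing `fs.rmSync` call. Node's `fs.rmSync` also accepts `maxRetries` and `retryDelay` when `recursive` is true, which may cover a short container shutdown window without extra sequencing.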

Workaround

Manually wire cli/local to an existing agent and give it a return destination:

# find the cli messaging group id and your agent group id
pnpm run ncl messaging-groups list
pnpm run ncl groups list

pnpm run ncl wirings create \
  --messaging-group-id <cli-mg-id> \
  --agent-group-id <agent-id> \
  --engage-mode pattern --engage-pattern . --session-mode shared

# NOTE: wirings create does NOT auto-create the destination (see #2389) —
# you must add it explicitly, or the reply is dropped / sent to the wrong channel:
pnpm run ncl destinations add \
  --agent-group-id <agent-id> \
  --local-name terminal \
  --target-type channel \
  --target-id <cli-mg-id>
