
Dev#2491

Merged
namastex888 merged 59 commits into main from dev
Jun 6, 2026

@namastex888 namastex888 commented Jun 6, 2026


Summary by CodeRabbit

  • New Features

    • Added standalone brain status reporting that shows when the brain is installed but not running.
    • Enhanced Claude session handling with automatic fallback to fresh session if resume fails.
    • Introduced an agent_error health monitoring flag for agent/executor lifecycle errors.
  • Bug Fixes

    • Improved test database setup to fail early with clear error messages.
    • Enhanced legacy pgserve discovery and cleanup during migrations.
  • Tests

    • Added comprehensive test suites for tmux spawning, health monitoring, and migration logic.
  • Chores

    • Version bump to 4.260606.2.
    • Refactored shared utilities and updated test metrics data.

automagik-genie and others added 30 commits May 22, 2026 21:29
fix: report standalone brain install in serve status
Adds agent_error observe health flag for agent/executor error lifecycle rows and regression coverage.
Co-authored-by: Genie Automagik <genie@namastex.ai>
Daily metrics update for 2026-05-24.
Stats: 47 commits, 0 releases this week, +1319/-339 LoC (7d).
VELOCITY.md and charts refreshed.

https://claude.ai/code/session_01993o1C5JAog4AzSD2WtFu3
…rdcoding 8432

Migration 002 hardcoded CANONICAL_PORT=8432 and killed any postmaster on a
different port. After the autopg-v3 socket-singleton cutover the canonical
backbone binds 5432, so this was (a) inert on healthy 5432 hosts and, worse,
(b) in a mixed window with a stray legacy postmaster on 8432 alongside
canonical 5432, it treated 8432 as canonical and stopped the REAL 5432 backbone.

Now discovers the canonical port from the autopg/pgserve binary
(`<bin> status --json`). When no canonical binary is installed or it can't
report a port, the migration is a safe no-op — we never stop a postmaster we
can't positively distinguish from canonical. Extracted selectLegacyEmbedded()
as a pure, unit-tested helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Reuse extractPgservePortFromStatus (moved to src/lib/pgserve-status.ts,
  re-exported from update.ts) so the migration tolerates nested
  instance.port / runtime.port status shapes — reading only top-level .port
  would null-resolve and make the migration a permanent no-op on those hosts
  (Codex P1).
- Guard the parsed port with > 0 so Number(null|false|'')===0 is rejected
  (Gemini).
- selectLegacyEmbedded now returns ALL non-canonical postmasters (filter, not
  find); apply stops each and validate waits for all to exit — otherwise a
  second stray survives and validate fails after 5s (Gemini).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
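
The two commits above can be illustrated with a minimal sketch. The names `extractPortSketch`, `selectLegacySketch`, and `PostmasterSketch` are hypothetical stand-ins; the real `extractPgservePortFromStatus` and `selectLegacyEmbedded` signatures are not shown on this page, so the shapes below are assumptions based only on the commit messages.

```typescript
// Hypothetical sketch, not the code from src/lib/pgserve-status.ts or migration 002.

/** Accept a candidate only if it is a usable TCP port. */
function toPortSketch(value: unknown): number | null {
  // null, false and undefined are rejected here by type; '' is rejected by the > 0 guard.
  if (typeof value !== 'number' && typeof value !== 'string') return null;
  const n = Number(value);
  return Number.isInteger(n) && n > 0 && n <= 65535 ? n : null;
}

/** Read a port from top-level .port, then instance.port, then runtime.port. */
function extractPortSketch(statusJson: string): number | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(statusJson);
  } catch {
    return null; // unparseable status output: no port can be reported
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const status = parsed as Record<string, any>;
  return (
    toPortSketch(status.port) ??
    toPortSketch(status.instance?.port) ??
    toPortSketch(status.runtime?.port)
  );
}

interface PostmasterSketch {
  pid: number;
  port: number;
}

/** Return ALL postmasters that are positively distinguishable from canonical. */
function selectLegacySketch(
  postmasters: PostmasterSketch[],
  canonicalPort: number | null,
): PostmasterSketch[] {
  // Unknown canonical port: safe no-op, nothing is selected for stopping.
  if (canonicalPort === null) return [];
  return postmasters.filter((p) => p.port !== canonicalPort);
}
```

With `filter` rather than `find`, a host with two stray postmasters yields two stop targets, which matches the stated reason for the change: a surviving second stray would otherwise make validation fail.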
…al-port-discovery

fix(migrations): 002 discovers canonical pgserve port instead of hardcoding 8432
- releases_24h: 2 (v4.260525.1, v4.260525.2)
- merged_prs_7d: 30
- avg_merge_time: 1.3h
- ship_rate: 100%
- daily_stats_count: 61

https://claude.ai/code/session_01A9HUwDC1aLZ45z8N5ViygD
Claude Code fails when asked to --resume a session whose JSONL file is
missing (e.g. after a cleanup or on a fresh machine). By rewriting the
flag to --session-id we keep the same identifier but force a fresh
session, which always succeeds.
buildOmniSpawnParams now always emits sessionId (never resume), so
buildLaunchCommand produces --session-id <id>. Unlike --resume, this
flag attaches to an existing JSONL transcript when present but gracefully
starts a fresh session with the same id when the transcript is missing
(e.g. after cleanup or on a fresh machine) — preventing hard failures on
respawn. Fix is applied at the source (where the command is built), not
in the transport layer (tmux-launch-script), which is now provider-agnostic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When respawning a per-chat agent with a prior Claude session id, emit
--resume (not --session-id) so Claude reattaches to the existing JSONL.
Add a 3s liveness check after launch: if --resume silently fails (JSONL
missing), fall back to a fresh --session-id so the inbound message is
not lost.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
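
The resume-then-fallback flow described in the last commit can be sketched as follows. This is an illustration under assumptions: `launchWithResumeFallbackSketch` and the `ResumeDepsSketch` interface are hypothetical, the dependencies are injected so the sketch runs without tmux, and it polls within the 3-second window rather than sleeping for the full 3 seconds.

```typescript
// Hypothetical sketch; the real logic lives in src/services/executors/claude-code.ts.

type SessionFlag = '--resume' | '--session-id';

interface ResumeDepsSketch {
  /** Sends the launch command to the pane with the given flag. */
  launch: (flag: SessionFlag, sessionId: string) => Promise<void>;
  /** Reports whether the provider process is alive in the pane. */
  isRunning: () => Promise<boolean>;
  sleep: (ms: number) => Promise<void>;
}

async function launchWithResumeFallbackSketch(
  sessionId: string,
  deps: ResumeDepsSketch,
  windowMs = 3000,
  intervalMs = 500,
): Promise<'resumed' | 'fresh'> {
  // First try to reattach to the existing JSONL transcript.
  await deps.launch('--resume', sessionId);

  const attempts = Math.ceil(windowMs / intervalMs);
  for (let i = 0; i < attempts; i++) {
    if (await deps.isRunning()) return 'resumed';
    await deps.sleep(intervalMs);
  }

  // --resume never came up (e.g. JSONL missing): start fresh with the same id
  // so the inbound message is not lost.
  await deps.launch('--session-id', sessionId);
  return 'fresh';
}
```

Returning as soon as the process is detected keeps the common successful resume fast, while the failure path still waits out the full window before falling back.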
claude and others added 24 commits May 30, 2026 12:12
fix(omni): use --resume for respawn with JSONL-missing fallback

coderabbitai Bot commented Jun 6, 2026


Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 64441388-c163-47ad-950f-c3c5be7ab3f2

📥 Commits

Reviewing files that changed from the base of the PR and between a653f5d and dd04a3a.

⛔ Files ignored due to path filters (5)
  • .genie/assets/commits-30d.svg is excluded by !**/*.svg
  • .genie/assets/loc-30d.svg is excluded by !**/*.svg
  • .genie/assets/releases-30d.svg is excluded by !**/*.svg
  • README.md is excluded by !*.md
  • VELOCITY.md is excluded by !*.md
📒 Files selected for processing (26)
  • .claude-plugin/marketplace.json
  • .genie/agents/metrics-updater/daily-stats.jsonl
  • .genie/agents/metrics-updater/runs.jsonl
  • .genie/agents/metrics-updater/state.json
  • package.json
  • plugins/genie/.claude-plugin/plugin.json
  • plugins/genie/package.json
  • scripts/tests/omni-spawn-smoke.ts
  • src/genie-commands/update.ts
  • src/lib/__tests__/tmux-launch-script.test.ts
  • src/lib/agent-observability.test.ts
  • src/lib/agent-observability.ts
  • src/lib/isolation-guard.test.ts
  • src/lib/pgserve-status.ts
  • src/lib/test-db.ts
  • src/lib/tmux-launch-script.ts
  • src/lib/tmux.ts
  • src/migrations/steps/002-kill-embedded-pgserve-legacy.ts
  • src/services/executors/claude-code.test.ts
  • src/services/executors/claude-code.ts
  • src/term-commands/agent/observe.ts
  • src/term-commands/agents.ts
  • src/term-commands/observe.ts
  • src/term-commands/serve.test.ts
  • src/term-commands/serve.ts
  • test/migrations/002-kill-embedded-pgserve-legacy.test.ts

📝 Walkthrough

This PR bundles release version metadata updates (4.260606.2), metrics data snapshots, and functional enhancements: shared utilities for pgserve status parsing and tmux script generation; hardened test database setup with fail-closed behavior; multi-target legacy postgres cleanup via runtime port discovery; agent terminal error health signaling; and standalone brain status with dependency injection.

Changes

Release artifacts, infrastructure libraries, and robustness hardening

| Layer / File(s) | Summary |
|---|---|
| Release version metadata and metrics snapshots<br>.claude-plugin/marketplace.json, package.json, plugins/genie/.claude-plugin/plugin.json, plugins/genie/package.json, .genie/agents/metrics-updater/daily-stats.jsonl, .genie/agents/metrics-updater/runs.jsonl, .genie/agents/metrics-updater/state.json | Package and plugin manifests bumped to 4.260606.2. Metrics updater JSONL files replaced with records covering May 22–June 6, with updated daily metrics and 7-day aggregate fields. |
| Shared pgserve status port extraction<br>src/lib/pgserve-status.ts, src/genie-commands/update.ts | New extractPgservePortFromStatus utility parses JSON status output and extracts ports from multiple shapes (top-level, instance.port, runtime.port). Update command refactored to import and re-export this shared implementation, removing inline JSON parsing. |
| Shared tmux launch script library with unit tests and CLI integration<br>src/lib/tmux-launch-script.ts, src/lib/__tests__/tmux-launch-script.test.ts, src/term-commands/agents.ts | New module creates executable shell scripts under ~/.genie/spawn-scripts to avoid tmux send-keys escaping issues. Sanitizes worker IDs, writes shebangs, sets permissions. Tests verify shebang, sanitization, path location, permissions, and complex command preservation. Agents CLI refactored to use shared implementation. |
| Legacy embedded postgres discovery and multi-target cleanup<br>src/migrations/steps/002-kill-embedded-pgserve-legacy.ts, test/migrations/002-kill-embedded-pgserve-legacy.test.ts | Migration now resolves canonical port at runtime via autopg binary status JSON. New selectLegacyEmbedded returns all non-canonical listeners. Apply phase iterates targets calling pg_ctl stop with SIGTERM fallback and detailed logging. Validate waits for zero remaining legacy listeners and enumerates failures. Tests cover canonical/stray discrimination and null-port handling. |
| Test database fail-closed setup enforcement<br>src/lib/test-db.ts, src/lib/isolation-guard.test.ts | setupTestDatabase now throws descriptive errors on ensurePgserve or createTestDatabase failure instead of falling back to a silent no-op cleanup, preventing destructive operations against live DB. Isolation guard test asserts fail-closed patterns. |
| Omni pane launch with script-sourced commands and resume fallback<br>src/services/executors/claude-code.ts, src/services/executors/claude-code.test.ts | Executor now generates tmux launch scripts via writeTmuxLaunchScript and sources them in the pane. Resume behavior waits, polls for provider process, and falls back to fresh session if --resume silently fails (missing JSONL). Documentation clarifies fallback path. Test comments updated for operator invariants. |
| Tmux pane process detection and script-vs-inline spawn smoke test<br>src/lib/tmux.ts, scripts/tests/omni-spawn-smoke.ts | isPaneProcessRunning enhanced with ps -o comm= for grandchild names. New Bun smoke test validates inline send-keys vs script-sourced paths with complex payloads (backticks, emojis, nested quotes), confirming parse error detection and marker presence. |
| Agent terminal error health flag observability<br>src/lib/agent-observability.ts, src/lib/agent-observability.test.ts, src/term-commands/agent/observe.ts, src/term-commands/observe.ts | New agent_error health flag signals when the agent/executor lifecycle is in an error state. assessHealth detects this via row.agentState === 'error' or row.executorState === 'error'. Term-command outputs include label mappings for both formats. |
| Standalone brain status dependency injection and "installed" detection<br>src/term-commands/serve.ts, src/term-commands/serve.test.ts | printStandaloneBrainStatus exported and refactored to accept optional config reader, fetch, version probe, and loggers. Now detects "installed but not running" by probing brain --version when no active port found. Test validates installed-not-running message path. |
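
The agent_error detection rule from the table above can be sketched in a few lines. `AgentRowSketch`, `assessHealthSketch`, and the label text are hypothetical; only the two comparisons against `'error'` come from the walkthrough, and the real `assessHealth` presumably evaluates other flags as well.

```typescript
// Hypothetical sketch of the agent_error rule, not the code in agent-observability.ts.

interface AgentRowSketch {
  agentState?: string | null;
  executorState?: string | null;
}

type HealthFlagSketch = 'agent_error';

function assessHealthSketch(row: AgentRowSketch): HealthFlagSketch[] {
  const flags: HealthFlagSketch[] = [];
  // Either lifecycle being in the error state is enough to raise the flag.
  if (row.agentState === 'error' || row.executorState === 'error') {
    flags.push('agent_error');
  }
  return flags;
}

// Assumed display label; the real label mappings are not shown on this page.
const HEALTH_LABELS_SKETCH: Record<HealthFlagSketch, string> = {
  agent_error: 'agent error',
};
```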

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • automagik-dev/genie#2435: Updates genie version metadata across the same manifest files (.claude-plugin/marketplace.json, package.json, plugin manifests), bumping to a different release version.
  • automagik-dev/genie#2473: Updates genie version metadata in .claude-plugin/marketplace.json and package.json plugin manifests for release versioning.
  • automagik-dev/genie#1087: Overlaps on metrics-updater persisted data (.genie/agents/metrics-updater/ JSONL and state files) and release version metadata updates.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a script-based path for complex tmux spawns to prevent command corruption, improves test database isolation by failing closed on setup failures, refactors migration 002 to dynamically discover the canonical pgserve port, and adds an agent_error health flag to agent observability. Feedback on these changes suggests optimizing the 3-second delay during session resumption by polling the process status, using args instead of comm in isPaneProcessRunning to ensure correct process detection on macOS for interpreted runtimes, and adding a timeout to the health check fetch in printStandaloneBrainStatus to prevent potential hangs.
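
The fail-closed test database setup mentioned in this summary can be sketched as below. This is a simplified, synchronous illustration: `setupTestDatabaseSketch` and `TestDbDepsSketch` are hypothetical, the return types of `ensurePgserve` and `createTestDatabase` are assumptions, and the real functions are likely asynchronous.

```typescript
// Hypothetical sketch of fail-closed setup, not the code in src/lib/test-db.ts.

interface TestDbDepsSketch {
  /** Assumed to return true when a pgserve instance is reachable. */
  ensurePgserve: () => boolean;
  /** Assumed to return the new database name, or null on failure. */
  createTestDatabase: () => string | null;
}

function setupTestDatabaseSketch(deps: TestDbDepsSketch): string {
  if (!deps.ensurePgserve()) {
    // Fail closed: never continue with whatever connection happens to exist.
    throw new Error('test-db: pgserve is not available; refusing to run tests');
  }
  const name = deps.createTestDatabase();
  if (name === null) {
    throw new Error('test-db: could not create an isolated test database');
  }
  return name;
}
```

Throwing instead of returning a no-op cleanup means a test run stops before any destructive statement can reach a database that was never isolated.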


Comment on lines +268 to +270
await new Promise((r) => setTimeout(r, 3000));
const processName = resolveOmniPaneProcessName(entry.provider);
const resumed = await isPaneProcessRunning(paneId, processName);


high

Using an unconditional 3000ms delay on every resume introduces a significant latency penalty on the happy path (which is the most common case).

Instead of waiting for the full 3 seconds unconditionally, we can poll isPaneProcessRunning at short intervals (e.g., every 500ms) and return early as soon as the process is detected as running. This keeps the resume flow extremely fast on success while still maintaining the 3-second fallback safety window on failure.

      const processName = resolveOmniPaneProcessName(entry.provider);
      let resumed = false;
      for (let i = 0; i < 6; i++) {
        resumed = await isPaneProcessRunning(paneId, processName);
        if (resumed) break;
        await new Promise((r) => setTimeout(r, 500));
      }

Comment thread src/lib/tmux.ts
// Check direct children and grandchildren for the target process name
const output = exec(
- `pgrep -la -P ${panePid} 2>/dev/null; for cpid in $(pgrep -P ${panePid} 2>/dev/null); do pgrep -la -P "$cpid" 2>/dev/null; done; true`,
+ `pgrep -la -P ${panePid} 2>/dev/null; for cpid in $(pgrep -P ${panePid} 2>/dev/null); do pgrep -la -P "$cpid" 2>/dev/null; ps -p "$cpid" -o comm= 2>/dev/null; done; true`,


medium

On macOS, pgrep -la is not supported (as -a is invalid), so the command relies on ps -p "$cpid" -o comm= to get the process name. However, if the process is running under an interpreter (such as node or bun), comm will only return the interpreter name (e.g., node or bun), causing the processName check (e.g., claude or genie) to fail.

Using args (or command) instead of comm will return the full command line including arguments, ensuring correct detection of interpreted processes on macOS.

Suggested change
- `pgrep -la -P ${panePid} 2>/dev/null; for cpid in $(pgrep -P ${panePid} 2>/dev/null); do pgrep -la -P "$cpid" 2>/dev/null; ps -p "$cpid" -o comm= 2>/dev/null; done; true`,
+ `pgrep -la -P ${panePid} 2>/dev/null; for cpid in $(pgrep -P ${panePid} 2>/dev/null); do pgrep -la -P "$cpid" 2>/dev/null; ps -p "$cpid" -o args= 2>/dev/null; done; true`,

}
try {
const fetchHealth = deps.fetchImpl ?? fetch;
const resp = await fetchHealth(`http://127.0.0.1:${config.port}/healthz`);


medium

If the brain server is unresponsive or deadlocked but the port remains open, a fetch request without a timeout can hang indefinitely. This would cause the genie serve status command to block forever.

Adding a short timeout (e.g., 3 seconds) using AbortSignal.timeout prevents potential hangs and ensures the status command remains responsive.

    const resp = await fetchHealth(`http://127.0.0.1:${config.port}/healthz`, {
      signal: AbortSignal.timeout(3000),
    });
References
  1. It is acceptable to use hardcoded numeric limits (magic numbers) in non-critical fallback logic, especially when they serve as intentional caps to prevent performance issues like excessive I/O.

@namastex888 namastex888 merged commit 5c8a99a into main Jun 6, 2026
19 of 20 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99b2bf40a0


.join(' ');
const cmd = envPrefix ? `${envPrefix} ${launch.command}` : launch.command;
const scriptPath = writeTmuxLaunchScript(`omni-${chatId}`, cmd);
await executeTmux(`send-keys -t '${paneId}' "source ${scriptPath}" Enter`);


P2 Badge Quote the sourced launch script path

On hosts where the home directory contains spaces or shell metacharacters, this sends the pane a command like source /Users/Jane Doe/.genie/..., so the pane shell splits the path and fails to source the script, preventing omni spawn/resume from launching. The native tmux path quotes the generated script path before execution; this source path needs the same protection when building the command sent to tmux.
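
One way to apply this suggestion is to single-quote the path before building the `source` command. The helpers `shellQuoteSketch` and `buildSourceCommandSketch` below are hypothetical and are not taken from the repository; they show standard POSIX single-quote escaping only.

```typescript
// Hypothetical sketch of POSIX single-quote escaping for a script path.

function shellQuoteSketch(value: string): string {
  // Close the quote, emit an escaped single quote, reopen the quote: ' -> '\''
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function buildSourceCommandSketch(scriptPath: string): string {
  return `source ${shellQuoteSketch(scriptPath)}`;
}

// Example: a home directory containing a space stays a single shell word.
const exampleCommand = buildSourceCommandSketch(
  '/Users/Jane Doe/.genie/spawn-scripts/omni-1.sh',
);
```

Note that the reviewed line wraps the command in double quotes for `send-keys`, so characters such as `"` or `$` in the path would still need handling at that outer layer; the sketch covers only the pane shell's parsing of the path.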
