chore: rolling promotion dev -> main#589
Conversation
|
Important Review skippedToo many files! This PR contains 295 files, which is 145 over the limit of 150. To get a review, narrow the scope: ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (5)
📒 Files selected for processing (295)
You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request significantly expands the Omni platform's capabilities by introducing a Twilio WhatsApp channel plugin, a host-fingerprint trust system for Genie hosts, and an output-redactor middleware for data privacy. It also hardens the system with idempotency guards for NATS subscribers, transient error retries for agent dispatches, and improved date validation across API endpoints. Review feedback identified critical bugs including a missing import in the media processor and potential data loss in chats.settings and sync job progress updates due to non-merging object replacements. Furthermore, the polling mechanism for linking media to events was noted as an area for performance optimization.
| const [event] = await ctx.db | ||
| .select({ id: omniEvents.id }) | ||
| .from(omniEvents) | ||
| .where(eq(omniEvents.id, eventId)) |
| await services.chats.update(data.chatId, { | ||
| settings: { agentPaused: true }, | ||
| }); |
There was a problem hiding this comment.
Accidental data loss in chats.settings. This update replaces the entire settings JSONB object with { agentPaused: true }, clobbering any existing metadata keys such as followUpConfig or closeOutcome. Per general rules, metadata updates should merge with existing data rather than replacing the entire object.
| await services.chats.update(data.chatId, { | |
| settings: { agentPaused: true }, | |
| }); | |
| const chat = await services.chats.getById(data.chatId); | |
| await services.chats.update(data.chatId, { | |
| settings: { ...((chat?.settings as Record<string, unknown>) ?? {}), agentPaused: true }, | |
| }); |
References
- When implementing an update command for a resource that has a metadata field, the update operation should merge the new metadata with the existing metadata rather than replacing it, to prevent accidental data loss of other metadata keys.
| await services.chats.update(data.chatId, { | ||
| settings: { | ||
| ...(shouldPauseAgent ? { agentPaused: true } : {}), | ||
| closed: terminal, | ||
| closeUntil: closeUntil?.toISOString() ?? null, | ||
| closeOutcome: outcome, | ||
| } as Record<string, unknown>, | ||
| }); |
There was a problem hiding this comment.
Accidental data loss in chats.settings. Similar to the handoff path, this update clobbers the entire settings JSONB column. Existing configuration keys like followUpConfig will be lost. You must fetch the current settings and merge them before updating.
| await services.chats.update(data.chatId, { | |
| settings: { | |
| ...(shouldPauseAgent ? { agentPaused: true } : {}), | |
| closed: terminal, | |
| closeUntil: closeUntil?.toISOString() ?? null, | |
| closeOutcome: outcome, | |
| } as Record<string, unknown>, | |
| }); | |
| const chat = await services.chats.getById(data.chatId); | |
| await services.chats.update(data.chatId, { | |
| settings: { | |
| ...((chat?.settings as Record<string, unknown>) ?? {}), | |
| ...(shouldPauseAgent ? { agentPaused: true } : {}), | |
| closed: terminal, | |
| closeUntil: closeUntil?.toISOString() ?? null, | |
| closeOutcome: outcome, | |
| } as Record<string, unknown>, | |
| }); |
References
- When implementing an update command for a resource that has a metadata field, the update operation should merge the new metadata with the existing metadata rather than replacing it, to prevent accidental data loss of other metadata keys.
| async function resolveSafeMediaContentEventId( | ||
| ctx: MediaProcessorContext, | ||
| eventId: string | undefined, | ||
| ): Promise<string | null> { | ||
| if (!isUuid(eventId)) return null; | ||
|
|
||
| // media_content is audit/replay metadata, so do not block media.processed for long. | ||
| // Event persistence runs concurrently with this processor and should normally win within milliseconds. | ||
| const maxWaitMs = 250; | ||
| const pollMs = 50; | ||
| const deadline = Date.now() + maxWaitMs; | ||
|
|
||
| while (true) { | ||
| try { | ||
| const [event] = await ctx.db | ||
| .select({ id: omniEvents.id }) | ||
| .from(omniEvents) | ||
| .where(eq(omniEvents.id, eventId)) | ||
| .limit(1); | ||
|
|
||
| if (event) return event.id; | ||
| } catch (error) { | ||
| log.debug('Failed to validate media_content event FK', { eventId, error: String(error) }); | ||
| return null; | ||
| } | ||
|
|
||
| if (Date.now() >= deadline) { | ||
| log.debug('Skipping media_content event FK; omni_event not found', { eventId }); | ||
| return null; | ||
| } | ||
|
|
||
| await new Promise((resolve) => setTimeout(resolve, pollMs)); | ||
| } | ||
| } |
There was a problem hiding this comment.
The polling loop in resolveSafeMediaContentEventId is an inefficient way to handle the race condition between event persistence and media processing. While 250ms is a short window, this pattern results in multiple unnecessary database queries and holds up the event handler. Consider using a database-level constraint or a more reactive approach (e.g., a background worker or a retry mechanism in the event bus) to link media content to events.
| const update: Partial<SyncJobProgress> = { | ||
| fetched: payload.fetched ?? 0, | ||
| stored: 0, | ||
| duplicates: 0, | ||
| mediaDownloaded: 0, | ||
| totalEstimated: | ||
| payload.progress && payload.progress > 0 | ||
| ? Math.round((payload.fetched ?? 0) / (payload.progress / 100)) | ||
| : 0, | ||
| }); | ||
| }; |
There was a problem hiding this comment.
Potential counter reset bug. If payload.fetched is undefined, it defaults to 0, which will clobber the existing fetched count in the database during the updateProgress merge. The update object should only include the fetched key if the value is explicitly provided in the payload to avoid accidental data loss.
const update: Partial<SyncJobProgress> = {};
if (payload.fetched !== undefined) {
update.fetched = payload.fetched;
}References
- When implementing an update command for a resource that has a metadata field, the update operation should merge the new metadata with the existing metadata rather than replacing it, to prevent accidental data loss of other metadata keys.
|
@codex review this pr |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3e765a0b4d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let instance: string | null = instanceId; | ||
| let chat: string | null = chatId; | ||
| let recipient: string | null = null; | ||
|
|
||
| // Path-param extraction: /instances/:id, /chats/:id | ||
| const pathInstance = PATH_INSTANCE_PREFIXES.map((p) => firstPathSegment(cleanPath, p)).find((v) => v != null) ?? null; | ||
| const pathChat = PATH_CHAT_PREFIXES.map((p) => firstPathSegment(cleanPath, p)).find((v) => v != null) ?? null; | ||
| if (!instance && pathInstance) instance = pathInstance; | ||
| if (!chat && pathChat) chat = pathChat; |
There was a problem hiding this comment.
Prioritize path instance/chat targets over JSON body
The target extraction currently trusts instanceId/chatId from the request body before path params, and only falls back to path values when the body omits them. Because requireSignedInstanceMiddleware and allowlist checks both consume extractLockTargets, a caller can hit a path-scoped endpoint like PATCH /api/v2/instances/:id and inject a different instanceId in JSON to make authorization decisions against the wrong instance/chat. This can bypass require_genie_signature and allowlist enforcement for path-targeted writes.
Useful? React with 👍 / 👎.
| agentId: instance.agentId ?? null, | ||
| outcome, | ||
| reason: data.reason ?? null, | ||
| escalated: terminal, |
There was a problem hiding this comment.
Publish actual escalation flag in chat.closed payload
The emitted chat.closed event sets escalated to terminal instead of the computed escalated value, so hard-terminal outcomes (won/lost) are always reported as escalations even when no threshold-based auto-promotion happened. This corrupts downstream audit/BI semantics and any automation that distinguishes explicit hard closes from repeated soft-close escalation.
Useful? React with 👍 / 👎.
…serve feat(api,doctor): deprecate embedded pgserve mode (phase 1 — warn, no behavior change)
…d becomes opt-in)
Phase 2 of the embedded-pgserve removal plan from omni#595. The
deprecation warning has been live since 2026-05-01; this PR completes
the default flip:
Before phase 2: After phase 2:
────────────── ──────────────
PGSERVE_EMBEDDED unset PGSERVE_EMBEDDED unset
→ embedded ON (legacy) → embedded OFF (canonical)
useCanonicalPgserve undefined useCanonicalPgserve undefined
→ embedded ON → canonical (NEW DEFAULT)
Embedded mode is now an explicit opt-in. Operators who genuinely need
embedded — for whatever reason (unable to install pgserve globally, hitting
container-without-supervisord scenarios, transitional rollback, etc.) —
must explicitly set `useCanonicalPgserve: false` in `~/.omni/config.json`
(or `PGSERVE_EMBEDDED=true` if launching omni-api directly without the CLI).
What this PR changes
--------------------
1. `packages/cli/src/runtime-env.ts::buildRuntimeEnv`
- Logic flipped: `PGSERVE_EMBEDDED='true'` only when
`serverConfig.useCanonicalPgserve === false` (explicit opt-out).
undefined OR `true` both produce `PGSERVE_EMBEDDED='false'`.
- Updated docstring with the full state table.
2. `packages/api/src/pgserve.ts::resolvePgserveConfig`
- Defensive flip for direct omni-api invocations bypassing the CLI:
`enabled = process.env.PGSERVE_EMBEDDED === 'true'` (was
`(process.env.PGSERVE_EMBEDDED ?? 'true') === 'true'`).
- Updated module docstring's PGSERVE_EMBEDDED explanation with a
⚠️ DEPRECATED marker and the new opt-in semantics.
Compatibility — who breaks
--------------------------
Operators on legacy embedded who:
- Have already migrated via `omni doctor --fix` → no impact (they
have `useCanonicalPgserve: true` persisted).
- Have NOT migrated AND have canonical pgserve installed via
`pgserve install` → omni-api auto-discovers it, no impact.
- Have NOT migrated AND have NO canonical pgserve installed → omni-api
boots, can't connect to DATABASE_URL, fails-loud at health check.
Operator runs `omni doctor` → sees DEPRECATED WARN with one-line fix
→ runs `omni doctor --fix` → migrated.
Phase 1's deprecation warning has been visible in pm2 logs on every
restart for these operators since 2026-05-01. The 2026-05-01 doctor
WARN already pointed at `omni doctor --fix`. Net: this is the first
breaking change, but it lands behind a 2-week+ deprecation window
with clear remediation surface at every step.
Tests
-----
- `packages/cli/src/__tests__/runtime-env.test.ts` (+3 tests):
* Renamed primary test from "embedded mode" to "canonical default".
* Asserts PGSERVE_EMBEDDED='false' for undefined useCanonicalPgserve.
* Locks the explicit opt-out path: useCanonicalPgserve=false → embedded.
* Locks the explicit opt-in path: useCanonicalPgserve=true → canonical.
- `packages/api/src/__tests__/pgserve.test.ts` (+3 tests):
* Locks default-OFF for `PGSERVE_EMBEDDED` when env is unset.
* Locks "exactly 'true'" semantics — anything else (including 'false',
'0', 'TRUE', 'yes') maps to OFF.
* Locks the canonical opt-in path.
All quality gates pass:
- bun run typecheck: green
- packages/cli runtime-env.test.ts: 14/14 ✓
- packages/cli doctor.test.ts: 39/39 ✓
- packages/api pgserve.test.ts: 40/40 ✓
- bunx biome check on changed files: clean
Phase 3 (future)
----------------
Once fleet adoption is verified (zero `useCanonicalPgserve: false` hosts
in operator surveys / log signal), a follow-up PR removes:
- `startEmbeddedPgserve` + `stopEmbeddedPgserve` + libicu shim + cwd-drift
guards (~800 LOC in packages/api/src/pgserve.ts)
- `pgserve` runtime npm dep from packages/api/package.json
- `useCanonicalPgserve` config field (becomes meaningless)
- `PGSERVE_EMBEDDED` env var consumer
…ault-phase-2 feat(api,doctor): deprecate embedded pgserve mode (phase 1 — warn, no behavior change)
Halts `omni update` BEFORE installing a version that would cross the
phase-2 default flip (omni#596) on a host that hasn't migrated and has
no canonical pgserve binary installed. Operators get a clear remediation
path instead of discovering a boot failure hours later when omni-api
restarts under the new code.
Why this matters
----------------
Phase 2 (omni#596, 2026-05-02) flipped the embedded-pgserve default from
ON to OFF. Operators with `useCanonicalPgserve: undefined` (legacy who
never migrated) are silently switched to canonical-mode at the next
omni-api restart. If they don't have `pgserve` on PATH, omni-api boots
with `PGSERVE_EMBEDDED=false` and fails to connect. The deprecation
banner from omni#595 has been visible since 2026-05-01, but operators
who skim logs or who upgraded across a long downtime would still get
caught by surprise.
This pre-flight catches that case at the moment they're most likely to
break things — `omni update` itself — and refuses to proceed with three
remediation paths surfaced inline:
(recommended) Migrate to canonical pgserve:
bun add -g pgserve@^2.1.0
omni doctor --fix # automated pg_dump → install → restore
omni update # then re-run
(transitional) Pin embedded explicitly:
omni config set server.useCanonicalPgserve 'false'
omni update # then re-run
Bypass (NOT recommended): omni update --skip-canonical-preflight
Decision logic (pure function — no I/O — easy to test)
------------------------------------------------------
Halt only when ALL FIVE hold:
1. currentVersion < 2.260502 (operator hasn't crossed phase 2 yet)
2. targetVersion >= 2.260502 (the upgrade would cross phase 2)
3. useCanonicalPgserve is undefined (legacy never migrated)
- explicit `true` → migrated, safe
- explicit `false` → opted-into-embedded, safe
4. pgserve binary is NOT on PATH
5. operator did not pass --skip-canonical-preflight
Defensive: malformed version strings (e.g., test fixtures) default to
"safe to proceed" rather than blocking legitimate upgrades on parse
failures.
Tests
-----
- `packages/cli/src/__tests__/update-canonical-preflight.test.ts` (new)
10 cases covering both safe-to-proceed paths AND the dangerous path,
including:
* cutoff boundary (>= not >, day-before-cutoff doesn't trigger)
* malformed version (defensive null)
* all four combinations of useCanonicalPgserve × pgserveOnPath
* error message contains all three remediation paths
Quality gates
-------------
- bun run typecheck — green
- bun test update-canonical-preflight.test.ts — 10/10 ✓
- bun test update-verify.test.ts — 16/16 ✓ (existing, unchanged)
- bunx biome check — clean (after auto-format)
Where this fits
---------------
Phase 1 (omni#595): visible deprecation warning on every embedded boot.
Phase 2 (omni#596): default flip — canonical is now the unspecified default.
Phase 2.5 (THIS PR): upgrade-time guard so legacy hosts aren't caught
mid-upgrade without canonical pgserve.
Phase 3 (future): delete embedded code (~800 LOC) + drop pgserve dep
from packages/api.
Phase 2.5 is non-breaking by design — it only triggers when the operator
is about to make a breaking change. Canonical hosts (this server,
post-`omni doctor --fix` hosts, hosts with bun-global pgserve installed)
see no impact.
…rve-preflight feat(api,doctor): deprecate embedded pgserve mode (phase 1 — warn, no behavior change)
Mirrors genie's `offer_claude_plugin()` pattern (genie/install.sh:563-591)
so a fresh `curl … | bash` end-to-end install gives the operator both the
omni CLI AND the `/omni:*` slash commands without a separate manual step.
Closes Group 3 of the omni-genie-onboarding wish (the only remaining
group; Groups 1, 2, 4, 5 already shipped on genie dev).
Behavior
- New `offer_omni_plugin()` function detects `claude` on PATH.
- If present: `claude plugin marketplace add automagik-dev/omni` (no-op when
already registered) followed by `claude plugin install omni@automagik-dev`.
- If absent: graceful skip with an actionable hint pointing at the manual
install command. Never fails the overall install — the CLI is functional
without the plugin; the plugin is a DX upgrade, not a hard requirement.
- Wired into both `install_cli_only` (after `bun add -g` succeeds) and
`install_full_server` (after `omni install --non-interactive` finishes).
Why mirror genie's pattern
The genie plugin install function is the proven shape: marketplace add +
install, idempotent, never blocking. Same shape used here keeps the two
installers' DX consistent for operators who run both.
Marketplace metadata
`packages/.claude-plugin/marketplace.json` already declares the plugin:
```json
{
"name": "automagik-dev",
"plugins": [{ "name": "omni", ... }]
}
```
So `claude plugin install omni@automagik-dev` resolves to the right thing
without any marketplace.json changes.
Validation
- Wish acceptance grep: `grep -q "omni.*plugin\|plugin.*omni" install.sh` → PASS
- `bash -n install.sh` → clean (no syntax errors)
- Smoke test (function-only, isolated): both paths exit 0 with expected
output (claude-present → "OK installed", claude-missing → "skipping").
- Side effect of the smoke test: omni plugin installed cleanly on the test
host (`claude plugin list` shows `omni@automagik-dev`).
Non-goals
This change does NOT touch:
- The plugin SessionStart hook channel logic (separate concern, fixed in
aeab63d).
- Local-mode plugin symlinking (omni install.sh is npm-only — there's no
LOCAL_PATH equivalent, so the genie symlink branch doesn't apply here).
…lse-stale
ROOT CAUSE
==========
PostgreSQL's `timestamp without time zone` strips the offset on write.
When `defaultNow()` (PG NOW()) ran in a session-TZ database (canonical
pgserve uses America/Sao_Paulo), `started_at` was stored as a literal
BRT wall-clock value. When `new Date()` from JS wrote `closed_at`, the
UTC ISO string had its offset stripped and stored as a literal UTC
wall-clock value. The two columns then differed by 3h (the BRT offset)
even when the real interval was a few seconds.
This corrupted the turn-monitor's `getStale()` logic: every fresh turn
appeared to have a `last_activity_at` 3h in the past, so the monitor
force-closed each new turn within seconds — agents never had a chance
to respond to inbound DMs.
EVIDENCE (live DB before fix)
=============================
SELECT id, EXTRACT(EPOCH FROM (closed_at - started_at)) AS diff_sec
FROM turns ORDER BY started_at DESC LIMIT 3;
=> diff_sec = 10804.88, 10804.01, 10808.88 (3h offset, not real duration)
EVIDENCE (live DB after fix)
============================
SELECT id, EXTRACT(EPOCH FROM (closed_at - started_at)) AS diff_sec
FROM turns ORDER BY started_at DESC LIMIT 3;
=> diff_sec = 4.88, 4.01, 8.88 (real turn durations)
CHANGES
=======
- packages/db/src/schema.ts: 99 timestamp() calls now pass
{ withTimezone: true }, matching the 10 columns that were already
TZ-aware. Zero TZ-naive timestamps remain in the schema.
- packages/db/drizzle/0036_timestamptz_migration.sql: ALTER TABLE
for every TZ-naive column (99 total). USING clause is per column:
* defaultNow() columns -> AT TIME ZONE current_setting('TimeZone')
(re-interpret literal as session TZ, where PG NOW() wrote it)
* nullable JS-set columns -> AT TIME ZONE 'UTC'
(re-interpret literal as UTC, where new Date() wrote it)
- packages/db/drizzle/meta/_journal.json: register 0036.
- packages/db/scripts/check-tz-naive.ts + package.json: build-time guard
that fails typecheck if any new TZ-naive timestamp() is introduced.
Wired into `bun run typecheck` and exposed as `bun run lint:tz`.
CONSTRAINTS HONORED
===================
- No ALTER DATABASE timezone — fix works regardless of system TZ.
- Root-cause fix at the schema layer, not a turn-monitor workaround.
- All 22 tables with TZ-naive timestamps converted, not just turns.
VALIDATION
==========
- packages/db: typecheck + lint:tz pass
- bun test: 3743 pass / 0 fail / 285 skip (no regressions)
- Live migration apply: 37/37 migrations applied, all `turns` columns
now `timestamp with time zone`, fresh inserts show diff_sec = 0.0.
feat(install): offer Claude Code plugin install in both install paths
fix(db): timestamptz everywhere — kills turn-monitor false-stale (3h offset)
The original consumer-side stale-event gate caught events for chats whose follow-up rows had been disarmed (sequence_complete, customer_replied, handoff, etc.) or closed via close-contact. But it let through replayed events for chats that were *still nominally active* — i.e. row.disarmReason IS NULL — even when the row had already advanced past the event's sequenceIndex via a later sweeper tick. Repro from production (a production tenant 22:10 BRT, ~7h post the first iteration of this fix): a single chat got 12 follow-ups in 6 hours despite maxFollowUps=3, with row.sequenceIndex stuck at 2 and the row's last update timestamp 2h before the most recent fires — classic replay pattern. Per-session-repeats <60s briefly hit the burst guard (two pairs gapped 30s each). Why the original gate missed: a row at sequenceIndex=2 with disarmReason=NULL is "active" by the existing check, but the replayed events carry the OLDER sequenceIndex=0 or =1 in their payload — the sweeper has already published those once and recordFired-advanced the row. Without comparing event vs row sequenceIndex, the gate accepts the replay. This adds the comparison: - engine extracts payload.sequenceIndex and passes it to the gate - evaluateIdleTimeoutFreshness reads chatFollowUpState.sequenceIndex and skips with reason `sequence_advanced_row_at_N_event_M` when row.sequenceIndex > eventSequenceIndex - All callsite signatures updated to pass through the new arg Validated post-deploy: skip log shows `reason":"sequence_advanced_row_at_2_event_1"` and `sequence_advanced_row_at_1_event_0"` entries; 0 fires to the previously-leaking chats; 0 per-session repeats <60s. Refs: namastexlabs/genie-a production tenant follows up commit d7d1a82.
…rd-sequence-index fix(automations): also gate replayed events by sequenceIndex
Rebases the 2026-05-03 draft against current dev state (post ddebb05 "feat(update): canonical-pgserve phase-2 cutoff pre-flight guard"). Changes: - Acknowledge ddebb05 as Stage 2.5 of the shared 14-stage pipeline (preflight runs before checkLatestVersion); wish does not modify it - install.sh audit: 593 LOC on dev (not 540 as pre-rebase snapshot assumed); target ≤80 LOC; gap = 513 LOC to remove - Decision #8 added: rename decideUpdateVerify -> decideVerify with pointer-equal alias retained for external consumers - Decision #9 added: phase-2 preflight is recognized as Stage 2.5 - Decision #10 added: install.sh slim ships in the same PR as the omni install --non-interactive wire (avoids transitional state) - Cross-wish dependency reframed: paired-with genie#update-unify-stages (parallel ship, neither blocks the other), not depends-on. The original reference to "automagik-dev/genie #update-unify-stages" pointed at a wish that did not exist — that wish is now opening in parallel - pgserve#autopg-upgrade-command flagged as not-paired-with — different domain (DB lifecycle, not installer UX); shipped 2026-05-03 via 466d1a4 No code changes; doc-only.
…ebased docs(wish): rebase update-unify-stages against origin/dev
Authors the omni sibling of automagik-dev/genie#output-primitives-unified. The two wishes ship in parallel; omni's existing output.ts is the byte-reference that genie absorbs. Audit findings (origin/dev, 2026-05-04): - 54 command files in packages/cli/src/commands/ - 47 already use output.* helpers (centralized pattern in production) - 4 outliers with direct chalk/ora imports: * update.ts — 5 chalk + 5 ora call sites * install.ts — 3 ora * providers-setup.ts — 1 ora * film.ts — 1 ora - output.ts itself: 322 LOC, 11 baseline exports, JSON+human modes, NO_COLOR aware, pipe-flush-aware (handles 64KB Linux pipe truncation) - Missing: step / spinner / banner / progress / divider primitives Scope (4 groups, 3 waves): - G1: extend output.ts with step/spinner/banner/progress/divider per SHARED-DESIGN.md §3; +deps (boxen, cli-progress) - G2: migrate the 4 outlier files; remove all direct chalk/ora imports from packages/cli/src/commands/ - G3: tests + CHANGELOG; lock TTY/non-TTY/JSON-mode/NO_COLOR paths - G4: lint rule banning chalk/ora imports outside output.ts Estimated 1-2 engineer-days. Pure additive — zero behavior change to the existing 16 baseline exports; existing 47 callers untouched. Independent implementations per SHARED-DESIGN.md decision #1. Genie sibling adopts THIS file as byte-reference.
…fied docs(wish): draft output-primitives-unified — extend output.ts surface
…s registry Groups 1+2+3 of update-unify-stages wish. Orchestrator co-commit — team unify-omni produced these changes during the 2026-05-05 pgserve outage but couldn't push because mailbox FK violations blocked `genie done` / `genie send` afterwards. What's in this commit: - G1 (commands/update.ts): adopt shared VerifyResult shape, rename decideUpdateVerify → decideVerify with deprecated alias, add `skipped` variant. Existing locked error strings preserved byte-identically. - G2 (legacy-cleanup.ts NEW, 214 LOC): LegacyArtifact registry with `nats-reply-sidecar` day-one entry wrapping existing cleanupSidecars / formatCleanupSummary helpers. Operator output byte-identical to the pre-registry behavior. - G3 (commands/doctor.ts): post-update maintenance hook consumed by update.ts. - Tests: legacy-cleanup.test.ts (new) + update-verify.test.ts updates. Groups 4-5 (diagnostics JSON + install.sh slim + npm-global detection + --no-verify flag) still pending — team continues from here next session.
…y + slim install.sh Completes the omni half of the update-unify-stages wish (Groups 3/4/5). Pairs with the genie-side parallel implementation (independent code, identical public shape per SHARED-DESIGN.md decision #3). Group 3 — Post-update maintenance hook: - runPostUpdateMaintenance() calls runDoctor({ json: true, dryRun: true }) after a successful restart + verify; non-blocking by contract. - resolveMaintenanceSkipReason() pure function with verify-failed > cli-flag > env precedence; --skip-maintenance flag and OMNI_UPDATE_SKIP_MAINTENANCE env var. - One-line "Maintenance: <ok> ok, <warn> warn, <fail> fail" summary. - 16 tests in update-maintenance.test.ts using runDoctorImpl injection (not mock.module) to avoid bun test cross-file pollution. Group 4 — Diagnostics JSON capture: - New update-diagnostics.ts: UpdateDiagnostics shape + writer. - schemaVersion: 1 (intentionally asymmetric with genie's 2). - Every omni update writes ~/.omni/logs/update-diagnostics-<iso>.json capturing install attempt, registry probe, restart, verify, cleanups, maintenance, parallel-install detection, and a tail of pm2 log signals. - finalize() wraps process.exit so diagnostics is persisted on every termination path (success, failure, cancel). - Best-effort: write failures never change the update exit code. - Verbose path printed when OMNI_UPDATE_DIAGNOSTICS_VERBOSE=1. - 14 tests in update-diagnostics.test.ts. Group 5a — Parallel npm-global install detection: - detectParallelNpmGlobalInstall() probes `npm root -g` for a stale @automagik/omni install (omni installs via bun-global; npm-global shadows the bun binary on PATH). - One-line warning early in update with the offending path and `npm uninstall -g @automagik/omni` recommendation. - 6 tests in update-parallel-install.test.ts via injected npmRoot/exists. Group 5b — --no-verify flag: - Restart services but skip the post-restart probe (e.g. when a release ships a broken /api/v2/health). Distinct from --no-restart. - decideVerify({ skipReason: 'no-verify-flag' }) → skipped variant. - Yellow "Server: v… (skipped)" / "Auth: skipped" banner. - Maintenance auto-skips because verify never confirmed health. Group 5c — Slim install.sh from 593 to 77 LOC: - Bootstrap-only: ensure_bun → resolve channel from ~/.omni/config.json → bun add -g @automagik/omni@<channel> → omni install --non-interactive. - All wizard prompts (instance / AI provider / output format / etc.) delegated to the TS installer. Future wizard work belongs in TS. - --cli skips the server bootstrap; --server (default) runs it; --help prints usage. Old flags --cli/--server/--help still accepted. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(homolog): propagate KHAL session ids through Gupshup dispatch
Merge homolog version-bump automation fix after green CI.
Merge release-notes wrapper hardening after green CI.
…-spans-20260531T215628Z feat(observability): add Omni curated lifecycle spans
…-spans-20260531T215628Z fix(api): propagate active trace context for lifecycle waterfall
…-target-dev fix(whatsapp): resolve reaction UUID targets
…p-identities-dev fix(persons): search WhatsApp participant identities
fix(api): repair voice notes and Veo requests
Co-authored-by: Drogo Hermes <genie@namastex.ai>
Co-authored-by: Drogo Hermes <genie@namastex.ai>
Set Omni agent timeout defaults and UI bounds to 600 seconds; verified locally and CI green.
Rolling Promotion PR
Auto-maintained rolling promotion PR from
devtomain.Process:
ready-to-mergeadded when all checks pass