Summary
Upgrading an embedded-Postgres install from @latest (main, 2.260410.1) to @next (dev, 2.260617.3) leaves omni-api in a permanent crash loop, and the documented remediation — omni doctor --fix (embedded → canonical pgserve migration) — cannot complete on a clean host. Data is not lost, but the service is down with no working automated path forward.
Found via an isolated sandbox upgrade test (rootful Podman, ubuntu:24.04, unprivileged user, real installer flow). The production VM (already on canonical autopg-server@2.6.10) is not an embedded install and is likely unaffected; this bites embedded/older installs upgrading across the backbone change.
Environment
- Clean ubuntu:24.04, installed via the published flow: bun add -g @automagik/omni@<channel> → omni install --non-interactive → omni start.
- main 2.260410.1 installs an embedded, in-process Postgres (PG18) on :8432.
- dev 2.260617.3 expects a standalone canonical pgserve/autopg on :5432.
Repro
- Install + boot main (@latest): healthy, with embedded PG18 on :8432, 18 migrations, NATS connected. ✅
- bun add -g @automagik/omni@next, then omni stop && omni start.
- omni-api crash-loops: ERROR api:startup "Failed to start API server" error="Database not ready after 30 attempts". omni start reports services "online" while the API silently restarts.
- omni doctor correctly flags it: pgserve-canonical … embedded … DEPRECATED. Run omni doctor --fix (idempotent; pg_dump → pgserve install → restore → relaunch).
- omni doctor --fix fails and cannot recover (details below).
Root causes (each independently blocks the migration)
1. Opaque failure on the bare upgrade
After upgrade, omni-api only logs Database not ready after 30 attempts. Nothing tells the operator the DB-backbone model changed (embedded → canonical) or to run omni doctor --fix. omni start should fail-fast on a deprecated-embedded + dev-binary combination with an actionable hint, not enter a silent restart loop.
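A fail-fast check along these lines could replace the retry loop. This is a minimal shell sketch, not omni's actual startup code: the function name startup_preflight, its two arguments, and the exit codes are hypothetical. It treats a directory containing a PG_VERSION file (which every Postgres data directory carries) as evidence of an old embedded install, and assumes the caller has already probed whether the canonical server on :5432 answers.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: decide whether startup should proceed or stop
# with an actionable hint. Not omni's real implementation.
#   $1  path that may hold the old embedded PGDATA (caller-supplied assumption)
#   $2  "yes" if the canonical server on :5432 already accepts connections
startup_preflight() {
  local embedded_pgdata=$1 canonical_up=$2

  # Canonical backend reachable: nothing to migrate, continue booting.
  if [ "$canonical_up" = "yes" ]; then
    return 0
  fi

  # Every Postgres data directory contains a PG_VERSION file.
  if [ -f "$embedded_pgdata/PG_VERSION" ]; then
    echo "Embedded Postgres data found at $embedded_pgdata, but this release expects canonical pgserve/autopg on :5432." >&2
    echo "Run: omni doctor --fix" >&2
    return 2
  fi

  # No embedded data and no canonical server: a different failure,
  # so leave it to the normal retry path.
  return 1
}
```

A caller would exit immediately on status 2 rather than retrying 30 times, so the operator sees the hint once instead of a restart loop.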
2. No pg_dump ships with any bundled Postgres distribution
omni doctor --fix needs a PG18 pg_dump, but:
- @embedded-postgres bundles only initdb / pg_ctl / postgres; no pg_dump/pg_restore/psql.
- Canonical autopg/pgserve (~/.autopg/bin/...) is the same: only initdb / pg_ctl / postgres.
- find / -name pg_dump returns nothing on a fresh install.
3. The remediation hint installs the wrong major version
doctor --fix suggests apt install postgresql-client. On Ubuntu 24.04 that installs PG16, whose pg_dump refuses to dump a PG18 server (pg_dump: server version 18 … is newer). The hint is actively misleading; it should point at a client whose major version is at least the server's (e.g. PGDG postgresql-client-18) or use a bundled one.
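The version rule can be checked before attempting a dump. The sketch below is illustrative only: pg_major and client_can_dump are hypothetical helper names, and it assumes the version strings look like the output of pg_dump --version or a bare server version number. It encodes the constraint that pg_dump will not dump a server with a newer major version than its own.

```shell
#!/usr/bin/env bash
# Extract the PostgreSQL major version from a string such as
# "pg_dump (PostgreSQL) 16.4 (Ubuntu ...)" or "18.0".
pg_major() {
  # First "<digits>" or "<digits>.<digits>" token, then keep the part before the dot.
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)?' | head -n 1 | cut -d. -f1
}

# Succeed only when the client's major version is >= the server's.
#   $1  client version string (e.g. output of `pg_dump --version`)
#   $2  server version string (e.g. "18.0")
client_can_dump() {
  local client_major server_major
  client_major=$(pg_major "$1")
  server_major=$(pg_major "$2")
  [ -n "$client_major" ] && [ -n "$server_major" ] \
    && [ "$client_major" -ge "$server_major" ]
}
```

For example, client_can_dump "$(pg_dump --version)" "18" || echo "install a PG18 client first" would reject the PG16 client from the stock Ubuntu 24.04 package before any dump is attempted.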
4. Ordering chicken-and-egg, and the source server isn't running under dev
Even with a real PG18 pg_dump supplied:
- doctor --fix tries to pg_dump the embedded DB on :8432, but the dev binary never starts the embedded server, so the source is down → connection to server … port 8432 failed: Connection refused.
- Installing canonical pgserve does a soft-rename of ~/.pgserve → ~/.autopg (documented, reversible; leaves MIGRATED-FROM-PGSERVE.md), which removes the embedded binaries from ~/.pgserve/bin, so there's no obvious supported way to bring the embedded source up for the dump.
Net: completing the migration required (a) an externally-installed PG18 client, and (b) manually starting the embedded server that dev no longer manages — neither of which omni doctor --fix does. The cascade of manual workarounds is the real blocker.
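The two manual steps can be sketched as a single helper. This is an assumption-laden illustration, not what omni doctor --fix does today: dump_embedded and run are hypothetical names, and the binary directory, PGDATA path, database name, and output file are all caller-supplied because the issue does not pin them down. It also assumes the embedded server accepts local connections on 127.0.0.1 and that credentials come from the environment (e.g. PGUSER). Only standard pg_ctl and pg_dump flags are used.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start the old embedded server, dump one database, then stop the server.
#   $1  directory holding the embedded pg_ctl      (assumption)
#   $2  embedded PGDATA directory                  (assumption)
#   $3  path to a PG18-or-newer pg_dump            (assumption)
#   $4  database name                              (assumption)
#   $5  output file for the custom-format dump     (assumption)
dump_embedded() {
  local bin_dir=$1 pgdata=$2 pg_dump_bin=$3 db=$4 out=$5 rc

  # -o passes "-p 8432" to postgres; -w waits until the server is ready.
  run "$bin_dir/pg_ctl" -D "$pgdata" -o "-p 8432" -w start || return 1

  # -Fc writes a custom-format archive suitable for pg_restore.
  run "$pg_dump_bin" -h 127.0.0.1 -p 8432 -Fc -f "$out" "$db"
  rc=$?

  # Always stop the temporary server, even if the dump failed.
  run "$bin_dir/pg_ctl" -D "$pgdata" -m fast -w stop
  return "$rc"
}
```

Running it with DRY_RUN=1 prints the three commands without touching the data directory, which is a cheap way to review the paths before executing them for real.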
Impact
- Embedded installs upgrading main→dev get a down API with no working automated recovery.
- Data is intact (embedded PGDATA preserved: 15M on disk, PG18) but unreachable by the dev service.
Suggested fixes
- omni start/startup: detect deprecated-embedded under a canonical-only binary and fail fast with the omni doctor --fix hint instead of a 30-retry crash loop.
- Make omni doctor --fix self-sufficient: locate/ship a PG≥server-version pg_dump, and start the embedded server itself (it knows the embedded binary + PGDATA) for the dump before relaunching on canonical.
- Fix the remediation hint: never suggest a pg_dump older than the server; prefer the bundled/canonical PG18 client or PGDG postgresql-client-18.
- Consider bundling pg_dump/pg_restore with @embedded-postgres/autopg so migrations don't depend on host tooling.
Notes
Found via sandbox upgrade test. Not observed to affect the production deployment (already canonical autopg-server@2.6.10).