feat(upgrade): failsafe kars upgrade for existing clusters #455

Merged
pallakatos merged 1 commit into main from feat/kars-upgrade-command
Jun 25, 2026

Conversation

@pallakatos

Collaborator

Summary

Adds kars upgrade, a failsafe path to move an existing kars cluster to a published GitHub release. Requested for the overnight session. Draft for design review: validated against the live kars-aks cluster via --dry-run, but the mutating path has not yet been run end-to-end on a real upgrade.

Why

Today's kars up --upgrade is a Helm-only re-run that assumes the ACR already holds the new images — so it can't actually move a customer's cluster to a freshly-cut GitHub release. kars upgrade closes that gap.

Flow (failsafe)

  1. Connect to the cached cluster; verify the kars Helm release exists.
  2. Resolve target (latest GitHub release, or --to <tag>) and current version (stamped Helm value → fallback chart appVersion).
  3. Skip when already at target (--force overrides); refuse silent downgrade.
  4. --dry-run prints the plan with zero changes.
  5. Import target release images into the user's ACR — :latest (what the chart uses) and the immutable :<tag> for pin/rollback; required images fail closed.
  6. helm upgrade --atomic — failed upgrade auto-rolls-back; cluster never half-migrated. CRDs are templated, so they update.
  7. Rolling-restart the controller + every sandbox Deployment (router is a sidecar, rolled with its pod).
  8. Verify controller availability; report old → new.
  9. --rollback reverts to the previous Helm revision + restart + verify.
kars upgrade                  # to latest GitHub release
kars upgrade --to v0.1.16     # pin
kars upgrade --dry-run        # preview, no changes
kars upgrade --rollback       # revert
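
The guard in step 3 can be sketched as a small pure function. This is an illustration only, not the code from this PR: `Decision`, `compareCore`, and `decideUpgrade` are hypothetical names, the comparator ignores prerelease suffixes to stay short, and it assumes --force overrides both the already-at-target and the newer-than-target guard, as the description suggests.

```typescript
// Hypothetical sketch of the step-3 guard; names are not taken from the kars CLI.
type Decision =
  | { action: "skip"; reason: string }
  | { action: "refuse"; reason: string }
  | { action: "upgrade"; from: string; to: string };

// Compares vMAJOR.MINOR.PATCH tags numerically. Prerelease suffixes are
// dropped here for brevity; a full comparison would order them too.
function compareCore(a: string, b: string): number {
  const parse = (v: string): number[] =>
    (v.replace(/^v/, "").split("-")[0] ?? "").split(".").map((n) => Number(n));
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Assumption: force bypasses both guards, so a forced downgrade is explicit
// rather than silent.
function decideUpgrade(current: string, target: string, force: boolean): Decision {
  const order = compareCore(current, target);
  if (!force && order === 0) {
    return { action: "skip", reason: `already at ${target}` };
  }
  if (!force && order > 0) {
    return { action: "refuse", reason: `${current} is newer than ${target}` };
  }
  return { action: "upgrade", from: current, to: target };
}
```

Keeping the decision separate from the Helm and ACR calls would let --dry-run print the same decision the mutating path acts on.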

New shared lib

cli/src/lib/release.ts — release image plan + SemVer compare (incl. prerelease ordering) + latest-release discovery, all unit-tested.
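
For reviewers unfamiliar with prerelease ordering, here is a minimal sketch of a SemVer 2.0.0 precedence comparison. It is not the implementation in cli/src/lib/release.ts; `parseSemVer`, `comparePrerelease`, and `compareSemVer` are hypothetical names, and accepting an optional leading `v` is an assumption based on tags such as v0.1.16.

```typescript
// Hypothetical sketch of SemVer 2.0.0 precedence; not the PR's release.ts.
interface SemVer {
  major: number;
  minor: number;
  patch: number;
  prerelease: string[]; // empty for a normal release
}

function parseSemVer(tag: string): SemVer {
  // Optional leading "v"; build metadata after "+" is matched but ignored.
  const m = /^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$/.exec(tag);
  if (!m) throw new Error(`not a SemVer tag: ${tag}`);
  return {
    major: Number(m[1]),
    minor: Number(m[2]),
    patch: Number(m[3]),
    prerelease: m[4] ? m[4].split(".") : [],
  };
}

function comparePrerelease(a: string[], b: string[]): number {
  if (a.length === 0 && b.length === 0) return 0;
  if (a.length === 0) return 1; // a normal release outranks any prerelease
  if (b.length === 0) return -1;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] as string;
    const y = b[i] as string;
    if (x === y) continue;
    const xNum = /^\d+$/.test(x);
    const yNum = /^\d+$/.test(y);
    if (xNum && yNum) return Number(x) - Number(y); // numeric identifiers compare numerically
    if (xNum) return -1; // numeric identifiers rank below alphanumeric ones
    if (yNum) return 1;
    return x < y ? -1 : 1; // ASCII order for alphanumeric identifiers
  }
  return a.length - b.length; // more identifiers wins when the shared prefix is equal
}

// Returns <0 if a precedes b, 0 if equal precedence, >0 if a follows b.
function compareSemVer(a: string, b: string): number {
  const x = parseSemVer(a);
  const y = parseSemVer(b);
  return (
    x.major - y.major ||
    x.minor - y.minor ||
    x.patch - y.patch ||
    comparePrerelease(x.prerelease, y.prerelease)
  );
}
```

Under these rules `1.0.0-rc.1` sorts before `1.0.0`, so a cluster on a release candidate would still be offered the final release.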

Security audit

docs/internal/security-audits/2026-06-25-kars-upgrade-command.md (2 sign-offs). Same operator-scoped operations as kars up (az acr import of public signed images + helm upgrade + rollout restart); adds --atomic/--rollback safety. No new attack surface.

Verification

  • tsc + oxlint clean, 830 tests (+9 new).
  • fetchLatestReleaseTag() → v0.1.16 (live GitHub API).
  • kars upgrade --dry-run ran end-to-end on the live kars-aks cluster — planned the 12-image import, made no changes.

Known limitation (follow-up)

Current-version detection is fully reliable only after the first kars upgrade (which stamps karsRelease into Helm values); before that it falls back to the chart's static appVersion. The upgrade is idempotent regardless. A cleaner long-term signal (image digest annotation) is a follow-up.
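
The fallback described above might look roughly like the sketch below. It assumes `karsRelease` is a top-level key in the release's Helm values (for example as returned by `helm get values <release> -o json`) and that the chart's appVersion is available separately. `HelmReleaseInfo` and `detectCurrentVersion` are hypothetical names, not the PR's API.

```typescript
// Hypothetical sketch of current-version detection; shapes are assumptions.
interface HelmReleaseInfo {
  values: Record<string, unknown>; // user-supplied Helm values for the release
  appVersion?: string; // static appVersion from the deployed chart
}

type VersionSource = "stamped" | "appVersion" | "unknown";

interface DetectedVersion {
  version: string | null;
  source: VersionSource;
}

function detectCurrentVersion(info: HelmReleaseInfo): DetectedVersion {
  // Preferred signal: the value stamped by a previous `kars upgrade`.
  const stamped = info.values["karsRelease"];
  if (typeof stamped === "string" && stamped.trim() !== "") {
    return { version: stamped.trim(), source: "stamped" };
  }
  // Fallback: the chart's appVersion, which may lag behind the deployed images.
  if (info.appVersion) {
    return { version: info.appVersion, source: "appVersion" };
  }
  return { version: null, source: "unknown" };
}
```

Returning the source alongside the version would let the command flag when the reported current version comes from the less reliable fallback.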

Not yet done (wants your eyes before merge)

  • A real mutating upgrade run on a disposable cluster.
  • Decide whether kars up --upgrade should delegate to this (dedupe).

@github-actions


Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Move an existing kars cluster to a published GitHub release safely. Unlike
`kars up --upgrade` (Helm-only re-run that assumes the ACR already has the new
images), `kars upgrade`:

- detects current vs target version (latest GitHub release, or --to <tag>);
  stamps the deployed release into Helm values so the NEXT upgrade reports the
  true current version (the chart appVersion is static);
- --dry-run prints the full plan and makes no changes;
- imports the target release images into the user's ACR — :latest (what the
  chart references) AND the immutable :<tag> for pin/rollback; required images
  fail closed;
- `helm upgrade --atomic` so a failed upgrade auto-rolls-back the release and
  the cluster never lands half-migrated (CRDs are templated, so they update);
- rolling-restarts the controller + every sandbox Deployment (the router is a
  sidecar, rolled with its pod) onto the new :latest;
- verifies controller availability and reports old -> new;
- --rollback reverts to the previous Helm revision + restart + verify;
- already-at-target and newer-than-target guards prevent needless/destructive
  re-deploys; --force overrides.
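
As a rough illustration of the image import step, the sketch below builds one import per image that writes both the :latest and the immutable :<tag> target, and reports required images whose import failed. The registry path, image names, and every identifier here are hypothetical. The `az acr import` flags used (`--name`, `--source`, `--image`, `--force`) exist in the Azure CLI, but whether the PR invokes them in this form is an assumption.

```typescript
// Hypothetical sketch of the release image plan; not the PR's release.ts.
interface ImageSpec {
  name: string; // repository name, e.g. "controller" (illustrative)
  required: boolean; // required images fail closed
}

interface ImportStep {
  source: string; // fully qualified source image at the release tag
  targets: string[]; // tags written into the user's ACR
  required: boolean;
}

function buildImportPlan(sourceRegistry: string, tag: string, images: ImageSpec[]): ImportStep[] {
  return images.map((img) => ({
    source: `${sourceRegistry}/${img.name}:${tag}`,
    // :latest is what the chart references; :<tag> is kept for pin/rollback.
    targets: [`${img.name}:latest`, `${img.name}:${tag}`],
    required: img.required,
  }));
}

// Renders the argument list for one `az acr import` call.
function toAzArgs(acrName: string, step: ImportStep): string[] {
  const args = ["acr", "import", "--name", acrName, "--source", step.source];
  for (const target of step.targets) {
    args.push("--image", target);
  }
  args.push("--force"); // overwrite the existing :latest tag
  return args;
}

// Fail closed: any required step that did not succeed should abort the
// upgrade before helm runs.
function failedRequired(results: { step: ImportStep; ok: boolean }[]): ImportStep[] {
  return results.filter((r) => !r.ok && r.step.required).map((r) => r.step);
}
```

A dry run could print the planned steps and stop before executing any of them.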

New cli/src/lib/release.ts factors the release image plan + SemVer compare +
latest-release discovery (shared, tested).

Security audit: docs/internal/security-audits/2026-06-25-kars-upgrade-command.md (2 sign-offs).
Verification: tsc + oxlint clean, 830 tests (+9). Validated live: latest
detection returned v0.1.16, and `kars upgrade --dry-run` ran end-to-end against
the kars-aks cluster (planned the 12-image import with no changes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@pallakatos pallakatos force-pushed the feat/kars-upgrade-command branch from f435925 to 826060b Compare June 25, 2026 20:38
@pallakatos pallakatos marked this pull request as ready for review June 25, 2026 20:38
@pallakatos pallakatos merged commit 87a33b7 into main Jun 25, 2026
35 checks passed
@pallakatos pallakatos deleted the feat/kars-upgrade-command branch June 25, 2026 21:01