feat(upgrade): failsafe kars upgrade for existing clusters #455
Merged
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Move an existing kars cluster to a published GitHub release safely. Unlike `kars up --upgrade` (Helm-only re-run that assumes the ACR already has the new images), `kars upgrade`:

- detects current vs target version (latest GitHub release, or `--to <tag>`); stamps the deployed release into Helm values so the NEXT upgrade reports the true current version (the chart appVersion is static);
- `--dry-run` prints the full plan and makes no changes;
- imports the target release images into the user's ACR: `:latest` (what the chart references) AND the immutable `:<tag>` for pin/rollback; required images fail closed;
- `helm upgrade --atomic` so a failed upgrade auto-rolls-back the release and the cluster never lands half-migrated (CRDs are templated, so they update);
- rolling-restarts the controller + every sandbox Deployment (the router is a sidecar, rolled with its pod) onto the new `:latest`;
- verifies controller availability and reports old -> new;
- `--rollback` reverts to the previous Helm revision + restart + verify;
- already-at-target and newer-than-target guards prevent needless/destructive re-deploys; `--force` overrides.

New cli/src/lib/release.ts factors the release image plan + SemVer compare + latest-release discovery (shared, tested).

Security audit: docs/internal/security-audits/2026-06-25-kars-upgrade-command.md (2 sign-offs).

Verification: tsc + oxlint clean, 830 tests (+9). Validated live: latest detection returned v0.1.16, and `kars upgrade --dry-run` ran end-to-end against the kars-aks cluster (planned the 12-image import with no changes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
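To make the SemVer compare and the already-at-target / newer-than-target guards concrete, here is a minimal sketch. It is not the code in `cli/src/lib/release.ts`: the names `parseSemVer`, `compareSemVer`, and `decideUpgrade` are hypothetical, and the ordering follows the SemVer 2.0.0 precedence rules (a prerelease ranks below its release; numeric identifiers rank below alphanumeric ones).

```typescript
// Hypothetical sketch; names and shapes are illustrative, not the PR's actual API.
type SemVer = { major: number; minor: number; patch: number; pre: string[] };

function parseSemVer(tag: string): SemVer {
  // Accepts an optional leading "v"; build metadata ("+...") is ignored for ordering.
  const m = /^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$/.exec(tag.trim());
  if (!m) throw new Error(`not a SemVer tag: ${tag}`);
  return {
    major: Number(m[1]),
    minor: Number(m[2]),
    patch: Number(m[3]),
    pre: m[4] ? m[4].split(".") : [],
  };
}

function comparePrerelease(a: string[], b: string[]): number {
  // A plain release ranks above any prerelease of the same core version.
  if (a.length === 0 && b.length === 0) return 0;
  if (a.length === 0) return 1;
  if (b.length === 0) return -1;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] as string;
    const y = b[i] as string;
    if (x === y) continue;
    const xNum = /^\d+$/.test(x);
    const yNum = /^\d+$/.test(y);
    if (xNum && yNum) {
      const d = Number(x) - Number(y);
      if (d !== 0) return d < 0 ? -1 : 1;
      continue;
    }
    if (xNum) return -1; // numeric identifiers rank below alphanumeric ones
    if (yNum) return 1;
    return x < y ? -1 : 1; // ASCII order for alphanumeric identifiers
  }
  // All shared identifiers equal: the longer prerelease list ranks higher.
  return a.length === b.length ? 0 : a.length < b.length ? -1 : 1;
}

function compareSemVer(a: string, b: string): number {
  const x = parseSemVer(a);
  const y = parseSemVer(b);
  for (const k of ["major", "minor", "patch"] as const) {
    if (x[k] !== y[k]) return x[k] < y[k] ? -1 : 1;
  }
  return comparePrerelease(x.pre, y.pre);
}

type Decision = "upgrade" | "skip-already-at-target" | "refuse-downgrade";

// Assumption: --force bypasses both guards, as the commit message suggests.
function decideUpgrade(current: string, target: string, force: boolean): Decision {
  if (force) return "upgrade";
  const c = compareSemVer(current, target);
  if (c === 0) return "skip-already-at-target";
  if (c > 0) return "refuse-downgrade";
  return "upgrade";
}
```

Under these assumptions, `decideUpgrade("v0.1.15", "v0.1.16", false)` proceeds with the upgrade, while a cluster already on `v0.1.16` is skipped unless `--force` is passed.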
Summary

Adds `kars upgrade`, a foolproof/failsafe path to move an existing kars cluster to a published GitHub release. Requested for the overnight session. Draft for design review: validated against the live `kars-aks` cluster via `--dry-run`, but the mutating path hasn't been run end-to-end on a real upgrade yet.

Why
Today's `kars up --upgrade` is a Helm-only re-run that assumes the ACR already holds the new images, so it can't actually move a customer's cluster to a freshly cut GitHub release. `kars upgrade` closes that gap.

Flow (failsafe)
- Check that the `kars` Helm release exists.
- Detect the target version (latest GitHub release, or `--to <tag>`) and the current version (stamped Helm value → fallback chart appVersion).
- Skip when already at target (`--force` overrides); refuse silent downgrade.
- `--dry-run` prints the plan with zero changes.
- Import the target release images into the user's ACR: `:latest` (what the chart uses) and the immutable `:<tag>` for pin/rollback; required images fail closed.
- `helm upgrade --atomic`: failed upgrade auto-rolls-back; cluster never half-migrated. CRDs are templated, so they update.
- `--rollback` reverts to the previous Helm revision + restart + verify.

New shared lib
`cli/src/lib/release.ts`: release image plan + SemVer compare (incl. prerelease ordering) + latest-release discovery, all unit-tested.

Security audit
`docs/internal/security-audits/2026-06-25-kars-upgrade-command.md` (2 sign-offs). Same operator-scoped operations as `kars up` (az acr import of public signed images + helm upgrade + rollout restart); adds `--atomic`/`--rollback` safety. No new attack surface.

Verification
- `tsc` + oxlint clean, 830 tests (+9 new).
- `fetchLatestReleaseTag()` → `v0.1.16` (live GitHub API).
- `kars upgrade --dry-run` ran end-to-end on the live `kars-aks` cluster: planned the 12-image import, made no changes.

Known limitation (follow-up)
Current-version detection is fully reliable only after the first `kars upgrade` (which stamps `karsRelease` into Helm values); before that it falls back to the chart's static appVersion. The upgrade is idempotent regardless. A cleaner long-term signal (image digest annotation) is a follow-up.

Not yet done (wants your eyes before merge)
- `kars up --upgrade` should delegate to this (dedupe).
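For reviewers weighing the known limitation on current-version detection, the fallback order can be sketched as follows. This is an illustration under assumptions, not the merged code: only the `karsRelease` value name comes from this PR, while `resolveCurrentVersion`, the `HelmValues` shape (with `karsRelease` assumed to sit at the top level of the values), and the `source` field are hypothetical.

```typescript
// Hypothetical sketch of the stamped-value -> chart-appVersion fallback.
interface HelmValues {
  // Assumed to be stamped by a previous `kars upgrade`; absent on older installs.
  karsRelease?: string;
}

interface CurrentVersion {
  version: string;
  // Records which signal was used, so the plan can flag a less reliable reading.
  source: "stamped" | "chart-appVersion";
}

function resolveCurrentVersion(values: HelmValues, chartAppVersion: string): CurrentVersion {
  const stamped = values.karsRelease?.trim();
  if (stamped) return { version: stamped, source: "stamped" };
  // Before the first `kars upgrade`, only the chart's static appVersion is available.
  return { version: chartAppVersion, source: "chart-appVersion" };
}

// Usage with illustrative values:
const detected = resolveCurrentVersion({ karsRelease: "v0.1.16" }, "0.1.0");
console.log(`${detected.version} (${detected.source})`);
```

Returning the source alongside the version is one way to let `--dry-run` output say when the reported current version is only the chart default.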