
🔻 going-down message + downtime in success post #19

Open: amiller wants to merge 1 commit into main from going-down-message

@amiller amiller commented Apr 26, 2026

Two visibility adds for the deploy flow, both in one PR since they're tightly related:

1. New step: "going-down notice"

Runs immediately before phala deploy (i.e. after the 10-min countdown). Posts to ADMIN_COMMAND_ROOM:

🔻 restarting CVM now — mtrx.shaperotator.xyz briefly unreachable, back in ~2 min

Why: the heads-up post says "in 10 min", but nothing signals the moment the restart actually fires. Anyone watching #matrix-devops who read the heads-up and then forgot about it would see SSL handshake failures on mtrx without knowing why. The 🔻 message closes that gap. This step also sets DOWNTIME_START_TS in $GITHUB_ENV.
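As a rough illustration of what this step does, here is a minimal sketch rather than the actual workflow code. The helper names `going_down_message` and `record_downtime_start` are hypothetical, and storing the timestamp as Unix epoch seconds is an assumption. Posting the text to ADMIN_COMMAND_ROOM is left out because the PR does not say how messages are sent.

```shell
# GITHUB_ENV is provided by GitHub Actions; fall back to a temp file so
# this sketch also runs outside a workflow.
GITHUB_ENV="${GITHUB_ENV:-$(mktemp)}"

# Hypothetical helper: build the notice text for a given host.
going_down_message() {
  printf '🔻 restarting CVM now — %s briefly unreachable, back in ~2 min' "$1"
}

# Hypothetical helper: record the outage start (assumed: epoch seconds)
# so that later steps in the same job can read it.
record_downtime_start() {
  echo "DOWNTIME_START_TS=$(date +%s)" >> "$GITHUB_ENV"
}

going_down_message "mtrx.shaperotator.xyz"; echo
record_downtime_start
```

Lines appended to `$GITHUB_ENV` become environment variables for subsequent steps of the same job, which is what lets the health check and post-deploy note see the start time.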

2. Downtime field in success post

Health check step now also writes DOWNTIME_END_TS to $GITHUB_ENV the moment /versions returns 200 again. Post-deploy note reads both timestamps:

🚀 deployed abc1234 to dstack-matrix
downtime: ~95s

<commit message>

run: ...

So now you can eyeball, deploy to deploy, whether Phala is just being slow ("90s downtime, normal") or hitting trouble ("420s downtime, getting close to the recovery branch").
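The timestamp handling described above can be sketched roughly as follows. This is an illustration under assumptions, not the workflow's real code: `wait_until_healthy` and `downtime_line` are hypothetical helper names, timestamps are assumed to be epoch seconds, and the "unknown" fallback for a missing timestamp is an invented safeguard. The probe command is passed in as arguments so the loop can be exercised without network access.

```shell
# GITHUB_ENV is provided by GitHub Actions; fall back to a temp file so
# this sketch also runs outside a workflow.
GITHUB_ENV="${GITHUB_ENV:-$(mktemp)}"

# Hypothetical helper: run a probe command until it succeeds, then record
# DOWNTIME_END_TS immediately.
# Usage: wait_until_healthy <attempts> <sleep_secs> <probe command...>
wait_until_healthy() {
  attempts="$1"; pause="$2"; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      echo "DOWNTIME_END_TS=$(date +%s)" >> "$GITHUB_ENV"
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  return 1
}

# Hypothetical helper: format the downtime line from two epoch timestamps.
downtime_line() {
  start_ts="$1"; end_ts="$2"
  if [ -z "$start_ts" ] || [ -z "$end_ts" ]; then
    echo "downtime: unknown"   # assumed fallback, not from the PR
    return 0
  fi
  echo "downtime: ~$((${end_ts} - ${start_ts}))s"
}

downtime_line 1000 1095   # prints "downtime: ~95s"
```

In a real step the probe would be an HTTP check against the /versions endpoint, and `downtime_line "$DOWNTIME_START_TS" "$DOWNTIME_END_TS"` would supply the second line of the 🚀 post.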

Three messages per deploy now

| When | Message |
| --- | --- |
| 10 min before | 🚧 deploying X in ~10 min — cancel: ‹link› |
| at restart | 🔻 restarting CVM now — back in ~2 min |
| after recovery | 🚀 deployed X — downtime: ~Xs |

Each carries distinct info: heads-up + cancel window / outage starts now / outage ended + how long.

Will land in v0.5 whenever you tag it.

Two adds in one PR — they're both about closing the visibility gap
between the heads-up and the success message:

1. New step "going-down notice" runs immediately before phala deploy.
   Posts a "🔻 restarting CVM now — back in ~2 min" message to
   ADMIN_COMMAND_ROOM and records DOWNTIME_START_TS in $GITHUB_ENV.
   The cancel window has closed by this point.

2. Health check captures DOWNTIME_END_TS the moment /versions starts
   returning 200 again. Post-deploy note now reads both timestamps and
   includes "downtime: ~Xs" in the body, e.g.

     🚀 deployed abc1234 to dstack-matrix
     downtime: ~95s

     <commit message>

     run: ...

Each deploy now produces three messages in #matrix-devops:
- 🚧 deploying X in ~10 min — cancel: <link>
- 🔻 restarting CVM now — back in ~2 min       ← new
- 🚀 deployed X — downtime: ~Xs                ← downtime field new

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>