approver: self-mint bot tokens on boot from persisted passwords #35
Merged
Twice now the prod approver has gone down because KNOCK_APPROVER_TOKEN
in the sealed env diverged from the working token in prod, requiring a
disruptive admin reset-password (~30s of homeserver downtime affecting
all users + federation). The token is the wrong source of truth — the
password is.
On startup, both bots now:
1. Try the env-provided access token via /whoami
2. On M_UNKNOWN_TOKEN (or any 401), read the bot's password from
/data/<bot>_password and call /login for a fresh token + device_id
3. If the device_id changed (fresh login), wipe bot_crypto.db so
mautrix doesn't fail with "stale crypto state" on next boot
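The three boot steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `resolve_session`, `Session`, and `UnknownToken` are hypothetical names, and the `whoami` and `login` callables stand in for the real HTTP calls to the homeserver. The sketch also assumes that every fresh login yields a new device_id, so it wipes the crypto store whenever it falls back to `/login`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Tuple


class UnknownToken(Exception):
    """Hypothetical error: the homeserver rejected the token (401 / M_UNKNOWN_TOKEN)."""


@dataclass
class Session:
    access_token: str
    device_id: str
    fresh_login: bool


def resolve_session(
    env_token: Optional[str],
    password_file: Path,
    crypto_db: Path,
    whoami: Callable[[str], str],             # token -> device_id, raises UnknownToken
    login: Callable[[str], Tuple[str, str]],  # password -> (access_token, device_id)
) -> Session:
    # Step 1: try the env-provided token first.
    if env_token:
        try:
            return Session(env_token, whoami(env_token), fresh_login=False)
        except UnknownToken:
            pass  # fall through to the password-based login

    # Step 2: read the persisted password and log in for a fresh token.
    # A missing password file raises FileNotFoundError here.
    password = password_file.read_text().strip()
    access_token, device_id = login(password)

    # Step 3: assumption: a fresh login means a new device_id, so the old
    # crypto store no longer matches and is removed.
    crypto_db.unlink(missing_ok=True)
    return Session(access_token, device_id, fresh_login=True)
```

Passing the network calls in as callables keeps the fallback logic testable without a homeserver; a real bot would back them with its Matrix client.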
@onboarding-bot already had its password persisted at
/data/onboarding_bot_password (PR #31). This adds the same pattern for
@shape-rotator-2 at /data/shape_rotator_2_password. Once a password
file is in place on the volume, the bot is self-healing across env
clobbers, password rotations, and admin reset-password operations —
no homeserver downtime needed.
MATRIX_TOKEN is now optional (was required) since the password file is
sufficient to bootstrap.
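One way to picture the relaxed requirement, as a sketch under assumptions: the token may be absent as long as a password file exists to fall back on. The function name `read_optional_token` and the error message are hypothetical; only `MATRIX_TOKEN` and the password-file idea come from this PR.

```python
from pathlib import Path
from typing import Mapping, Optional


def read_optional_token(env: Mapping[str, str], password_file: Path) -> Optional[str]:
    # An unset or blank MATRIX_TOKEN is treated as "no token provided".
    token = env.get("MATRIX_TOKEN", "").strip() or None
    # With neither a token nor a password file there is nothing to bootstrap from.
    if token is None and not password_file.is_file():
        raise RuntimeError(
            "need MATRIX_TOKEN or a password file at %s" % password_file
        )
    return token
```

A caller would pass `os.environ` and the bot's password path, for example `Path("/data/shape_rotator_2_password")`.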
Closes #34.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Twice now the prod approver has gone down because KNOCK_APPROVER_TOKEN in the sealed env diverged from the working token in prod (April 2026 + May 2026). Each recovery requires users reset-password --logout via conduwuit --execute, which involves ~30s of homeserver downtime affecting all users + federation. The token is the wrong source of truth.
This makes both bots self-healing on boot:
1. Try MATRIX_TOKEN / ONBOARDING_BOT_TOKEN via /whoami
2. On M_UNKNOWN_TOKEN, read the password from /data/<bot>_password and /login for a fresh token
3. If the device_id changed, wipe bot_crypto.db (mautrix refuses to load with a device-id mismatch)
@onboarding-bot already had its password file from #31; this adds the same pattern for @shape-rotator-2. Once the password is on the volume, future env clobbers, password rotations, and admin reset-password operations are recoverable without homeserver downtime.
Test plan
- bash tests/run_e2e.sh locally: smoke 18/18, vetting 22/22, lobby 27/27 (one E2EE flake on first attempt, clean on retry; same admin_e2ee !mint flake as on main, unrelated)
- /data/shape_rotator_2_password on prod via volume mount + reset @shape-rotator-2 password to match (one final ~30s downtime). After that this exact failure becomes unreachable.
🤖 Generated with Claude Code