approver: self-mint bot tokens on boot from persisted passwords#35

Merged: amiller merged 1 commit into main from self-heal-bot-tokens on May 9, 2026

@amiller (Collaborator) commented May 9, 2026

Closes #34.

Summary

Twice now (April 2026 and May 2026) the prod approver has gone down because KNOCK_APPROVER_TOKEN in the sealed env diverged from the working token in prod. Each recovery requires running users reset-password --logout via conduwuit --execute, which causes ~30s of homeserver downtime affecting all users and federation. The token is the wrong source of truth; the password is the right one.

This makes both bots self-healing on boot:

  1. Validate MATRIX_TOKEN / ONBOARDING_BOT_TOKEN by calling /whoami
  2. On M_UNKNOWN_TOKEN, read the password from /data/<bot>_password and call /login for a fresh token
  3. If the device_id changed, wipe bot_crypto.db (mautrix refuses to load with a device-id mismatch)
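
The first two steps above might look roughly like the following in Python. This is a minimal sketch, not the code merged in this PR: `resolve_credentials`, `urllib_http`, the injectable `http` callable, and the returned tuple are hypothetical names chosen for illustration. The endpoint paths and the `m.login.password` request body follow the Matrix client-server API. Treating any 401 as "token rejected" is an assumption that matches the commit message below.

```python
import json
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable, Optional, Tuple

# (method, url, headers, json_body) -> (status_code, parsed_json_body)
Http = Callable[[str, str, dict, Optional[dict]], Tuple[int, dict]]


def urllib_http(method: str, url: str, headers: dict, body: Optional[dict]) -> Tuple[int, dict]:
    """Default transport built on the standard library only."""
    headers = dict(headers)
    data = None
    if body is not None:
        data = json.dumps(body).encode()
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=data, method=method, headers=headers)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # Matrix error responses carry a JSON body with errcode/error.
        return err.code, json.loads(err.read() or b"{}")


def resolve_credentials(
    homeserver: str,
    user_id: str,
    token: Optional[str],
    password_file: str,
    http: Http = urllib_http,
) -> Tuple[str, Optional[str], bool]:
    """Return (access_token, device_id, minted_new_token)."""
    if token:
        status, body = http(
            "GET",
            f"{homeserver}/_matrix/client/v3/account/whoami",
            {"Authorization": f"Bearer {token}"},
            None,
        )
        if status == 200:
            return token, body.get("device_id"), False
        if status != 401:
            # Network or server trouble is not a reason to mint a new token.
            raise RuntimeError(f"/whoami failed: {status} {body}")

    # Token missing or rejected (e.g. M_UNKNOWN_TOKEN): log in with the password.
    password = Path(password_file).read_text().strip()
    status, body = http(
        "POST",
        f"{homeserver}/_matrix/client/v3/login",
        {},
        {
            "type": "m.login.password",
            "identifier": {"type": "m.id.user", "user": user_id},
            "password": password,
        },
    )
    if status != 200:
        raise RuntimeError(f"/login failed: {status} {body}")
    return body["access_token"], body["device_id"], True
```

Passing the transport in as a parameter keeps the fallback logic testable without a live homeserver. A caller would compare the returned device_id against the one it last used to decide whether step 3 applies.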

@onboarding-bot already had its password file from #31; this adds the same pattern for @shape-rotator-2. Once the password is on the volume, future env clobbers, password rotations, and admin reset-password operations are recoverable without homeserver downtime.

Test plan

  • bash tests/run_e2e.sh locally — smoke 18/18, vetting 22/22, lobby 27/27 (one E2EE flake on first attempt, clean on retry; same admin_e2ee !mint flake as on main, unrelated)
  • After merge, write the password to /data/shape_rotator_2_password on prod via the volume mount and reset the @shape-rotator-2 password to match (one final ~30s downtime). After that, this exact failure becomes unreachable.

🤖 Generated with Claude Code

Twice now the prod approver has gone down because KNOCK_APPROVER_TOKEN
in the sealed env diverged from the working token in prod, requiring a
disruptive admin reset-password (~30s of homeserver downtime affecting
all users + federation). The token is the wrong source of truth — the
password is.

On startup, both bots now:
  1. Try the env-provided access token via /whoami
  2. On M_UNKNOWN_TOKEN (or any 401), read the bot's password from
     /data/<bot>_password and call /login for a fresh token + device_id
  3. If the device_id changed (fresh login), wipe bot_crypto.db so
     mautrix doesn't fail with "stale crypto state" on next boot
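
Step 3 could be handled by a small helper like the one below. This is a hedged sketch: the function name `wipe_crypto_store_if_device_changed` and the idea of passing in the previously used device_id are assumptions, since the commit only states that bot_crypto.db is wiped when the device_id changes. Removing the `-wal` and `-shm` sidecar files additionally assumes the store is a SQLite database.

```python
from pathlib import Path
from typing import List, Optional, Union


def wipe_crypto_store_if_device_changed(
    db_path: Union[str, Path],
    old_device_id: Optional[str],
    new_device_id: str,
) -> List[Path]:
    """Delete the crypto store when the device_id differs; return removed paths.

    An unknown old device_id (None) is treated as a change, which errs on
    the side of starting from a clean crypto state.
    """
    if old_device_id == new_device_id:
        return []

    base = Path(db_path)
    # Assumption: SQLite may leave write-ahead-log sidecars next to the db file.
    candidates = [
        base,
        base.with_name(base.name + "-wal"),
        base.with_name(base.name + "-shm"),
    ]
    removed = []
    for candidate in candidates:
        if candidate.exists():
            candidate.unlink()
            removed.append(candidate)
    return removed
```

Returning the list of removed paths gives the caller something concrete to log, which helps when diagnosing why a bot had to re-establish its encryption sessions.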

@onboarding-bot already had its password persisted at
/data/onboarding_bot_password (PR #31). This adds the same pattern for
@shape-rotator-2 at /data/shape_rotator_2_password. Once a password
file is in place on the volume, the bot is self-healing across env
clobbers, password rotations, and admin reset-password operations —
no homeserver downtime needed.

MATRIX_TOKEN is now optional (was required) since the password file is
sufficient to bootstrap.
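
With the token optional, startup configuration only has to guarantee that at least one credential source exists. The sketch below illustrates that check under stated assumptions: `BotAuth` and `load_bot_auth` are hypothetical names, and only the MATRIX_TOKEN variable and the password-file location come from this PR.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class BotAuth:
    token: Optional[str]            # None when MATRIX_TOKEN is unset or empty
    password_file: Optional[Path]   # None when no password file is present


def load_bot_auth(env: Mapping[str, str], password_file: str) -> BotAuth:
    """Accept a token, a password file, or both; reject having neither."""
    token = env.get("MATRIX_TOKEN") or None  # empty string counts as unset
    path = Path(password_file)
    has_password = path.is_file()
    if token is None and not has_password:
        raise RuntimeError(
            f"need MATRIX_TOKEN or a password file at {password_file}"
        )
    return BotAuth(token=token, password_file=path if has_password else None)


# Example call (assumed wiring, not taken from the PR):
#   auth = load_bot_auth(os.environ, "/data/shape_rotator_2_password")
```

Failing fast when both sources are missing keeps a misconfigured deployment from reaching the homeserver with nothing to authenticate with.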

Closes #34.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@amiller amiller merged commit 02e2116 into main May 9, 2026
1 check passed
@amiller amiller deleted the self-heal-bot-tokens branch May 9, 2026 11:28