fix: sync OAuth tokens between credential pool and credentials file #4765
netdust wants to merge 2 commits into NousResearch:main
OAuth refresh tokens are single-use. When multiple consumers share the same Anthropic OAuth session (credential pool entries, Claude Code CLI, multiple Hermes profiles), whichever refreshes first invalidates the refresh token for all others.

This causes a cascade:

1. Pool entry tries to refresh with a consumed refresh token → 400
2. Pool marks the credential as "exhausted" with a 24-hour cooldown
3. All subsequent heartbeats skip the credential entirely
4. The fallback to resolve_anthropic_token() only works while the access token in ~/.claude/.credentials.json hasn't expired
5. Once it expires, nothing can auto-recover without manual re-login

Fix:

- Add _sync_anthropic_entry_from_credentials_file() to detect when ~/.claude/.credentials.json has a newer refresh token and sync it into the pool entry, clearing exhaustion status
- After a successful pool refresh, write the new tokens back to ~/.claude/.credentials.json so other consumers stay in sync
- On refresh failure, check if the credentials file has a different (newer) refresh token and retry once before marking exhausted
- In _available_entries(), sync exhausted claude_code entries from the credentials file before applying the 24-hour cooldown, so a manual re-login or external refresh immediately unblocks agents

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
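The sync-before-cooldown idea in this commit message can be illustrated with a minimal sketch. It is not the code from the PR: `PoolEntry`, `sync_entry_from_file`, `available_entries`, and the flat `access_token` / `refresh_token` JSON keys are all hypothetical stand-ins, and the real credentials file schema is not shown in this conversation.

```python
import json
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class PoolEntry:
    """Hypothetical stand-in for a credential pool entry."""
    access_token: str
    refresh_token: str
    exhausted_until: Optional[float] = None  # epoch seconds; None means usable


def sync_entry_from_file(entry: PoolEntry, path: Path) -> bool:
    """Adopt the file's tokens if its refresh token differs from the entry's.

    Returns True when the entry was updated. The JSON keys are assumptions
    made for this sketch, not the real credentials file layout.
    """
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        return False  # missing or unreadable file: nothing to sync
    if not isinstance(data, dict):
        return False
    file_refresh = data.get("refresh_token")
    if not file_refresh or file_refresh == entry.refresh_token:
        return False  # same token: no other consumer has refreshed
    entry.access_token = data.get("access_token", entry.access_token)
    entry.refresh_token = file_refresh
    entry.exhausted_until = None  # clear the cooldown
    return True


def available_entries(entries: List[PoolEntry], path: Path, now: float) -> List[PoolEntry]:
    """Return usable entries, trying a file sync before honoring a cooldown."""
    usable = []
    for entry in entries:
        cooling_down = entry.exhausted_until is not None and entry.exhausted_until > now
        if cooling_down and not sync_entry_from_file(entry, path):
            continue  # still exhausted and nothing newer on disk
        usable.append(entry)
    return usable


# Demo: an entry in a 24-hour cooldown is unblocked by an external re-login.
with tempfile.TemporaryDirectory() as tmp:
    creds = Path(tmp) / "credentials.json"
    creds.write_text(json.dumps({"access_token": "new-access", "refresh_token": "new-refresh"}))
    entry = PoolEntry("old-access", "old-refresh", exhausted_until=time.time() + 86400)
    usable = available_entries([entry], creds, now=time.time())
```

In this sketch the comparison is on the refresh token only, so an entry whose refresh token already matches the file stays in cooldown.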
Code Review: OAuth Token Synchronization

Thanks for tackling this tricky OAuth token sync issue! The problem of single-use refresh tokens causing 24-hour lockouts is real and important to fix.

What's Done Well

Suggestions & Questions

1. Missing function
2. Runtime imports vs module-level
3. Potential for unnecessary syncs
4. Race condition consideration
5. Test coverage
6. Security note

Minor Nit

Overall this looks like a solid fix for a real problem. The main concern is the missing function - please confirm that's intentional or needs to be added.
Thanks for the thorough review! Addressing each point:

1.
2. Runtime imports: Intentional, to avoid circular imports between
3. Unnecessary syncs: The sync in
4. Race condition: Good call. The current fix handles this gracefully: if two processes refresh simultaneously, one succeeds and writes the new tokens, the other fails and picks up the new tokens from the credentials file on its next attempt. I'll add a code comment documenting this.
5. Tests: Agreed, unit tests would be valuable. Happy to add them in a follow-up or in this PR if you'd prefer.
6. Security: Confirmed: we only log

Let me know if you'd like me to add the race condition comment and/or unit tests to this PR.
Code Review: PR #4765

Summary

The PR implements OAuth token synchronization between the credential pool and Claude Code's credentials file (~/.claude/.credentials.json). This is a well-identified fix for a real problem: OAuth refresh tokens are single-use, so when one consumer refreshes, others' tokens become invalid.

Strengths

Potential Issues

Security Considerations

Suggestions

Verdict

Approve - This is a well-thought-out fix that addresses the core problem. The code quality is good with proper error handling. Minor improvements suggested above but not blockers.
Thanks for the approval and the detailed review! Addressing the suggestions:

1. Bare
2. Race condition: Acknowledged. Full elimination would require file locking or an atomic swap, which is more complexity than warranted here. The current approach is self-healing: if two processes race, the loser fails, syncs the winner's token from the file on next attempt, and recovers. Worst case is one extra failed refresh attempt, not a 24-hour lockout.
3. Only checking refresh token, not access token: By design. The refresh token is the single-use component: if it changed, someone else refreshed and we need their tokens. If only the access token changed (unlikely without a refresh), the refresh token would still be valid and our own refresh would succeed normally.
4. No locking: Same reasoning as #2. File locking across processes (Hermes profiles, Claude Code CLI) adds complexity and failure modes (stale locks). The retry-and-sync pattern handles contention gracefully without locks.

I'll push the debug log addition for #1. The rest I'd leave as-is given the tradeoffs. Let me know if you feel differently.
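The self-healing retry-and-sync behavior described in this reply can be simulated in a few lines. This is a sketch under stated assumptions rather than the PR's implementation: `FakeAuthServer`, `RefreshError`, and `refresh_with_sync` are hypothetical names, and a plain dict stands in for the shared credentials file.

```python
class RefreshError(Exception):
    """Raised when a refresh token has already been consumed."""


class FakeAuthServer:
    """Toy server that issues single-use refresh tokens (illustrative only)."""

    def __init__(self, initial_refresh: str) -> None:
        self._valid = initial_refresh
        self._counter = 0

    def refresh(self, refresh_token: str) -> dict:
        if refresh_token != self._valid:
            raise RefreshError("refresh token already consumed")
        self._counter += 1
        self._valid = f"refresh-{self._counter}"
        return {"access_token": f"access-{self._counter}", "refresh_token": self._valid}


def refresh_with_sync(tokens: dict, server: FakeAuthServer, shared_file: dict) -> dict:
    """Refresh; on failure adopt a different token from the shared file and retry once."""
    try:
        new = server.refresh(tokens["refresh_token"])
    except RefreshError:
        file_refresh = shared_file.get("refresh_token")
        if not file_refresh or file_refresh == tokens["refresh_token"]:
            raise  # nothing newer to adopt; the caller would mark the entry exhausted
        new = server.refresh(file_refresh)  # single retry with the winner's token
    shared_file.update(new)  # write back so other consumers can sync
    return new


# Demo: two consumers start from the same session and refresh one after the other.
server = FakeAuthServer("refresh-0")
shared = {"access_token": "access-0", "refresh_token": "refresh-0"}
consumer_a = dict(shared)
consumer_b = dict(shared)
consumer_a = refresh_with_sync(consumer_a, server, shared)  # wins the race
consumer_b = refresh_with_sync(consumer_b, server, shared)  # fails once, syncs, recovers
```

After the second call, consumer A's refresh token is stale again. In this model it recovers the same way on its next refresh, at the cost of one failed attempt.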
Address review feedback: replace bare `except: pass` with a debug log when the post-retry write-back to ~/.claude/.credentials.json fails. The write-back is best-effort (token is already resolved), but logging helps troubleshooting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
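A best-effort write-back with a debug log, as this commit describes, might look roughly like the sketch below. The helper name `write_back_best_effort` and the token dict shape are assumptions for illustration. The log call records only the path and the error, on the assumption that token values should stay out of logs.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def write_back_best_effort(path: Path, tokens: dict) -> bool:
    """Try to persist tokens; log at debug level instead of silently passing.

    Hypothetical helper: the caller already holds usable tokens, so a failed
    write is not fatal and is only reported for troubleshooting.
    """
    try:
        path.write_text(json.dumps(tokens))
        return True
    except OSError as exc:
        # Log the path and the error only, never the token values themselves.
        logger.debug("Could not write tokens back to %s: %s", path, exc)
        return False


# Demo: one write succeeds, one fails because the parent directory is missing.
with tempfile.TemporaryDirectory() as tmp:
    ok = write_back_best_effort(Path(tmp) / "credentials.json", {"refresh_token": "r1"})
    failed = write_back_best_effort(
        Path(tmp) / "missing-dir" / "credentials.json", {"refresh_token": "r1"}
    )
```

Catching `OSError` rather than using a bare `except` lets programming errors surface while still tolerating file system failures.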
Summary

~/.hermes/auth.json, ~/.hermes/profiles/*/auth.json) and Claude Code's credential file (~/.claude/.credentials.json)

Problem

When Hermes runs with provider: anthropic using Claude Code OAuth credentials, multiple consumers share the same OAuth session:

- resolve_anthropic_token() fallback path

When any one of these refreshes the token, it invalidates the refresh token for all others. The cascade:

- resolve_anthropic_token() works only while access token is valid
- claude /login

Fix

Three changes to credential_pool.py:

- _sync_anthropic_entry_from_credentials_file(): new method that checks if ~/.claude/.credentials.json has a different (newer) refresh token and syncs it into the pool entry, clearing exhaustion status
- _refresh_entry(): two additions:
  - claude_code entry, write the new tokens back to ~/.claude/.credentials.json
- _available_entries(): before applying the 24-hour exhaustion cooldown on claude_code entries, sync from the credentials file first, so a manual re-login or external refresh immediately unblocks agents

Test plan
🤖 Generated with Claude Code