Skip to content

Resolve /api/providers + /api/models credentials from the thread-local profile channel (drop the os.environ mirror race) #3961

@nesquena-hermes

Description

@nesquena-hermes

Summary

Follow-up from the #3957 fix (#3960, v0.51.356). The independent review of #3960 surfaced a pre-existing architectural limitation in how the read-only provider/model endpoints resolve per-profile credentials, recommending a tracking issue.

The limitation

WebUI profile switching is per-client/cookie-scoped (#798). The fix in #3960 routes /api/providers and /api/models (and the detached models-rebuild worker) through the active profile via profile_env_for_active_request / profile_scope_for_detached_worker, which delegate to the pre-existing profile_env_for_background_worker (the same mechanism streaming uses for title-gen / checkpoints).

That mechanism uses two channels:

  1. a thread-local (_thread_ctx) — race-free, the primary routing channel; and
  2. a process-global os.environ mirror, kept because several production Hermes readers still call os.getenv() directly for provider credentials.

By explicit design (api/profiles.py ~L739-743), _ENV_LOCK is held only around the env setup/restore, NOT the yield body (unlike _cron_env_lock, which serializes the whole body). The server is a ThreadingHTTPServer (daemon_threads=True), so requests run concurrently.

Consequence: two clients on different non-default profiles hitting /api/providers concurrently can interleave their os.environ mutations during the (multi-second) serial auth-probe window — transiently bleeding one profile's credentials into the other's probe, and potentially leaving a non-default profile's value as the process-global "default" after both restore.

Scope / severity

  • Pre-existing, not introduced by fix(profiles): scope /api/providers + /api/models to the active profile (#3957) #3960. fix(profiles): scope /api/providers + /api/models to the active profile (#3957) #3960's diff to api/profiles.py is purely additive — profile_env_for_background_worker itself is unchanged. Streaming already exercises the identical os.environ-mirror race for concurrent different-profile title-gen/checkpoint workers.
  • /api/models is largely insulated: its rebuild is serialized by _cache_build_cv (one rebuild at a time), and its cache path + fingerprint resolve through the thread-local (race-free). The exposure concentrates on /api/providers and the opt-in synchronous rebuild (_LIVE_REBUILD_BUDGET_SECONDS <= 0).
  • Same-profile concurrent clients are fine (identical values → no bleed). The bug needs ≥2 distinct non-default profiles hitting /api/providers simultaneously, and is transient / self-correcting on the next non-concurrent load.

Recommended fix

Resolve provider credentials for /api/providers + /api/models from the thread-local profile channel rather than the os.environ mirror — eliminating the cross-profile race for these read endpoints without process-global mutation. This spans the streaming usage too (the shared profile_env_for_background_worker mechanism), so it's a deliberate, broader change rather than a quick patch:

  • Audit which downstream readers (hermes_cli.auth.get_auth_status, hermes_cli.models.provider_model_ids, _lookup_custom_api_key_env) read os.getenv directly vs. accept an explicit credential/home argument.
  • Provide a thread-local-first resolution path so the read endpoints never need the os.environ mirror.
  • Keep the default-profile path a no-op / unchanged.

Refs: #3957, #3960, #798.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions