Release v0.51.561 — context-window indicator stays correct after model switch (#4618) by nesquena-hermes · Pull Request #4628 · nesquena/hermes-webui

nesquena-hermes · 2026-06-21T18:32:39Z

Release v0.51.561 — context-window indicator stays correct after model switch

Ships the reviewed + gated #4618 (allenliang2022) plus the already-merged-to-master root/FHS bootstrap fix (#4623).

#4618 — broaden the streaming stale-compressor guard to any model-window mismatch

On a session whose model was switched in place (e.g. to claude-opus-4.8 with a 1M window), the live context-window indicator showed the correct window on refresh but reverted to a stale value (e.g. claude-opus-4.5's 168k) the moment a message was sent — and tripped auto-compress early. Root cause: the agent-side compressor caches a context_length from the model it was built/last-updated with; after an in-place switch it can hold a different model's window. The original #3256 guard only corrected the narrow case where the cached value exactly equalled the configured global cap, so a leftover other-model value slipped through.
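The gap between the original #3256 guard and the broadened one can be sketched in a few lines. This is a minimal illustration, not the code in api/streaming.py: both function names and the 200k global cap are hypothetical, and only the 168k and 1M figures come from the description above.

```python
def narrow_guard_should_correct(cached_window: int, global_cap: int) -> bool:
    """Original #3256 behaviour as described: only an exact match with the
    configured global cap is treated as stale."""
    return cached_window == global_cap


def broadened_guard_should_correct(cached_window: int, real_window: int) -> bool:
    """Broadened behaviour as described: any mismatch with the resolved
    per-model window is treated as stale. The #4248 acceptance gate is
    left out of this sketch for brevity."""
    return bool(real_window) and real_window != cached_window


GLOBAL_CAP = 200_000      # hypothetical configured global cap
STALE_CACHED = 168_000    # window left over from the previously selected model
REAL_WINDOW = 1_000_000   # window of the model the session was switched to

# The leftover other-model value does not equal the cap, so the narrow guard misses it.
print(narrow_guard_should_correct(STALE_CACHED, GLOBAL_CAP))      # False
print(broadened_guard_should_correct(STALE_CACHED, REAL_WINDOW))  # True
```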

The correction now applies consistently across all three paths that surface the window:

the live metering snapshot (_live_usage_snapshot),
the final session save, and
the terminal done SSE payload.

Each resolves the real per-model window through the same helper GET /api/session hydration uses (_context_length_lookup_inputs_for_model + get_model_context_length), so streaming and reload converge on one value by construction. It honors the #4248 acceptance gate (a low-confidence 256k metadata fallback can never clobber a larger cached window) and resolves the session's own profile config (not the ambient default-profile config — avoids a cross-profile window leak in the detached streaming worker, #3294). Per-stream cache preserved (one lookup per stream; the #3256 per-tick freeze can't recur).
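A rough sketch of the "one lookup per stream, one value on every path" idea follows. The class and function names are hypothetical, the lookup is a stand-in for the real helper chain, and the acceptance gate is omitted. Whether the real save and done-SSE blocks share the live snapshot's cache is not stated here; the sketch shares it only to keep the example short.

```python
from typing import Callable, Optional


class PerStreamWindow:
    """Resolves the real per-model window at most once per stream (hypothetical)."""

    def __init__(self, lookup: Callable[[], Optional[int]]) -> None:
        self._lookup = lookup
        self._resolved = False
        self._value = 0
        self.lookups = 0  # counts how often the underlying lookup actually ran

    def get(self) -> int:
        if not self._resolved:
            self.lookups += 1
            self._value = self._lookup() or 0
            self._resolved = True
        return self._value


def surfaced_window(cached: int, window: PerStreamWindow) -> int:
    """Return the resolved window when it differs from the cached one."""
    real = window.get()
    return real if real and real != cached else cached


window = PerStreamWindow(lambda: 1_000_000)  # stand-in for the metadata lookup
stale = 168_000                              # compressor's cached value

live_ticks = [surfaced_window(stale, window) for _ in range(3)]  # metering ticks
saved = surfaced_window(stale, window)                           # final session save
done = surfaced_window(stale, window)                            # terminal done payload
```

Because every path goes through the same resolver, `live_ticks`, `saved`, and `done` cannot disagree, and `window.lookups` stays at 1 no matter how many metering ticks fire.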

Review trail

Codex (regression gate): SAFE TO SHIP — confirmed all 3 paths resolved, no new regression.
Opus (advisor): SAFE to ship — validated helper reuse, profile-scoped config, the #4248 256k-clobber protection on every path, and per-stream perf.
Full suite: 9965 passed, 0 failed.
10 new source-structure regression tests in test_issue3256_context_length_default_only_guard.py pin the broadening + acceptance gate + profile config on all three paths (RED-proven against master).
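For readers unfamiliar with source-structure tests, the pattern is roughly: read the source as text and assert that specific fragments are present. The sketch below is an assumption about the general shape, not a copy of the tests in test_issue3256_context_length_default_only_guard.py. The excerpt is the guard condition quoted in the review diff on this page; the helper and test names are hypothetical.

```python
import re

# Excerpt standing in for the guarded block in api/streaming.py
# (taken from the review diff quoted in this thread).
SOURCE = '''
if (
    _real_u and _real_u != _cc_cl_u
    and _accept_u(_cc_cl_u, _real_u)
):
    _resolved_real = _real_u
'''


def normalized(src: str) -> str:
    """Collapse whitespace so line wrapping cannot break a substring match."""
    return re.sub(r"\s+", " ", src).strip()


def test_guard_is_broadened_and_gated() -> None:
    flat = normalized(SOURCE)
    # Broadened condition: any mismatch, not only an exact-cap match.
    assert "_real_u != _cc_cl_u" in flat
    # Acceptance gate is applied before the resolved value is used.
    assert "_accept_u(_cc_cl_u, _real_u)" in flat


test_guard_is_broadened_and_gated()
```

Tests of this kind pin the shape of the code rather than its behaviour, so they catch a silent revert of the condition but say nothing about runtime results.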

Gate findings applied during review (each is also covered by a new test):

(Codex) live-snapshot used ambient get_config() → wrong-profile window for non-default profiles → now profile-scoped.
(Opus) broadened guard lacked the #4248 256k acceptance gate → now reuses _should_accept_session_context_length_refresh.
(Codex re-gate) the save + done-SSE sibling paths still used the old narrow guard → broadened to match.

Co-authored-by: allenliang2022 allenliang2022@users.noreply.github.com

Closes #4618

…d to any model-window mismatch + #4618 regression tests + CHANGELOG

Broadens #3256's default-only live-usage guard: the streaming SSE snapshot now always resolves the real per-model window via the same helper GET /api/session hydration uses (_context_length_lookup_inputs_for_model + get_model_context_length) and corrects whenever it differs from the compressor's cached value, with a TypeError fallback to the legacy 2-arg form. Fixes 'refresh shows 1M, send reverts to stale 168k + early auto-compress' on model-switched sessions. Per-stream cache preserved (one lookup/stream). Code byte-identical to PR head 3beb18e. Adds 4 source-structure regression tests (RED-proven on master).

Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>

…#4248 256k acceptance gate (Opus downward-clobber)

Codex SHIP-WITH-FIXES: live-snapshot used ambient get_config() which in the detached streaming worker resolves the process-global/default profile (#3294) -> for a non-default profile pinning a different per-model context_length it would surface the WRONG profile's window. Now resolves via get_config_for_profile_home on the session's own profile home (mirrors the worker's _cfg resolution).

Opus SHIP-WITH-FIXES: broadened guard aligned resolution w/ hydration but not its #4248 acceptance gate -> a transient low-confidence 256k metadata probe could clobber a LARGER cached window mid-stream. Now reuses the exact hydration helper _should_accept_session_context_length_refresh on both modern + legacy paths.

+ regression tests for both.

Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>

…essor guards too

Codex re-gate found the broadened live-snapshot guard fixed metering but the two SIBLING paths still used the old default-only exact-cap test:

- api/streaming.py final session-save: persisted stale other-model window (168k) to s.context_length -> wrong window on reload.
- api/streaming.py terminal SSE: emitted stale window -> indicator REVERTS on stream end (messages.js overwrites S.lastUsage) = the exact 'send reverts to 168k' symptom.

Both now resolve the real per-model window via the same hydration helper and honor the #4248 acceptance gate (no 256k downward-clobber), with legacy 2-arg fallback. This is the root-cause completion across all 3 paths (live/save/SSE-done). + 2 regression tests.

Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>


greptile-apps · 2026-06-21T18:52:34Z

Greptile Summary

This release ships #4618, which broadens the streaming stale-compressor guard from a narrow exact-cap check (only catching the case where the cached value equalled the global config cap) to a universal mismatch check, fixing the "refresh shows 1M, send-a-message reverts to 168k" bug after an in-place model switch.

The correction is applied consistently across all three paths that surface the context window — live-snapshot (_live_usage_snapshot), final session-save, and terminal done SSE payload — by reusing the same _context_length_lookup_inputs_for_model + get_model_context_length helper that GET /api/session hydration already uses, so the two code paths now converge by construction.
The live-snapshot path correctly reads the session's own profile config via get_config_for_profile_home (instead of the ambient get_config()) to avoid a cross-profile window leak in the detached streaming worker, and the #4248 256k-clobber acceptance gate is honored on all three paths and at both the modern and legacy 2-arg get_model_context_length call sites.
The per-stream _real_ctx_cache from #3256 is preserved so the metadata lookup runs at most once per stream; the 10 new regression tests use source-structure string matching (the repo's established pattern) to pin the broadened condition, helper reuse, profile-scoped config, and acceptance gate against silent regression.
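The "modern and legacy 2-arg" call sites mentioned above suggest a try-then-fall-back pattern. The sketch below shows one common way to write it. It is an assumption about the general shape only: the wrapper name, the choice of `model` and `base_url` as the two legacy arguments, and the `provider` keyword are all hypothetical, and the real get_model_context_length signature is not shown on this page.

```python
from typing import Any, Callable


def resolve_with_legacy_fallback(
    lookup: Callable[..., int], model: str, base_url: str, **modern_kwargs: Any
) -> int:
    """Try the modern call first; fall back to the 2-arg form on TypeError."""
    try:
        return lookup(model, base_url, **modern_kwargs)
    except TypeError:
        # An older lookup rejects the extra keywords, so retry without them.
        return lookup(model, base_url)


def modern_lookup(model: str, base_url: str, *, provider: str = "") -> int:
    return 1_000_000  # illustrative value


def legacy_lookup(model: str, base_url: str) -> int:
    return 1_000_000  # illustrative value


modern_result = resolve_with_legacy_fallback(
    modern_lookup, "some-model", "https://example.invalid", provider="x"
)
legacy_result = resolve_with_legacy_fallback(
    legacy_lookup, "some-model", "https://example.invalid", provider="x"
)
```

One caveat with this pattern: a TypeError raised inside the modern lookup for an unrelated reason also triggers the fallback, so the retry can mask a genuine bug.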

Confidence Score: 4/5

Safe to merge; the fix is well-scoped, all three streaming paths are updated consistently, and the suite of 9965 tests passed with 10 new regression pins covering the specific conditions changed here.

The core streaming logic is correctly broadened: helper reuse ensures the live indicator and hydration converge on the same value, profile-scoping prevents a cross-profile window leak, and the 256k acceptance gate is applied uniformly. The one deliberate limitation — that a switch from a larger model to an exactly-256k model won't self-heal during the active stream — is documented in both the PR description and inline comments, and is handled by hydration on the next page load. No unintended regressions were found.

The three changed blocks in api/streaming.py (live-snapshot ~L5766, session-save ~L8005, done-SSE ~L8292) are the highest-stakes areas; verify that _context_length_lookup_inputs_for_model and _should_accept_session_context_length_refresh remain importable from api.routes in all deployment configurations.

Important Files Changed

Filename	Overview
api/streaming.py	Broadens the stale-compressor guard from a narrow exact-cap check to a universal mismatch check across all three streaming paths (live-snapshot, session-save, done-SSE), reusing the hydration helper for consistency; profile-scoping is correctly applied on each path, and the #4248 256k acceptance gate is honored throughout.
tests/test_issue3256_context_length_default_only_guard.py	Adds 10 source-structure regression tests pinning the broadened guard, helper reuse, profile-scoped config resolution, and the 256k acceptance gate on all three streaming paths; uses the same string-matching pattern as existing tests in this file.
CHANGELOG.md	Release entry for v0.51.561, authored by the release agent as expected for this repo's changelog convention.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant UI as Frontend
    participant SS as Streaming Worker
    participant CC as ContextCompressor
    participant H as Hydration Helper
    participant MD as Model Metadata

    UI->>SS: Send message (session has switched model in-place)
    Note over CC: Cached context_length = old model's window (e.g. 168k)

    SS->>SS: Metering tick → _live_usage_snapshot()
    SS->>CC: "Read context_length (_cc_cl_u = 168k)"
    SS->>H: Resolve real window for agent.model
    H-->>SS: LookupInputs (base_url, api_key, provider, config_ctx_len)
    SS->>MD: get_model_context_length(...)
    MD-->>SS: "_real_u = 1,000,000"
    SS->>SS: _accept_u(168k, 1M) → True
    SS-->>UI: "SSE metering payload: context_length=1M ✓"

    SS->>SS: Session save block
    SS->>CC: "Read context_length (_cc_cl = 168k)"
    SS->>H: Resolve real window (same helper, profile-scoped _cfg)
    H-->>SS: "_real_cc = 1,000,000"
    SS->>SS: "_accept_cc(168k, 1M) → True → _skip_cc_cl=True"
    SS->>SS: "Fallback persists s.context_length = 1M ✓"

    SS->>SS: done SSE payload block
    SS->>CC: "Read context_length (_cc_cl_sse = 168k)"
    SS->>H: Resolve real window (same helper, profile-scoped _cfg)
    H-->>SS: "_real_sse = 1,000,000"
    SS->>SS: "_accept_sse(168k, 1M) → True → _cc_cl_sse=0, fallback fires"
    SS-->>UI: "done SSE payload: context_length=1M ✓"

    Note over UI,MD: All three paths now surface the correct 1M window


greptile-apps · 2026-06-21T18:52:37Z

+                                    if (
+                                        _real_u and _real_u != _cc_cl_u
+                                        and _accept_u(_cc_cl_u, _real_u)
+                                    ):
+                                        _resolved_real = _real_u


Known gap: model-switch to an exactly-256k model won't self-heal during the stream. All three paths call _should_accept_session_context_length_refresh with model_changed=False (default). For a session whose compressor still holds a larger cached window (e.g. 1M) and whose active model has a genuine 256k context, the acceptance gate returns False (not (256k == 256k and 1M > 256k) → False), so the stale 1M value persists in the live indicator, session-save, and done-SSE for the entire stream. The PR notes this is intentional ("err toward the LARGER window; hydration self-heals via model_changed=True"), but it means a user who switches from a 1M model to a real 256k model won't see the indicator correct until a page refresh. Worth documenting as a known limitation in the issue tracker if not already captured.
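The gap can be reproduced with a small model of the gate. This is a sketch reconstructed from the expression quoted in the comment above, not the body of _should_accept_session_context_length_refresh. The function name, the exact numeric value used for "256k", and the assumption that model_changed=True accepts any non-zero resolved window are all hypothetical.

```python
FALLBACK_WINDOW = 256_000  # assumption: numeric value of the "256k" fallback


def should_accept_refresh(cached: int, resolved: int, model_changed: bool = False) -> bool:
    """Reject a 256k result that would shrink a larger cached window,
    unless the caller signals that the model really changed."""
    if not resolved:
        return False
    if model_changed:
        return True  # assumption: an explicit model change bypasses the gate
    return not (resolved == FALLBACK_WINDOW and cached > resolved)


# Stale 168k cache, real window 1M: accepted, which is the bug this PR fixes.
print(should_accept_refresh(168_000, 1_000_000))                     # True

# 1M cache, model with a genuine 256k window: rejected mid-stream (the known gap).
print(should_accept_refresh(1_000_000, 256_000))                     # False

# Same inputs with model_changed=True, as hydration is said to pass: accepted.
print(should_accept_refresh(1_000_000, 256_000, model_changed=True))  # True
```

The gate cannot tell a low-confidence 256k fallback from a genuine 256k window by value alone, which is why the streaming paths err toward the larger window and leave the correction to hydration.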

nesquena-hermes and others added 5 commits June 21, 2026 17:55

docs(#4618): note model_changed-omission rationale (Opus) + broaden C…

ef0ca03

…HANGELOG to all-3-paths

Release v0.51.561 — Release TT (context-window indicator stays correc…

5d5b04a

…t after model switch; #4618)

nesquena-hermes merged commit ee0b476 into master Jun 21, 2026
11 checks passed

nesquena-hermes deleted the stage/4618-ctx-guard branch June 21, 2026 18:36
