Release v0.51.561 — context-window indicator stays correct after model switch (#4618) #4628
…d to any model-window mismatch + #4618 regression tests + CHANGELOG Broadens #3256's default-only live-usage guard: the streaming SSE snapshot now always resolves the real per-model window via the same helper GET /api/session hydration uses (_context_length_lookup_inputs_for_model + get_model_context_length) and corrects whenever it differs from the compressor's cached value, with a TypeError fallback to the legacy 2-arg form. Fixes 'refresh shows 1M, send reverts to stale 168k + early auto-compress' on model-switched sessions. Per-stream cache preserved (one lookup/stream). Code byte-identical to PR head 3beb18e. Adds 4 source-structure regression tests (RED-proven on master). Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>
…#4248 256k acceptance gate (Opus downward-clobber) Codex SHIP-WITH-FIXES: live-snapshot used ambient get_config() which in the detached streaming worker resolves the process-global/default profile (#3294) -> for a non-default profile pinning a different per-model context_length it would surface the WRONG profile's window. Now resolves via get_config_for_profile_home on the session's own profile home (mirrors the worker's _cfg resolution). Opus SHIP-WITH-FIXES: broadened guard aligned resolution w/ hydration but not its #4248 acceptance gate -> a transient low-confidence 256k metadata probe could clobber a LARGER cached window mid-stream. Now reuses the exact hydration helper _should_accept_session_context_length_refresh on both modern + legacy paths. + regression tests for both. Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>
…essor guards too
Codex re-gate found the broadened live-snapshot guard fixed metering but the two
SIBLING paths still used the old default-only exact-cap test:
- api/streaming.py final session-save: persisted stale other-model window (168k)
to s.context_length -> wrong window on reload.
- api/streaming.py terminal SSE: emitted stale window -> indicator REVERTS
on stream end (messages.js overwrites S.lastUsage) = the exact 'send reverts to
168k' symptom.
Both now resolve the real per-model window via the same hydration helper and honor
the #4248 acceptance gate (no 256k downward-clobber), with legacy 2-arg fallback.
This is the root-cause completion across all 3 paths (live/save/SSE-done).
+ 2 regression tests. Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>
…HANGELOG to all-3-paths
…t after model switch; #4618)
    if (
        _real_u and _real_u != _cc_cl_u
        and _accept_u(_cc_cl_u, _real_u)
    ):
        _resolved_real = _real_u
Known gap: a model switch to an exactly-256k model won't self-heal during the stream. All three paths call _should_accept_session_context_length_refresh with model_changed=False (the default). For a session whose compressor still holds a larger cached window (e.g. 1M) and whose active model has a genuine 256k context, the acceptance gate returns False (not (256k == 256k and 1M > 256k) → False), so the stale 1M value persists in the live indicator, the session-save, and the done-SSE for the entire stream. The PR notes this is intentional ("err toward the LARGER window; hydration self-heals via model_changed=True"), but it means a user who switches from a 1M model to a real 256k model won't see the indicator correct itself until a page refresh. Worth documenting as a known limitation in the issue tracker if not already captured.
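To make the gap concrete, here is a minimal sketch of the gate as this comment describes it. It is not the real _should_accept_session_context_length_refresh: the function name, the exact fallback value (256_000), and the assumption that model_changed=True bypasses the check are all inferred from the comment above, not taken from the source.

```python
# Assumed numeric value of the low-confidence "256k" metadata fallback.
LOW_CONFIDENCE_FALLBACK = 256_000


def should_accept_refresh_sketch(cached: int, resolved: int,
                                 model_changed: bool = False) -> bool:
    """Hypothetical stand-in for the hydration acceptance gate."""
    if model_changed:
        # Assumption: an explicit model change trusts the resolved value.
        return True
    # Reject only a downward move onto the low-confidence fallback value.
    return not (resolved == LOW_CONFIDENCE_FALLBACK and cached > resolved)


# Stale 168k corrected upward to 1M: accepted.
upward = should_accept_refresh_sketch(168_000, 1_000_000)
# Cached 1M, genuine 256k model, model_changed left at its default: rejected.
known_gap = should_accept_refresh_sketch(1_000_000, 256_000)
# Same values on the hydration path with model_changed=True: accepted.
hydration = should_accept_refresh_sketch(1_000_000, 256_000, model_changed=True)
print(upward, known_gap, hydration)  # True False True
```

Under these assumptions the streaming paths cannot tell a genuine 256k window from the fallback, which is why the correction only lands on the next hydration.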
Release v0.51.561 — context-window indicator stays correct after model switch
Ships the reviewed + gated #4618 (allenliang2022) plus the already-merged-to-master root/FHS bootstrap fix (#4623).
#4618 — broaden the streaming stale-compressor guard to any model-window mismatch
On a session whose model was switched in place (e.g. to claude-opus-4.8 with a 1M window), the live context-window indicator showed the correct window on refresh but reverted to a stale value (e.g. claude-opus-4.5's 168k) the moment a message was sent, and tripped auto-compress early. Root cause: the agent-side compressor caches a context_length from the model it was built or last updated with; after an in-place switch it can hold a different model's window. The original #3256 guard only corrected the narrow case where the cached value exactly equalled the configured global cap, so a leftover other-model value slipped through.

The correction now applies consistently across all three paths that surface the window:
- the live streaming snapshot (_live_usage_snapshot),
- the final session-save,
- the done SSE payload.

Each resolves the real per-model window through the same helper GET /api/session hydration uses (_context_length_lookup_inputs_for_model + get_model_context_length), so streaming and reload converge on one value by construction. It honors the #4248 acceptance gate (a low-confidence 256k metadata fallback can never clobber a larger cached window) and resolves the session's own profile config rather than the ambient default-profile config, which avoids a cross-profile window leak in the detached streaming worker (#3294). The per-stream cache is preserved (one lookup per stream; the #3256 per-tick freeze can't recur).

Review trail
Regression tests in test_issue3256_context_length_default_only_guard.py pin the broadening, the acceptance gate, and the profile config on all three paths (RED-proven against master).

Gate findings applied during review (each is also covered by a new test):
- Codex: the live snapshot used ambient get_config() → wrong-profile window for non-default profiles → now profile-scoped.
- Opus: a transient low-confidence 256k metadata probe could clobber a larger cached window mid-stream → now reuses _should_accept_session_context_length_refresh.
- Codex re-gate: the session-save and done-SSE sibling paths still used the old narrow guard → broadened to match.

Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>
Closes #4618
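
For readers who want the shape of the fix without opening api/streaming.py, the sketch below illustrates the three ideas this description names: a TypeError fallback to a legacy 2-arg lookup, the guard condition from the reviewed diff, and a single lookup per stream. Every name here (call_with_legacy_fallback, StreamWindowResolver, the config_context_length keyword, the example URL) is hypothetical, and the real helper signatures are not shown in this PR, so treat it as an illustration rather than the shipped code.

```python
from typing import Callable, Optional


def call_with_legacy_fallback(lookup: Callable[..., Optional[int]],
                              model: str, base_url: str,
                              **modern_kwargs) -> Optional[int]:
    """Try the modern call form; fall back to the 2-arg form on TypeError."""
    try:
        return lookup(model, base_url, **modern_kwargs)
    except TypeError:
        # Caveat: this also swallows a TypeError raised inside the lookup.
        return lookup(model, base_url)


class StreamWindowResolver:
    """Resolves the real window at most once per stream, then reuses it."""

    def __init__(self, lookup, accept):
        self._lookup = lookup      # stand-in for get_model_context_length
        self._accept = accept      # stand-in for the acceptance gate
        self._looked_up = False
        self._resolved_real = None

    def window(self, cached: int, model: str, base_url: str,
               **modern_kwargs) -> int:
        if not self._looked_up:
            self._looked_up = True
            real = call_with_legacy_fallback(
                self._lookup, model, base_url, **modern_kwargs)
            # Same shape as the guard in the reviewed diff.
            if real and real != cached and self._accept(cached, real):
                self._resolved_real = real
        return self._resolved_real or cached


calls = []


def legacy_lookup(model, base_url):
    # Accepts only two positional arguments, like a legacy helper would.
    calls.append(model)
    return 1_000_000


resolver = StreamWindowResolver(legacy_lookup, accept=lambda cached, real: True)
first = resolver.window(168_000, "model-a", "https://example.invalid",
                        config_context_length=None)
second = resolver.window(168_000, "model-a", "https://example.invalid",
                         config_context_length=None)
print(first, second, len(calls))  # 1000000 1000000 1
```

Reusing one resolver for the live snapshot, the session-save, and the done payload is what would keep the three paths in agreement in this sketch; the actual code achieves the same by calling the shared hydration helper on each path.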