
Release v0.51.561 — context-window indicator stays correct after model switch (#4618)#4628

Merged
nesquena-hermes merged 5 commits into master from stage/4618-ctx-guard
Jun 21, 2026

Conversation

@nesquena-hermes

Collaborator

Release v0.51.561 — context-window indicator stays correct after model switch

Ships the reviewed + gated #4618 (allenliang2022) plus the already-merged-to-master root/FHS bootstrap fix (#4623).

#4618 — broaden the streaming stale-compressor guard to any model-window mismatch

On a session whose model was switched in place (e.g. to claude-opus-4.8 with a 1M window), the live context-window indicator showed the correct window on refresh but reverted to a stale value (e.g. claude-opus-4.5's 168k) the moment a message was sent — and tripped auto-compress early. Root cause: the agent-side compressor caches a context_length from the model it was built/last-updated with; after an in-place switch it can hold a different model's window. The original #3256 guard only corrected the narrow case where the cached value exactly equalled the configured global cap, so a leftover other-model value slipped through.
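
The gap between the old and new guard can be sketched as follows. This is a minimal illustration, not the code in api/streaming.py: the function names `narrow_guard` and `broadened_guard` and the global cap of 200,000 are hypothetical, and the #4248 acceptance gate is left out for brevity. Only the 168k and 1M figures come from the example above.

```python
from typing import Optional

GLOBAL_CAP = 200_000  # hypothetical configured global cap, for illustration only


def narrow_guard(cached: int, real: Optional[int], global_cap: int = GLOBAL_CAP) -> int:
    """Old behaviour (sketch): correct only when the cached value equals the global cap."""
    if real and cached == global_cap and real != cached:
        return real
    return cached


def broadened_guard(cached: int, real: Optional[int]) -> int:
    """New behaviour (sketch): correct on any mismatch with the resolved per-model window."""
    if real and real != cached:
        return real
    return cached


# The compressor still holds the previous model's 168k window; the active model has 1M.
stale, actual = 168_000, 1_000_000
print(narrow_guard(stale, actual))     # 168000: the leftover value slips through
print(broadened_guard(stale, actual))  # 1000000: corrected to the active model's window
```

Because 168k is not the configured cap, the narrow check never fires, which matches the "send reverts to a stale value" symptom.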

The correction now applies consistently across all three paths that surface the window:

  • the live metering snapshot (_live_usage_snapshot),
  • the final session save, and
  • the terminal done SSE payload.

Each resolves the real per-model window through the same helper GET /api/session hydration uses (_context_length_lookup_inputs_for_model + get_model_context_length), so streaming and reload converge on one value by construction. It honors the #4248 acceptance gate (a low-confidence 256k metadata fallback can never clobber a larger cached window) and resolves the session's own profile config (not the ambient default-profile config — avoids a cross-profile window leak in the detached streaming worker, #3294). Per-stream cache preserved (one lookup per stream; the #3256 per-tick freeze can't recur).
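
One way to picture "converge on one value by construction" together with the per-stream cache is the sketch below. It is an assumption-laden illustration: `StreamWindowResolver` and `surfaced_window` are hypothetical names, the lambda stands in for the real metadata lookup, and the acceptance gate and profile scoping are omitted. Sharing a single cache across all three calls is a simplification; the source only states that the lookup runs once per stream.

```python
from typing import Callable, Dict, Optional


class StreamWindowResolver:
    """Hypothetical stand-in for the shared lookup; caches one result per stream."""

    def __init__(self, lookup: Callable[[str], Optional[int]]) -> None:
        self._lookup = lookup            # stands in for the metadata lookup
        self._cache: Dict[str, Optional[int]] = {}
        self.lookups = 0                 # counts real lookups, for the demo below

    def resolve(self, model: str) -> Optional[int]:
        if model not in self._cache:     # only the first call per model hits the lookup
            self._cache[model] = self._lookup(model)
            self.lookups += 1
        return self._cache[model]


def surfaced_window(resolver: StreamWindowResolver, model: str, cached: int) -> int:
    """Prefer the resolved per-model window; keep the cached one if the lookup fails."""
    real = resolver.resolve(model)
    return real if real and real != cached else cached


resolver = StreamWindowResolver(lambda model: 1_000_000)
live = surfaced_window(resolver, "claude-opus-4.8", cached=168_000)
saved = surfaced_window(resolver, "claude-opus-4.8", cached=168_000)
done = surfaced_window(resolver, "claude-opus-4.8", cached=168_000)
print(live == saved == done, resolver.lookups)  # True 1
```

Routing every path through the same resolver means the paths cannot disagree, and the cache keeps the lookup off the per-tick hot path.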

Review trail

Gate findings applied during review (each is also covered by a new test):

  1. (Codex) live-snapshot used ambient get_config() → wrong-profile window for non-default profiles → now profile-scoped.
  2. (Opus) broadened guard lacked the #4248 256k acceptance gate → now reuses _should_accept_session_context_length_refresh.
  3. (Codex re-gate) the save + done-SSE sibling paths still used the old narrow guard → broadened to match.
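
Finding 1 can be illustrated with a toy model of per-profile configuration. Everything here is hypothetical (the paths, the `PROFILE_CONFIGS` mapping, and both function names); it only shows why reading the ambient profile in a detached worker can surface another profile's window.

```python
from typing import Dict, Optional

# Hypothetical per-profile configs, keyed by profile home directory.
PROFILE_CONFIGS: Dict[str, Dict[str, int]] = {
    "/profiles/default": {"claude-opus-4.8": 200_000},
    "/profiles/research": {"claude-opus-4.8": 1_000_000},
}
AMBIENT_PROFILE_HOME = "/profiles/default"  # what a detached worker would see


def ambient_window(model: str) -> Optional[int]:
    """Sketch of the leak: always reads the process-global/default profile."""
    return PROFILE_CONFIGS[AMBIENT_PROFILE_HOME].get(model)


def profile_scoped_window(model: str, profile_home: str) -> Optional[int]:
    """Sketch of the fix: read the config of the session's own profile home."""
    return PROFILE_CONFIGS[profile_home].get(model)


# A session that belongs to the non-default "research" profile:
print(ambient_window("claude-opus-4.8"))                               # 200000
print(profile_scoped_window("claude-opus-4.8", "/profiles/research"))  # 1000000
```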

Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>

Closes #4618

nesquena-hermes and others added 5 commits June 21, 2026 17:55
…d to any model-window mismatch + #4618 regression tests + CHANGELOG

Broadens #3256's default-only live-usage guard: the streaming SSE snapshot now
always resolves the real per-model window via the same helper GET /api/session
hydration uses (_context_length_lookup_inputs_for_model + get_model_context_length)
and corrects whenever it differs from the compressor's cached value, with a
TypeError fallback to the legacy 2-arg form. Fixes 'refresh shows 1M, send reverts
to stale 168k + early auto-compress' on model-switched sessions. Per-stream cache
preserved (one lookup/stream). Code byte-identical to PR head 3beb18e.

Adds 4 source-structure regression tests (RED-proven on master).

Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>
…#4248 256k acceptance gate (Opus downward-clobber)

Codex SHIP-WITH-FIXES: live-snapshot used ambient get_config() which in the
detached streaming worker resolves the process-global/default profile (#3294) ->
for a non-default profile pinning a different per-model context_length it would
surface the WRONG profile's window. Now resolves via get_config_for_profile_home
on the session's own profile home (mirrors the worker's _cfg resolution).

Opus SHIP-WITH-FIXES: broadened guard aligned resolution w/ hydration but not its
#4248 acceptance gate -> a transient low-confidence 256k metadata probe could
clobber a LARGER cached window mid-stream. Now reuses the exact hydration helper
_should_accept_session_context_length_refresh on both modern + legacy paths.

+ regression tests for both. Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>
…essor guards too

Codex re-gate found the broadened live-snapshot guard fixed metering but the two
SIBLING paths still used the old default-only exact-cap test:
  - api/streaming.py final session-save: persisted stale other-model window (168k)
    to s.context_length -> wrong window on reload.
  - api/streaming.py terminal done SSE: emitted stale window -> indicator REVERTS
    on stream end (messages.js overwrites S.lastUsage) = the exact 'send reverts to
    168k' symptom.
Both now resolve the real per-model window via the same hydration helper and honor
the #4248 acceptance gate (no 256k downward-clobber), with legacy 2-arg fallback.
This is the root-cause completion across all 3 paths (live/save/SSE-done).

+ 2 regression tests. Co-authored-by: allenliang2022 <allenliang2022@users.noreply.github.com>
@nesquena-hermes nesquena-hermes merged commit ee0b476 into master Jun 21, 2026
11 checks passed
@nesquena-hermes nesquena-hermes deleted the stage/4618-ctx-guard branch June 21, 2026 18:36
@greptile-apps

greptile-apps Bot commented Jun 21, 2026


Greptile Summary

This release ships #4618, which broadens the streaming stale-compressor guard from a narrow exact-cap check (only catching the case where the cached value equalled the global config cap) to a universal mismatch check, fixing the "refresh shows 1M, send-a-message reverts to 168k" bug after an in-place model switch.

  • The correction is applied consistently across all three paths that surface the context window — live-snapshot (_live_usage_snapshot), final session-save, and terminal done SSE payload — by reusing the same _context_length_lookup_inputs_for_model + get_model_context_length helper that GET /api/session hydration already uses, so the two code paths now converge by construction.
  • The live-snapshot path correctly reads the session's own profile config via get_config_for_profile_home (instead of the ambient get_config()) to avoid a cross-profile window leak in the detached streaming worker, and the #4248 256k-clobber acceptance gate is honored on all three paths and at both the modern and legacy 2-arg get_model_context_length call sites.
  • The per-stream _real_ctx_cache from #3256 is preserved so the metadata lookup runs at most once per stream; the 10 new regression tests use source-structure string matching (the repo's established pattern) to pin the broadened condition, helper reuse, profile-scoped config, and acceptance gate against silent regression.
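
The "modern and legacy 2-arg call sites" pattern can be sketched as a try/except around the richer call. The real signature of get_model_context_length is not shown in this PR, so the argument names below (`model`, `base_url`, `provider`, `config_context_length`) and the helper `resolve_with_fallback` are assumptions for illustration only.

```python
from typing import Any, Callable, Optional


def resolve_with_fallback(
    get_len: Callable[..., Optional[int]],
    model: str,
    base_url: str,
    **modern_kwargs: Any,
) -> Optional[int]:
    """Try the modern call first; fall back to the 2-arg form on TypeError."""
    try:
        return get_len(model, base_url, **modern_kwargs)
    except TypeError:
        # An older helper that does not accept the extra keyword arguments.
        return get_len(model, base_url)


def legacy_get_len(model: str, base_url: str) -> int:
    """Hypothetical older helper: two positional arguments only."""
    return 1_000_000


def modern_get_len(model, base_url, *, provider=None, config_context_length=None):
    """Hypothetical newer helper: honours a per-model configured length."""
    return config_context_length or 1_000_000


print(resolve_with_fallback(legacy_get_len, "m", "https://example.invalid", provider="x"))
print(resolve_with_fallback(modern_get_len, "m", "https://example.invalid",
                            provider="x", config_context_length=500_000))
```

One trade-off of this pattern is that a TypeError raised inside the modern helper is also caught and triggers the legacy call, so the fallback can mask an unrelated bug.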

Confidence Score: 4/5

Safe to merge; the fix is well-scoped, all three streaming paths are updated consistently, and the suite of 9965 tests passed with 10 new regression pins covering the specific conditions changed here.

The core streaming logic is correctly broadened: helper reuse ensures the live indicator and hydration converge on the same value, profile-scoping prevents a cross-profile window leak, and the 256k acceptance gate is applied uniformly. The one deliberate limitation — that a switch from a larger model to an exactly-256k model won't self-heal during the active stream — is documented in both the PR description and inline comments, and is handled by hydration on the next page load. No unintended regressions were found.

The three changed blocks in api/streaming.py (live-snapshot ~L5766, session-save ~L8005, done-SSE ~L8292) are the highest-stakes areas; verify that _context_length_lookup_inputs_for_model and _should_accept_session_context_length_refresh remain importable from api.routes in all deployment configurations.

Important Files Changed

| Filename | Overview |
| --- | --- |
| api/streaming.py | Broadens the stale-compressor guard from a narrow exact-cap check to a universal mismatch check across all three streaming paths (live-snapshot, session-save, done-SSE), reusing the hydration helper for consistency; profile-scoping is correctly applied on each path, and the #4248 256k acceptance gate is honored throughout. |
| tests/test_issue3256_context_length_default_only_guard.py | Adds 10 source-structure regression tests pinning the broadened guard, helper reuse, profile-scoped config resolution, and the 256k acceptance gate on all three streaming paths; uses the same string-matching pattern as existing tests in this file. |
| CHANGELOG.md | Release entry for v0.51.561, authored by the release agent as expected for this repo's changelog convention. |
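
A source-structure test of the kind described for the test file might look roughly like the sketch below. The actual assertions in tests/test_issue3256_context_length_default_only_guard.py are not shown in this PR, so this is only an assumed shape: `SOURCE` stands in for the text of api/streaming.py (a real test would read the file), and it reuses the guard condition quoted in the review thread.

```python
import re

# Stand-in for the module text under test; a real test would read the file from disk.
SOURCE = '''
if (
    _real_u and _real_u != _cc_cl_u
    and _accept_u(_cc_cl_u, _real_u)
):
    _resolved_real = _real_u
'''


def check_guard_is_broadened(source: str) -> None:
    """Pin the broadened condition by matching strings in the source text."""
    collapsed = re.sub(r"\s+", " ", source)  # make the match whitespace-insensitive
    assert "_real_u != _cc_cl_u" in collapsed, "any-mismatch check is missing"
    assert "_accept_u(_cc_cl_u, _real_u)" in collapsed, "acceptance gate is missing"


check_guard_is_broadened(SOURCE)
print("guard structure pinned")
```

String matching like this fails loudly if the condition is narrowed again, but it pins the text of the code, not its behaviour, so a refactor that renames the locals would also need the test updated.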

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant UI as Frontend
    participant SS as Streaming Worker
    participant CC as ContextCompressor
    participant H as Hydration Helper
    participant MD as Model Metadata

    UI->>SS: Send message (session has switched model in-place)
    Note over CC: Cached context_length = old model's window (e.g. 168k)

    SS->>SS: Metering tick → _live_usage_snapshot()
    SS->>CC: "Read context_length (_cc_cl_u = 168k)"
    SS->>H: Resolve real window for agent.model
    H-->>SS: LookupInputs (base_url, api_key, provider, config_ctx_len)
    SS->>MD: get_model_context_length(...)
    MD-->>SS: "_real_u = 1,000,000"
    SS->>SS: _accept_u(168k, 1M) → True
    SS-->>UI: "SSE metering payload: context_length=1M ✓"

    SS->>SS: Session save block
    SS->>CC: "Read context_length (_cc_cl = 168k)"
    SS->>H: Resolve real window (same helper, profile-scoped _cfg)
    H-->>SS: "_real_cc = 1,000,000"
    SS->>SS: "_accept_cc(168k, 1M) → True → _skip_cc_cl=True"
    SS->>SS: "Fallback persists s.context_length = 1M ✓"

    SS->>SS: done SSE payload block
    SS->>CC: "Read context_length (_cc_cl_sse = 168k)"
    SS->>H: Resolve real window (same helper, profile-scoped _cfg)
    H-->>SS: "_real_sse = 1,000,000"
    SS->>SS: "_accept_sse(168k, 1M) → True → _cc_cl_sse=0, fallback fires"
    SS-->>UI: "done SSE payload: context_length=1M ✓"

    Note over UI,MD: All three paths now surface the correct 1M window

Reviews (1): Last reviewed commit: "Release v0.51.561 — Release TT (context-..." | Re-trigger Greptile

Comment thread api/streaming.py
Comment on lines +5822 to +5826
if (
    _real_u and _real_u != _cc_cl_u
    and _accept_u(_cc_cl_u, _real_u)
):
    _resolved_real = _real_u


P2 Known gap: a switch to an exactly-256k model won't self-heal during the stream. All three paths call _should_accept_session_context_length_refresh with model_changed=False (the default). For a session whose compressor still holds a larger cached window (e.g. 1M) and whose active model has a genuine 256k context, the acceptance gate returns False (not (256k == 256k and 1M > 256k) → False), so the stale 1M value persists in the live indicator, session-save, and done-SSE for the entire stream. The PR notes this is intentional ("err toward the LARGER window; hydration self-heals via model_changed=True"), but it means a user who switches from a 1M model to a real 256k model won't see the indicator correct itself until a page refresh. Worth documenting as a known limitation in the issue tracker if not already captured.
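
The gap can be reproduced with a small model of the gate as this comment describes it. This is a reconstruction from the expression quoted above, not the body of _should_accept_session_context_length_refresh: the function name `should_accept`, the numeric value chosen for "256k", and the assumption that model_changed=True bypasses the check are all illustrative.

```python
FALLBACK_256K = 256_000  # assumed numeric value of the "256k" fallback


def should_accept(cached: int, resolved: int, model_changed: bool = False) -> bool:
    """Sketch of the gate: refuse a 256k value that would shrink a larger cached window."""
    if model_changed:
        return True  # assumed: an explicit model change bypasses the gate
    return not (resolved == FALLBACK_256K and cached > resolved)


# Stale 168k cache, real window 1M: accepted, so the indicator is corrected.
print(should_accept(168_000, 1_000_000))                      # True
# Stale 1M cache, genuine 256k model, mid-stream: rejected, so 1M persists.
print(should_accept(1_000_000, 256_000))                      # False
# Same values with model_changed=True (the hydration path): accepted.
print(should_accept(1_000_000, 256_000, model_changed=True))  # True
```

The second call is the known gap: mid-stream, a genuine 256k window is indistinguishable from the low-confidence fallback, so the gate errs toward the larger value.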
