fix(#2083): inject thinking-disable into title gen API calls for local reasoning models (#2083)#3944
fix(#2083): inject thinking-disable into title gen API calls for local reasoning models (#2083)#3944rodboev wants to merge 5 commits into
Conversation
|
| Filename | Overview |
|---|---|
| api/streaming.py | Adds thinking/reasoning-disable injection to both the agent and aux title-generation paths using a hostname + model-capability heuristic; inlines the same _route_ok logic twice rather than sharing a helper. |
| tests/test_issue2083_gpu_title_gen.py | New regression test suite covering skip-logic, per-route injection presence/absence, Minimax compatibility, and LM Studio name-heuristic paths; relies on MagicMock agents which bypasses real attribute resolution. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Title generation triggered] --> B{API mode?}
B -- codex_responses --> C[Codex path - unchanged]
B -- anthropic --> D[Anthropic path - unchanged]
B -- OpenAI-compat else --> E[Build api_kwargs]
E --> F{_is_minimax?}
F -- yes --> G[set thinking disabled\nset reasoning disabled\nset reasoning_split True]
F -- no --> H{_caps non-empty\nAND _route_ok?}
H -- yes --> I[set thinking disabled\nset reasoning disabled]
H -- no --> J{_name_heuristic\n_route_ok AND model name\nmatches reasoning pattern?}
J -- yes --> I
J -- no --> K[No extra_body injection]
G --> L[api_kwargs extra_body updated]
I --> L
K --> M[Call completions.create]
L --> M
M --> N{Response has content?}
N -- yes --> O[Return title]
N -- no, llm_empty_reasoning --> P[Skip remaining attempts\nfall back to local title]
Reviews (4): Last reviewed commit: "Fall back to model-name heuristic for lo..." | Re-trigger Greptile
|
Thanks @rodboev — the fix correctly targets the right gap (background title generation pegging local reasoning models), and the The issue: the injection in Bypassing that gate means on a strict cloud provider every title attempt 400s, the exception is swallowed at debug level, and titles silently degrade to the heuristic fallback. Chat itself is unaffected (not a brick), but it trades the local-GPU fix for a silent title regression for direct-cloud users — and additionally sends the nonstandard Requested fix (small, ~5-10 lines): gate the injection so it only fires when reasoning extra_body is actually supported for the route. The cleanest option is to reuse the agent's own gate, e.g.: _tg_supports = False
try:
_tg_supports = bool(agent._supports_reasoning_extra_body())
except Exception:
_tg_supports = False
if _tg_supports:
_tg_extra = dict(api_kwargs.get('extra_body') or {})
_tg_extra.setdefault('thinking', {'type': 'disabled'})
_tg_extra.setdefault('reasoning', {'enabled': False})
api_kwargs['extra_body'] = _tg_extraThat keeps the GPU fix for exactly the reasoning-capable local routes #2083 is about (LM Studio Qwen3/DeepSeek-R1 resolve The MiniMax |
…ty check Non-reasoning providers (Mistral direct, plain OpenAI) would 400 on the unconditional thinking/reasoning extra_body keys injected during title generation. Gate behind resolve_model_reasoning_efforts() and treat _is_minimax_route() as an independent reasoning signal.
|
The title-gen path was unconditionally injecting Fixed in
All 7 issue-specific tests + 41 reasoning/model-resolver tests pass locally. |
|
Thanks @rodboev — the model-capability gating via The remaining issue: the gate is model-capability-based but route-blind
So vs master (which sent no extra keys on this path), this now sends Requested fix (one line, both paths)AND the capability gate with the agent's own route-tolerance gate, so the injection only fires for routes known to accept the keys (which still covers the local LM Studio / llama.cpp / Ollama case #2083 targets): _route_ok = (
callable(getattr(agent, "_supports_reasoning_extra_body", None))
and agent._supports_reasoning_extra_body()
) or _is_local_base_url(_agent_base_url) # or your preferred local-host check
if (_is_minimax or resolve_model_reasoning_efforts(_agent_model, provider_id=_agent_provider, base_url=_agent_base_url)) and _route_ok:
_tg_extra.setdefault("thinking", {"type": "disabled"})
_tg_extra.setdefault("reasoning", {"enabled": False})
...(The aux path has the same shape — gate it the same way. This is essentially the route-gate my original CR pointed at; the capability check is a good addition on top, not a replacement for it.) Please re-confirm the existing #2083 test still passes and add a case asserting NO injection for a reasoning model on an OpenAI-direct route. Happy to take it straight to release once that's in — the rest is solid. 🙏 |
|
Both title-gen paths now AND the model-capability check with route tolerance:
|
|
Thanks @rodboev — the route-tolerance gating is exactly right and resolves the OpenAI-direct 400 regression cleanly (verified: The new gate suppresses the disable payload for LM Studio reasoning models — the exact #2083 caseThe injection now requires So on the aux path, Requested fix (both aux + agent gates)When the route is tolerant/local (LM Studio, localhost/127.0.0.1, llama.cpp), fall back to a model-name reasoning heuristic when _caps = resolve_model_reasoning_efforts(model, provider_id=provider, base_url=base_url)
_local_reasoning_name = _route_ok and not _caps and _looks_like_reasoning_model(model) # name heuristic
if _is_minimax or (_caps and _route_ok) or _local_reasoning_name:
...inject...Apply it symmetrically to both Really close now — this is the last edge. Reopen for re-gate when it's in. 🙏 |
…solve_model_reasoning_efforts returns empty
|
Checked the commit pushed after my last round ( The LM Studio regression is fixed — confirmedThe name-heuristic fallback closes the #2083 headline case. For But the agent path abandoned the canonical gate it had one commit agoMy 22:17 review endorsed _route_ok = any(h in _agent_base_lower for h in ('openrouter', 'nousresearch.com', 'localhost', '127.0.0.1', '0.0.0.0')) or (_agent_provider or '').strip().lower() == 'lmstudio'The agent's real gate (
Recommendation: keep the agent gate on the agent path, drop only the over-restrictive conjunctionThe original 400-regression I flagged at 21:53 came from the _route_ok = (
callable(getattr(agent, '_supports_reasoning_extra_body', None))
and agent._supports_reasoning_extra_body()
)
if _is_minimax or _route_ok:
_tg_extra.setdefault('thinking', {'type': 'disabled'})
_tg_extra.setdefault('reasoning', {'enabled': False})That recovers GitHub coverage and the LM Studio live probe without re-introducing the OpenAI-direct 400, and keeps WebUI from drifting against the agent's gate over time. The name-heuristic fallback is still the right call for the aux path ( Functionally the PR is correct and shippable as-is; this is a "use the gate you already have" maintainability note, not a blocker. |
Thinking Path
generate_title_raw_via_auxalready injectsextra_body={"reasoning": {"enabled": False}}when callingcall_llm, but thegenerate_title_raw_via_agentOpenAI-compatelse:branch had no equivalent injection, so the model runs a full reasoning pass for a one-line title.llm_empty_reasoningresponses but never addressed the root cause: the reasoning pass itself. Injectingthinking: {type: disabled}andreasoning: {enabled: False}viaextra_bodybefore the API call suppresses the reasoning pass at the endpoint level.setdefaultpreserves any existing per-providerextra_bodyentries, including the Minimaxreasoning_split: Truewhich now adds to the same dict instead of replacing it.What Changed
api/streaming.py: ingenerate_title_raw_via_agent, injectedextra_bodywiththinking: {type: disabled}andreasoning: {enabled: False}intoapi_kwargsbefore the Minimax block in the OpenAI-compat path; simplified the Minimax block to appendreasoning_splitto the same dictWhy It Matters
Local reasoning models no longer burn GPU time on a hidden thinking pass for background title generation, eliminating the post-prompt GPU spike that users reported.
Verification
Risks / Follow-ups
thinkingandreasoningkeys are OpenAI-compat conventions supported by LM Studio, llama.cpp, and vLLM; endpoints that don't recognize them silently ignore unknownextra_bodykeys, so this is backward-compatible.extra_body.thinkingfor a different purpose, thesetdefaultguard prevents overwriting their value.Model Used
Claude Opus 4.6 via Claude Code CLI