feat(thinking): unify Claude adaptive reasoning behavior #1534
ben-vargas wants to merge 2 commits into router-for-me:dev from
Conversation
Summary of Changes (Gemini Code Assist): This pull request enhances the system's support for Claude Opus 4.6's adaptive reasoning capabilities. It enables the full pipeline to correctly originate and process adaptive thinking configurations, moving beyond the previous translator-only support. The changes ensure that newer Claude models can leverage their advanced thinking modes while maintaining full backward compatibility for older models and existing configurations, improving the accuracy and flexibility with which thinking parameters are applied.
Code Review
This pull request effectively unifies the behavior for Claude's adaptive reasoning, introducing support for the `thinking.type: "adaptive"` and `output_config.effort` API. The changes are well-structured, splitting the logic into adaptive and legacy paths, and include backward compatibility for `xhigh` level aliasing. The test coverage is comprehensive, validating the new adaptive behavior and ensuring no regressions for older models. I've added a few minor suggestions to improve code conciseness by using `sjson.DeleteManyBytes` where multiple fields are being deleted. Overall, this is a solid implementation.
@luispater and @hkfires - Feel free to take this as an idea PR and implement it your own way. It's just something I needed to do in my fork to get adaptive thinking working correctly, rather than having the legacy budget_tokens sent, and I thought I'd share it. GPT 5.3 Codex generated.
As OpenAI, Anthropic, and Gemini have all shifted to using levels to adjust thinking intensity, we're currently researching how to make the thinking implementation more unified. I'm making some modifications based on your PR and will merge them once complete. Thank you very much!
Problem
Claude Opus 4.6 Thinking introduced a new API contract for thinking configuration:
`thinking.type: "adaptive"` paired with `output_config.effort`, replacing the legacy `thinking.type: "enabled"` + `thinking.budget_tokens` shape used by older models (Sonnet 4.5, Opus 4.5).

Partial support existed at the translator layer (938a799): when a client sent a raw request already containing `thinking.type: "adaptive"`, the translators could recognize it and convert it to other provider formats (e.g. Gemini's `thinkingLevel`, OpenAI's `reasoning_effort`), or pass it through for Claude-to-Claude. However, the applier -- the component responsible for producing the thinking shape from model suffixes and validated configs -- still only emitted the legacy format:

```json
{ "thinking": { "type": "enabled", "budget_tokens": 16384 } }
```

This meant that when a user configured thinking via a model suffix (e.g. `model-name(high)` or `model-name(auto)`), the pipeline (suffix parsing → validation → applier) always produced the legacy budget-based shape, even for Opus 4.6, which expects the adaptive format. The system could forward an adaptive request but could never originate one.

What this PR does
- New `AdaptiveAllowed` capability flag on `ThinkingSupport` in the model registry, along with a `Levels` list for Opus 4.6 (`low`, `medium`, `high`, `max`). This lets the applier distinguish adaptive-capable models from legacy budget-only models.
- Splits the Claude applier into two code paths:
  - `applyAdaptive` -- for models with `AdaptiveAllowed: true`. Produces the correct upstream shape: `{ "thinking": { "type": "adaptive" }, "output_config": { "effort": "high" } }`. The `budget_tokens` field is always stripped.
  - `applyLegacy` -- preserves the existing `thinking.type: "enabled"` + `budget_tokens` behavior for all pre-Opus-4.6 models. Zero behavioral change for Sonnet 4.5, Opus 4.5, etc.
- Backward-compatible level aliasing -- `xhigh` (used by some existing configurations and the translator layer) is transparently normalized to `max` for adaptive models that don't define `xhigh` in their level set, so existing configs continue to work.
- Validation-layer updates -- `ValidateConfig` now runs adaptive-aware alias normalization before level validation, preventing false rejection of valid inputs. The logger includes the new `effort` field for observability.
- Test coverage -- new test cases verify adaptive body output (`effort` present, `budget_tokens` absent), disabled/auto modes on adaptive models, `xhigh`-to-`max` aliasing, and cross-provider conversion involving adaptive models. A new `claude-adaptive-model` test fixture and an `absentFields` assertion helper support negative-field checks.

Impact
- The applier now originates the `adaptive` + `effort` shape for Opus 4.6, closing the gap left by the translator-only support.
- Model-suffix levels (e.g. `(high)`, `(low)`) are passed through semantically as effort levels instead of being lossy-converted to arbitrary token budgets.
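For illustration, the adaptive/legacy split in the applier might look like the following minimal sketch. The type and function names are assumptions for this example (the real applier operates on raw JSON bodies via sjson); it only shows the branching behavior: adaptive-capable models get `thinking.type: "adaptive"` plus `output_config.effort` and never a `budget_tokens` field, while legacy models keep the `enabled` + `budget_tokens` shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ThinkingSupport mirrors the registry capability described in the PR;
// the field names here follow the PR description but are illustrative.
type ThinkingSupport struct {
	AdaptiveAllowed bool
	Levels          []string
}

// applyThinking sketches the applyAdaptive/applyLegacy split as one function.
func applyThinking(body map[string]any, sup ThinkingSupport, level string, budget int) map[string]any {
	if sup.AdaptiveAllowed {
		// Adaptive path: effort level, no token budget.
		body["thinking"] = map[string]any{"type": "adaptive"}
		body["output_config"] = map[string]any{"effort": level}
		return body
	}
	// Legacy path: unchanged pre-Opus-4.6 behavior.
	body["thinking"] = map[string]any{"type": "enabled", "budget_tokens": budget}
	return body
}

func main() {
	opus46 := ThinkingSupport{AdaptiveAllowed: true, Levels: []string{"low", "medium", "high", "max"}}
	out, _ := json.Marshal(applyThinking(map[string]any{}, opus46, "high", 16384))
	fmt.Println(string(out))
	// {"output_config":{"effort":"high"},"thinking":{"type":"adaptive"}}

	sonnet45 := ThinkingSupport{AdaptiveAllowed: false}
	out, _ = json.Marshal(applyThinking(map[string]any{}, sonnet45, "", 16384))
	fmt.Println(string(out))
	// {"thinking":{"budget_tokens":16384,"type":"enabled"}}
}
```

Note that on the adaptive path the token budget argument is ignored entirely, which is what guarantees `budget_tokens` can never leak into an Opus 4.6 request body.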
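The `xhigh`-to-`max` aliasing described under "Backward-compatible level aliasing" can be sketched as a small normalization step; the function name and signature here are assumptions, but the rule matches the PR: the alias is only rewritten for adaptive models whose level set does not define `xhigh` natively.

```go
package main

import "fmt"

// normalizeLevel maps the legacy "xhigh" alias to "max" for adaptive models
// that don't list "xhigh" in their level set; everything else passes through.
func normalizeLevel(level string, adaptiveAllowed bool, levels []string) string {
	if !adaptiveAllowed || level != "xhigh" {
		return level
	}
	for _, l := range levels {
		if l == "xhigh" {
			return level // model defines xhigh natively; keep it
		}
	}
	return "max"
}

func main() {
	opus46 := []string{"low", "medium", "high", "max"}
	fmt.Println(normalizeLevel("xhigh", true, opus46)) // max
	fmt.Println(normalizeLevel("high", true, opus46))  // high
	fmt.Println(normalizeLevel("xhigh", false, nil))   // xhigh (legacy model, untouched)
}
```

Running this normalization before level validation, as the PR does in `ValidateConfig`, is what prevents a stored `xhigh` config from being rejected against Opus 4.6's `low`/`medium`/`high`/`max` level set.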
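Finally, a negative-field assertion helper like the `absentFields` one mentioned in the test-coverage bullet might look like this stdlib-only sketch (the real helper's signature may differ): it walks dotted JSON paths and returns those that are unexpectedly present, so a test can require an empty result.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// absentFields returns the dotted paths that ARE present in the JSON body;
// a passing negative-field check is an empty result.
func absentFields(raw []byte, paths ...string) []string {
	var doc any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return paths // unparseable body: nothing can be verified absent
	}
	var present []string
	for _, p := range paths {
		cur := doc
		found := true
		for _, key := range strings.Split(p, ".") {
			m, ok := cur.(map[string]any)
			if !ok {
				found = false
				break
			}
			if cur, ok = m[key]; !ok {
				found = false
				break
			}
		}
		if found {
			present = append(present, p)
		}
	}
	return present
}

func main() {
	body := []byte(`{"thinking":{"type":"adaptive"},"output_config":{"effort":"high"}}`)
	fmt.Println(absentFields(body, "thinking.budget_tokens")) // []
	fmt.Println(absentFields(body, "output_config.effort"))   // [output_config.effort]
}
```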