fix(provider): tolerate duplicate reasoning_content key from NVIDIA compat endpoint (TAURI-RUST-85R)#4227
Conversation
…ompat endpoint (TAURI-RUST-85R) NVIDIA's integrate.api.nvidia.com OpenAI-compatible endpoint returns the `reasoning_content` key twice in a single `message` object for some thinking models (e.g. stepfun-ai/step-3.7-flash). The derived `Shadow` deserializer used by ResponseMessage and StreamDelta strict-rejected the repeated key with `duplicate field reasoning_content`, dropping the entire completion. Replace both Shadow structs with hand-folded `visit_map` visitors that consume map entries manually, so a repeated key no longer errors (last value wins, standard JSON object semantics). The tinyhumansai#3547 behaviour is preserved: the canonical `reasoning_content` still wins over the `reasoning` alias. Applies to both the buffered and SSE streaming paths. Adds regression tests covering a doubled `reasoning_content` on the buffered and streaming paths, plus the doubled-canonical-beats-alias case. Fixes tinyhumansai#4204
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR updates OpenAI-compatible deserialization for ChangesDuplicate reasoning_content handling
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/openhuman/inference/provider/compatible_types.rs (1)
547-606: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoffModule exceeds the ~500-line size guideline.
compatible_types.rsnow runs past 600 lines, and the two near-identical hand-folded visitors (ResponseMessageVisitor/StreamDeltaVisitor) add to it while duplicating the last-wins +reasoning_content-over-reasoningprecedence logic that must stay in sync. Consider splitting this file along the canonical module shape and/or factoring the shared map-folding precedence logic into a small helper ormacro_rules!to keep both deserializers from drifting.As per coding guidelines: "Rust modules must be ≤ ~500 lines in size".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/inference/provider/compatible_types.rs` around lines 547 - 606, The issue is that compatible_types.rs has grown beyond the module size guideline and duplicates the same hand-folded deserialization precedence logic in ResponseMessageVisitor and StreamDeltaVisitor. Split the module into smaller canonical pieces and factor the shared “last value wins” plus reasoning_content-over-reasoning merge behavior into a common helper or small macro, then use that shared logic from both deserializers so they stay consistent and the file size drops below the limit.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/openhuman/inference/provider/compatible_types.rs`:
- Around line 547-606: The issue is that compatible_types.rs has grown beyond
the module size guideline and duplicates the same hand-folded deserialization
precedence logic in ResponseMessageVisitor and StreamDeltaVisitor. Split the
module into smaller canonical pieces and factor the shared “last value wins”
plus reasoning_content-over-reasoning merge behavior into a common helper or
small macro, then use that shared logic from both deserializers so they stay
consistent and the file size drops below the limit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7978e785-6979-4343-83ea-d38fede6dd56
📒 Files selected for processing (2)
src/openhuman/inference/provider/compatible_tests.rssrc/openhuman/inference/provider/compatible_types.rs
…hanges Ran `cargo fmt --all` over the two files touched by this PR: collapse the short match arms in the ResponseMessage visitor to single lines and rewrap the test's serde_json::from_str(...).expect(...). Pure formatting, no logic change. Refs tinyhumansai#4204
|
Closing as a duplicate of #4207, which fixes #4204 (TAURI-RUST-85R) with the same map-fold Visitor approach and predates this PR — and additionally handles the last-non-null-wins edge case. Thanks @M3gA-Mind! |
Summary
NVIDIA's
integrate.api.nvidia.comOpenAI-compatible endpoint returns thereasoning_contentkey twice in a singlemessageobject for some thinkingmodels (e.g.
stepfun-ai/step-3.7-flash).ResponseMessageandStreamDeltadeserialize through an inner
Shadowstruct using serde's derived deserializer,which strict-rejects a repeated key with
duplicate field reasoning_contentanddrops the entire completion (Sentry TAURI-RUST-85R — 2,037 events / 5 users, still
firing on 0.57.18).
#[serde(default)]does not relax duplicate-key rejection, and#3547 only fixed the distinct-name collision (
reasoning+reasoning_content),not two identical keys.
Fix
Replace the
Shadowstructs in bothResponseMessage::deserializeandStreamDelta::deserializewith hand-writtenvisit_mapvisitors that consume mapentries manually. Repeated keys no longer error (last value wins — standard JSON
object semantics), and the canonical
reasoning_contentstill wins over thereasoningalias (#3547 preserved). Applies to both the buffered and SSE paths.Tests
duplicate_reasoning_content_in_response_message_does_not_errorduplicate_reasoning_content_in_stream_delta_does_not_errorduplicate_reasoning_content_still_beats_reasoning_aliasExisting #3547 tests continue to pass.
Fixes #4204
Summary by CodeRabbit
reasoning_content, using the last occurrence.reasoningis used whenreasoning_contentis absent.reasoning_contentin both non-streaming and streaming responses, including precedence whenreasoningand duplicatedreasoning_contentare both present.