feat(responses): route /v1/responses through non-OpenAI adaptors via chat completions conversion #2970
Conversation
Walkthrough

Adds OpenAI Responses API support and streaming conversion between Claude and OpenAI-compatible Responses: updates DTOs, introduces a chat->responses streaming state machine, adds bidirectional conversion functions, and integrates the conversion into the Claude relay adaptor and streaming handlers.
Sequence Diagram

```mermaid
sequenceDiagram
    actor Client
    participant RelayAdaptor as Relay Adaptor
    participant ConversionSvc as Conversion Service
    participant Claude as Claude Channel
    participant RelayStream as Relay Stream Handler
    participant StateMgr as Responses Stream State
    Client->>RelayAdaptor: OpenAI Responses request
    RelayAdaptor->>ConversionSvc: ResponsesRequestToChatCompletionsRequest
    ConversionSvc-->>RelayAdaptor: Chat Completions request
    RelayAdaptor->>Claude: Forward chat request
    Claude->>RelayStream: Streamed chat-completions chunks
    RelayStream->>StateMgr: HandleChatChunk(chunk)
    StateMgr-->>RelayStream: []ResponsesStreamResponse events
    RelayStream-->>Client: Stream response chunks
    Note over Claude,RelayStream: Repeat for streaming chunks
    Claude->>RelayStream: Final chat response
    RelayStream->>ConversionSvc: ChatCompletionsResponseToResponsesResponse
    ConversionSvc-->>RelayStream: Final Responses payload
    RelayStream-->>Client: Final response.completed event
```
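Stripped of transport and streaming detail, the round trip in the diagram is two pure conversions wrapped around a chat-completions call. A minimal sketch of that shape (all type and function names below are hypothetical placeholders, not the actual new-api dto structs):

```go
package main

import "fmt"

// Placeholder types standing in for the real dto structs.
type responsesRequest struct{ Input string }
type chatRequest struct{ Prompt string }
type chatResponse struct{ Content string }
type responsesResponse struct{ OutputText string }

// responsesToChat mirrors the role of ResponsesRequestToChatCompletionsRequest.
func responsesToChat(r responsesRequest) chatRequest {
	return chatRequest{Prompt: r.Input}
}

// chatToResponses mirrors the role of ChatCompletionsResponseToResponsesResponse.
func chatToResponses(c chatResponse) responsesResponse {
	return responsesResponse{OutputText: c.Content}
}

// relay is the convert -> forward -> convert-back pipeline from the diagram:
// the upstream callback stands in for the existing adaptor/channel call.
func relay(req responsesRequest, upstream func(chatRequest) chatResponse) responsesResponse {
	return chatToResponses(upstream(responsesToChat(req)))
}

func main() {
	resp := relay(responsesRequest{Input: "hi"}, func(c chatRequest) chatResponse {
		return chatResponse{Content: "echo: " + c.Prompt}
	})
	fmt.Println(resp.OutputText)
}
```

The design point of the PR is exactly this factoring: because the two outer conversions are adaptor-agnostic, any channel that already speaks chat completions gets `/v1/responses` support for free.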
Actionable comments posted: 2
🧹 Nitpick comments (1)
service/openaicompat/chat_stream_to_responses_stream.go (1)
Lines 83-96: Reasoning summary events use hardcoded indices without a corresponding output item.

The `output_index: 0` and `summary_index: 0` are hardcoded, and no `response.output_item.added` event is emitted for a reasoning item. If a text message is already at output index 0, clients may find the overlap confusing. This is acceptable as a first iteration for the compatibility layer, but consider tracking reasoning as a proper output item in a follow-up if the OpenAI Responses API spec requires it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@service/openaicompat/chat_stream_to_responses_stream.go` around lines 83 - 96, The reasoning summary currently uses hardcoded OutputIndex/SummaryIndex (0) in the dto.ResponsesStreamResponse built from reasoningContent; instead emit a corresponding response.output_item.added event for the reasoning item before appending the "response.reasoning_summary_text.delta" event, compute a proper outputIndex (e.g. nextOutputIndex based on current outputs or a counter) and use that index for both OutputIndex and SummaryIndex when constructing the dto.ResponsesStreamResponse, and ensure the ItemID (currently "rs_"+strings.TrimPrefix(s.ResponseID,"resp_")) and s.ResponseID are consistent with the added output item so clients won't see an index collision.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@relay/channel/claude/relay-claude.go`:
- Line 856: Replace the direct use of encoding/json by calling the project's
JSON wrapper: change the call that marshals responsesResp into responseData from
json.Marshal(responsesResp) to common.Marshal(responsesResp), and update imports
to remove encoding/json and import the common package (the wrapper defined in
common/json.go); keep existing error handling around the returned err and ensure
the variable names responseData and responsesResp remain unchanged.
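The swap the prompt asks for is mechanical once such a wrapper exists. A hypothetical sketch of a `Marshal` wrapper of the kind `common/json.go` provides (the real one in new-api may differ, e.g. by delegating to a faster JSON library):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Marshal is a stand-in for the project's common.Marshal: a single choke
// point so the JSON implementation can be swapped project-wide without
// touching every call site.
func Marshal(v any) ([]byte, error) {
	return json.Marshal(v)
}

func main() {
	responsesResp := map[string]string{"id": "resp_123"}
	responseData, err := Marshal(responsesResp)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(responseData))
}
```

With the wrapper in place, the fix is just replacing `json.Marshal(responsesResp)` with the wrapper call and dropping the `encoding/json` import from `relay-claude.go`.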
In `@service/openaicompat/responses_to_chat.go`:
- Around line 367-381: In the switch handling media parts in
responses_to_chat.go, change the incorrect key lookups so media isn't dropped:
in the "input_file" case populate dto.MediaContent.File/FileData/FileID from
partMap["file_id"], partMap["file_url"] or partMap["file_data"] (pick available
field) and set Type dto.ContentTypeFile; for the non-standard "input_video" case
either remove it or map it defensively by checking for partMap["file_url"] or
partMap["file_id"] and only create dto.MediaContent with Type
dto.ContentTypeVideoUrl when a valid URL/ID is present; ensure you reference the
same variables used there (partMap, chatParts, dto.MediaContent,
dto.ContentTypeFile, dto.ContentTypeVideoUrl) and skip/return no media if none
of the expected keys exist.
---
Nitpick comments:
In `@service/openaicompat/chat_stream_to_responses_stream.go`:
- Around line 83-96: The reasoning summary currently uses hardcoded
OutputIndex/SummaryIndex (0) in the dto.ResponsesStreamResponse built from
reasoningContent; instead emit a corresponding response.output_item.added event
for the reasoning item before appending the
"response.reasoning_summary_text.delta" event, compute a proper outputIndex
(e.g. nextOutputIndex based on current outputs or a counter) and use that index
for both OutputIndex and SummaryIndex when constructing the
dto.ResponsesStreamResponse, and ensure the ItemID (currently
"rs_"+strings.TrimPrefix(s.ResponseID,"resp_")) and s.ResponseID are consistent
with the added output item so clients won't see an index collision.
```go
case "input_audio":
	chatParts = append(chatParts, dto.MediaContent{
		Type:       dto.ContentTypeInputAudio,
		InputAudio: partMap["input_audio"],
	})
case "input_file":
	chatParts = append(chatParts, dto.MediaContent{
		Type: dto.ContentTypeFile,
		File: partMap["file"],
	})
case "input_video":
	chatParts = append(chatParts, dto.MediaContent{
		Type:     dto.ContentTypeVideoUrl,
		VideoUrl: partMap["video_url"],
	})
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n service/openaicompat/responses_to_chat.go | sed -n '360,390p'
```

Repository: QuantumNous/new-api
Length of output: 1052

🏁 Script executed:

```shell
cat -n service/openaicompat/responses_to_chat.go | sed -n '330,400p'
```

Repository: QuantumNous/new-api
Length of output: 2532

🏁 Script executed:

```shell
head -50 service/openaicompat/responses_to_chat.go
```

Repository: QuantumNous/new-api
Length of output: 1574
🌐 Web query:
OpenAI Responses API input_audio input_file content part schema specification
💡 Result:
input_audio content part (Chat Completions API)

In a `messages[].content[]` array, an audio input is:

```json
{
  "type": "input_audio",
  "input_audio": {
    "data": "<base64-encoded-audio-bytes>",
    "format": "wav"
  }
}
```

- `type`: must be `"input_audio"`
- `input_audio.data`: base64-encoded audio
- `input_audio.format`: `"wav"` or `"mp3"` [1]

input_file content part (Responses API)

In `input: [{ "role": "...", "content": [ ... ] }]`, a file input item is:

```json
{
  "type": "input_file",
  "file_id": "file_..."
}
```

- `type`: must be `"input_file"`
- Provide one of: `file_id` (previously uploaded file), or `file_url`, or `filename` + `file_data` (base64) [2][3]

Note: In Chat Completions, the analogous file content part is `{"type":"file","file":{...}}` (not `input_file`). [1]

Sources: [1] Chat Completions create schema (content parts, incl. input_audio, file) • [2] Responses create schema (ResponseInputFile) • [3] File inputs guide (examples using type: "input_file")
input_file and input_video content mappings read non-existent keys, silently dropping media content.
The Responses API `input_file` type uses `file_id`, `file_url`, or `file_data` (not `"file"`), and `input_video` is not a standard Responses API content type. Line 375 reads `partMap["file"]`, which returns nil, and line 380 reads `partMap["video_url"]` for a non-standard type; both result in incomplete media content conversion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@service/openaicompat/responses_to_chat.go` around lines 367 - 381, In the
switch handling media parts in responses_to_chat.go, change the incorrect key
lookups so media isn't dropped: in the "input_file" case populate
dto.MediaContent.File/FileData/FileID from partMap["file_id"],
partMap["file_url"] or partMap["file_data"] (pick available field) and set Type
dto.ContentTypeFile; for the non-standard "input_video" case either remove it or
map it defensively by checking for partMap["file_url"] or partMap["file_id"] and
only create dto.MediaContent with Type dto.ContentTypeVideoUrl when a valid
URL/ID is present; ensure you reference the same variables used there (partMap,
chatParts, dto.MediaContent, dto.ContentTypeFile, dto.ContentTypeVideoUrl) and
skip/return no media if none of the expected keys exist.
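The key-lookup fix the prompt describes can be sketched outside the project. The DTO shape below is a simplified stand-in for `dto.MediaContent` (the real struct's fields may differ); the point is picking whichever of `file_id`, `file_url`, or `file_data` is actually present and skipping the part when none is:

```go
package main

import "fmt"

// filePart is a simplified stand-in for dto.MediaContent's file fields.
type filePart struct {
	Type     string
	FileID   string
	FileURL  string
	FileData string
}

// convertInputFile reads the keys the Responses API actually uses
// (file_id, file_url, file_data) instead of the non-existent "file" key,
// and reports false when no usable key is present so the caller can skip
// the part rather than emit an empty stub.
func convertInputFile(partMap map[string]any) (filePart, bool) {
	p := filePart{Type: "file"}
	if v, ok := partMap["file_id"].(string); ok && v != "" {
		p.FileID = v
		return p, true
	}
	if v, ok := partMap["file_url"].(string); ok && v != "" {
		p.FileURL = v
		return p, true
	}
	if v, ok := partMap["file_data"].(string); ok && v != "" {
		p.FileData = v
		return p, true
	}
	return filePart{}, false
}

func main() {
	p, ok := convertInputFile(map[string]any{"file_id": "file_123"})
	fmt.Println(ok, p.FileID)
}
```

The same ok-flag pattern covers the non-standard `input_video` case: only build the part when a valid URL or ID is present, otherwise drop it silently instead of forwarding `{"type":"input_video"}` with no payload.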
…chat completions conversion

Enables channels that don't natively support the OpenAI Responses API (e.g. Claude/Anthropic) to serve /v1/responses requests by converting them to /v1/chat/completions format, routing through the existing adaptor pipeline, and converting the response back.

Changes:
- Implement ConvertOpenAIResponsesRequest in Claude adaptor using Responses→Chat→Claude conversion chain (reuses existing code)
- Add ChatToResponsesStreamState for converting chat completion stream chunks into Responses API SSE events (handles text deltas, tool calls, reasoning content, usage)
- Add RelayFormatOpenAIResponses support to Claude stream/non-stream handlers in relay-claude.go
- Add ResponsesRequestToChatCompletionsRequest converter with:
  - GetJsonType() for robust input type detection
  - developer role → system mapping
  - PromptCacheKey/PromptCacheRetention passthrough
  - Non-function tool passthrough via Custom field
  - function_call/function_call_output → tool message merging
- Add ChatCompletionsResponseToResponsesResponse for non-streaming
- Fix: tool calls no longer dropped when text content exists in ResponsesResponseToChatCompletionsResponse
- Update ResponsesStreamResponse DTO with ResponseID, Text, Arguments, ContentIndex, Part fields

Combines and improves on approaches from:
- QuantumNous#2817 (adaptor-level conversion, converter improvements)
- QuantumNous#2892 (tool calls + text coexistence bug fix)

Related: QuantumNous#2941, QuantumNous#2043, QuantumNous#1527
Force-pushed 924c334 to 5c69030 (compare)
Actionable comments posted: 2
🧹 Nitpick comments (1)
service/openaicompat/responses_to_chat.go (1)
Lines 383-394: `default` case silently produces incomplete content parts for unknown types.

For any unrecognised Responses API content part type, the current code appends a `MediaContent{Type: partType}` with no payload (no `Text`, no `ImageUrl`, no `InputAudio`, no `File`). When serialised this becomes `{"type":"<unknown>"}`, which will either confuse downstream models or be silently ignored; either way the original content is lost without any indication.

Consider skipping unknown parts instead of emitting an empty stub:

♻️ Proposed change

```diff
 default:
-	chatParts = append(chatParts, dto.MediaContent{
-		Type: partType,
-	})
+	// Unknown Responses content type — skip rather than emit an empty stub
+	common.SysLog(fmt.Sprintf("convertResponsesContentToChat: skipping unknown part type %q", partType))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@service/openaicompat/responses_to_chat.go` around lines 383 - 394, The default switch branch currently appends an empty dto.MediaContent{Type: partType} which yields {"type":"<unknown>"} and loses content; change the default to skip unknown part types (do not append to chatParts) and optionally log a warning; update the switch in the function that builds chatParts (the code that references chatParts, dto.MediaContent and partType) to only append when a known payload is set and simply continue/ignore on unknown types instead of emitting an empty stub.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@service/openaicompat/chat_stream_to_responses_stream.go`:
- Around line 83-96: The reasoning delta block uses a hardcoded outIndex = 0
which conflicts with the stream's output-index allocator; replace the literal
with a proper allocated index (use the same allocator/field used elsewhere, e.g.
call s.allocOutputIndex() or increment s.NextOutputIndex to produce a unique
index) and set that allocated value as OutputIndex for the reasoning event (keep
SummaryIndex separate). Also add a dedicated reasoning output index field to the
ChatToResponsesStreamState struct (initialise it to -1 in
NewChatToResponsesStreamState) so the reasoning index is tracked consistently
and not duplicated by ensureMessageItemEvents or other allocators.
- Around line 158-222: FinalEvents currently finalizes message and tool-call
events but never emits the paired reasoning done events; add a reasoning
finalization block in FinalEvents (similar to the "Finalize message item" block)
that checks the state fields used while streaming reasoning deltas (e.g.,
s.ReasoningText / s.ReasoningPartAdded / s.ReasoningContentAdded or whatever
names are present in the struct) and appends the corresponding
dto.ResponsesStreamResponse entries for "response.reasoning_summary_text.done"
and/or "response.reasoning_summary_part.done" (or call existing helper methods
like s.reasoningTextDoneEvent/s.reasoningPartDoneEvent if present) before
appending the message/tool completions and the final response so clients receive
matching done events for each reasoning delta.
---
Nitpick comments:
In `@service/openaicompat/responses_to_chat.go`:
- Around line 383-394: The default switch branch currently appends an empty
dto.MediaContent{Type: partType} which yields {"type":"<unknown>"} and loses
content; change the default to skip unknown part types (do not append to
chatParts) and optionally log a warning; update the switch in the function that
builds chatParts (the code that references chatParts, dto.MediaContent and
partType) to only append when a known payload is set and simply continue/ignore
on unknown types instead of emitting an empty stub.
```go
// Reasoning content (for models that emit reasoning_content)
reasoningContent := delta.GetReasoningContent()
if reasoningContent != "" {
	outIndex := 0
	summaryIndex := 0
	events = append(events, dto.ResponsesStreamResponse{
		Type:         "response.reasoning_summary_text.delta",
		ResponseID:   s.ResponseID,
		ItemID:       "rs_" + strings.TrimPrefix(s.ResponseID, "resp_"),
		OutputIndex:  &outIndex,
		SummaryIndex: &summaryIndex,
		Delta:        reasoningContent,
	})
}
```
Hardcoded `outIndex = 0` conflicts with the output-index allocator for reasoning content.

When a response contains both reasoning content and text, `outIndex` for reasoning is always 0 as a local literal, never allocated via `allocOutputIndex`. If `ensureMessageItemEvents` fires later (also allocating `MessageOutputIndex = s.NextOutputIndex`, which starts at 0), both the reasoning events and the message item events share `OutputIndex` 0, producing conflicting SSE events for clients that parse `OutputIndex` strictly.

The fix is to allocate a dedicated index for the reasoning item through the same `NextOutputIndex` counter:
💡 Proposed fix – allocate a reasoning output index

```diff
+	// Reasoning content
 	reasoningContent := delta.GetReasoningContent()
 	if reasoningContent != "" {
-		outIndex := 0
+		if s.ReasoningOutputIndex < 0 {
+			s.ReasoningOutputIndex = s.NextOutputIndex
+			s.NextOutputIndex++
+		}
+		outIndex := s.ReasoningOutputIndex
 		summaryIndex := 0
 		events = append(events, dto.ResponsesStreamResponse{
```

And add to the struct:

```diff
+	ReasoningOutputIndex int // -1 until allocated
```

initialised to -1 in `NewChatToResponsesStreamState`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@service/openaicompat/chat_stream_to_responses_stream.go` around lines 83 -
96, The reasoning delta block uses a hardcoded outIndex = 0 which conflicts with
the stream's output-index allocator; replace the literal with a proper allocated
index (use the same allocator/field used elsewhere, e.g. call
s.allocOutputIndex() or increment s.NextOutputIndex to produce a unique index)
and set that allocated value as OutputIndex for the reasoning event (keep
SummaryIndex separate). Also add a dedicated reasoning output index field to the
ChatToResponsesStreamState struct (initialise it to -1 in
NewChatToResponsesStreamState) so the reasoning index is tracked consistently
and not duplicated by ensureMessageItemEvents or other allocators.
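The allocator the review proposes is small enough to show in isolation. A minimal sketch of the bookkeeping, assuming field names like `NextOutputIndex` and `ReasoningOutputIndex` from the review text (the real `ChatToResponsesStreamState` has many more fields):

```go
package main

import "fmt"

// streamState is a hypothetical slice of ChatToResponsesStreamState,
// showing only the output-index bookkeeping discussed in the review.
type streamState struct {
	NextOutputIndex      int
	ReasoningOutputIndex int // -1 until the reasoning item is allocated
}

func newStreamState() *streamState {
	return &streamState{ReasoningOutputIndex: -1}
}

// allocOutputIndex hands out the next unique output index.
func (s *streamState) allocOutputIndex() int {
	i := s.NextOutputIndex
	s.NextOutputIndex++
	return i
}

// reasoningIndex allocates the reasoning item's index once, then reuses
// it for every subsequent reasoning delta, so it can never collide with
// the message item's index.
func (s *streamState) reasoningIndex() int {
	if s.ReasoningOutputIndex < 0 {
		s.ReasoningOutputIndex = s.allocOutputIndex()
	}
	return s.ReasoningOutputIndex
}

func main() {
	s := newStreamState()
	fmt.Println(s.reasoningIndex())   // reasoning item gets index 0
	fmt.Println(s.allocOutputIndex()) // message item gets a distinct index
	fmt.Println(s.reasoningIndex())   // reasoning index is stable
}
```

Routing every item through one counter is the invariant that matters: whichever item type appears first claims index 0, and no two items can ever share an index.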
```go
// FinalEvents emits the closing events: content done, tool calls done, and
// response.completed.
func (s *ChatToResponsesStreamState) FinalEvents(usage *dto.Usage) []dto.ResponsesStreamResponse {
	events := s.baseEvents()

	// Finalize message item
	if s.MessageItemAdded {
		text := s.OutputText.String()
		if s.MessageContentAdded {
			events = append(events, s.outputTextDoneEvent(text))
			events = append(events, s.contentPartDoneEvent(text))
		}
		events = append(events, s.messageItemDoneEvent(text))
	}

	// Finalize tool calls
	for _, callID := range s.ToolCallOrder {
		outIndex := s.outputIndexPtr(callID)
		args := s.ToolCallArgs[callID]
		if args != "" {
			events = append(events, dto.ResponsesStreamResponse{
				Type:        "response.function_call_arguments.done",
				ResponseID:  s.ResponseID,
				ItemID:      callID,
				OutputIndex: outIndex,
				Arguments:   args,
			})
		}
		events = append(events, dto.ResponsesStreamResponse{
			Type:        "response.output_item.done",
			ResponseID:  s.ResponseID,
			ItemID:      callID,
			OutputIndex: outIndex,
			Item: &dto.ResponsesOutput{
				Type:      "function_call",
				ID:        callID,
				Status:    "completed",
				CallId:    callID,
				Name:      s.ToolCallName[callID],
				Arguments: args,
			},
		})
	}

	// Build final output and usage
	output := s.buildFinalOutput()
	finalUsage := s.buildFinalUsage(usage)

	resp := &dto.OpenAIResponsesResponse{
		ID:        s.ResponseID,
		Object:    "response",
		CreatedAt: int(s.CreatedAt),
		Status:    "completed",
		Model:     s.Model,
		Output:    output,
		Usage:     finalUsage,
	}
	events = append(events, dto.ResponsesStreamResponse{
		Type:       "response.completed",
		ResponseID: s.ResponseID,
		Response:   resp,
	})

	return events
}
```
`FinalEvents` emits no done event for reasoning content.

When reasoning deltas were emitted during streaming, `FinalEvents` finalises text and tool calls but never emits `response.reasoning_summary_text.done` (or `response.reasoning_summary_part.done`). Clients that wait for a paired done event for every delta will be left with an incomplete SSE sequence.
💡 Suggested addition in FinalEvents

```diff
+	// Finalize reasoning content
+	if s.ReasoningOutputIndex >= 0 {
+		summaryIndex := 0
+		events = append(events, dto.ResponsesStreamResponse{
+			Type:         "response.reasoning_summary_text.done",
+			ResponseID:   s.ResponseID,
+			ItemID:       "rs_" + strings.TrimPrefix(s.ResponseID, "resp_"),
+			OutputIndex:  func() *int { i := s.ReasoningOutputIndex; return &i }(),
+			SummaryIndex: &summaryIndex,
+			Text:         s.ReasoningText.String(), // accumulate in HandleChatChunk
+		})
+	}
```
Verify each finding against the current code and only fix it if needed.
In `@service/openaicompat/chat_stream_to_responses_stream.go` around lines 158 -
222, FinalEvents currently finalizes message and tool-call events but never
emits the paired reasoning done events; add a reasoning finalization block in
FinalEvents (similar to the "Finalize message item" block) that checks the state
fields used while streaming reasoning deltas (e.g., s.ReasoningText /
s.ReasoningPartAdded / s.ReasoningContentAdded or whatever names are present in
the struct) and appends the corresponding dto.ResponsesStreamResponse entries
for "response.reasoning_summary_text.done" and/or
"response.reasoning_summary_part.done" (or call existing helper methods like
s.reasoningTextDoneEvent/s.reasoningPartDoneEvent if present) before appending
the message/tool completions and the final response so clients receive matching
done events for each reasoning delta.
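The invariant this finding asks for can be stated simply: every `*.delta` event type emitted during streaming must be matched by a `*.done` before `response.completed`. A sketch of a checker for that invariant (a hypothetical test helper, not part of the PR):

```go
package main

import (
	"fmt"
	"strings"
)

// unmatchedDeltas returns the done event types that are missing for delta
// event types that were emitted, i.e. the gap the review flags for
// reasoning summaries in FinalEvents.
func unmatchedDeltas(eventTypes []string) []string {
	open := map[string]bool{}
	for _, t := range eventTypes {
		switch {
		case strings.HasSuffix(t, ".delta"):
			open[strings.TrimSuffix(t, ".delta")] = true
		case strings.HasSuffix(t, ".done"):
			delete(open, strings.TrimSuffix(t, ".done"))
		}
	}
	var missing []string
	for base := range open {
		missing = append(missing, base+".done")
	}
	return missing
}

func main() {
	events := []string{
		"response.output_text.delta",
		"response.reasoning_summary_text.delta",
		"response.output_text.done",
		"response.completed",
	}
	// Reports the missing response.reasoning_summary_text.done
	fmt.Println(unmatchedDeltas(events))
}
```

A helper like this makes the fix verifiable in a unit test: feed the full event stream produced by `HandleChatChunk` plus `FinalEvents` through it and assert the result is empty.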
Summary
Routes `/v1/responses` requests through non-OpenAI adaptors (starting with Claude) by converting Responses → Chat Completions → Native format, then converting responses back. Reuses the existing adaptor pipeline instead of duplicating conversion logic per adaptor.

Why not use existing PRs directly?

vs #2892 (494 lines, Claude-only)

- `RequestOpenAIResponses2ClaudeMessage` reimplements all the logic already in `RequestOpenAI2ClaudeMessage` (max tokens, thinking, tools, instructions, tool choice)
- `DoResponsesRequest` does naive SSE passthrough (`scanner.Scan` + `fmt.Fprintf`): it pipes raw Claude `content_block_delta` events without converting to Responses API format (`response.output_text.delta`, `response.created`, etc.). Clients expecting Responses SSE won't work
- `ResponseClaude2OpenAIResponses` duplicates non-streaming conversion already handled by `ClaudeHandler` + `ResponseClaude2OpenAI`

vs #2817 (1039 lines)

- `ChatToResponsesStreamState` and converter
- `SummaryIndex` field
- `ConvertOpenAIResponsesRequest` pattern, `GetJsonType()` for input detection, `developer` → `system` role mapping, `PromptCacheKey`/`PromptCacheRetention` passthrough, non-function tool passthrough via `Custom` field

Our approach (1006 lines, adaptor-agnostic)

- `ResponsesRequestToChatCompletionsRequest` → `ConvertOpenAIRequest` (reuses all existing Claude conversion logic)
- `ChatToResponsesStreamState`: reusable state machine any adaptor can use; properly emits `response.created`, `response.output_text.delta`, `response.function_call_arguments.delta`, `response.completed`, etc.
- `ResponsesRequestToChatCompletionsRequest`: generic converter with function_call/function_call_output merging, content type handling, response format conversion; works for any adaptor, not just Claude

Related issues

- `/v1/responses` → `/v1/chat/completions` conversion for unsupported channels (this implements it)
- `/v1/responses`

Files changed

- `relay/channel/claude/adaptor.go`: `ConvertOpenAIResponsesRequest` (+5 lines)
- `relay/channel/claude/relay-claude.go`: `RelayFormatOpenAIResponses` to stream/non-stream handlers
- `service/openaicompat/responses_to_chat.go`
- `service/openaicompat/chat_stream_to_responses_stream.go`
- `service/openai_chat_responses_compat.go`
- `dto/openai_response.go`: `ResponsesStreamResponse`

Test plan

- `/v1/responses` through Claude channel: verify SSE events
- `/v1/responses` through Claude channel