
feat(responses): route /v1/responses through non-OpenAI adaptors via chat completions conversion #2970

Open
0-don wants to merge 1 commit into QuantumNous:main from 0-don:feat/responses-via-chat-completions

Conversation


0-don commented Feb 18, 2026

Summary

Routes /v1/responses requests through non-OpenAI adaptors (starting with Claude) by converting Responses → Chat Completions → Native format, then converting responses back. Reuses the existing adaptor pipeline instead of duplicating conversion logic per-adaptor.

Why not use existing PRs directly?

vs #2892 (494 lines, Claude-only)

  • Adds 250-line RequestOpenAIResponses2ClaudeMessage that reimplements all the logic already in RequestOpenAI2ClaudeMessage (max tokens, thinking, tools, instructions, tool choice)
  • DoResponsesRequest does naive SSE passthrough (scanner.Scan + fmt.Fprintf) — it pipes raw Claude content_block_delta events without converting them to Responses API format (response.output_text.delta, response.created, etc.), so clients expecting Responses SSE events will break
  • ResponseClaude2OpenAIResponses duplicates non-streaming conversion already handled by ClaudeHandler + ResponseClaude2OpenAI
  • Only works for Claude — no reusable converter for other adaptors
  • Adopted from #2892 (feat(claude): Support calling Claude channels via /v1/responses): the 2-line bug fix for tool calls being dropped when text content exists

vs #2817 (1039 lines)

  • Similar adaptor-level approach (good), but includes its own ChatToResponsesStreamState and converter
  • Our stream state machine handles additional cases: reasoning content, usage chunks, SummaryIndex field
  • Adopted from #2817 (feat: enhance OpenAI responses handling and compatibility): adaptor-level ConvertOpenAIResponsesRequest pattern, GetJsonType() for input detection, developer → system role mapping, PromptCacheKey/PromptCacheRetention passthrough, non-function tool passthrough via Custom field

Our approach (1006 lines, adaptor-agnostic)

  • Claude adaptor: 3 lines — chains ResponsesRequestToChatCompletionsRequest → ConvertOpenAIRequest (reuses all existing Claude conversion logic); see the sketch after this list
  • ChatToResponsesStreamState: reusable state machine any adaptor can use — properly emits response.created, response.output_text.delta, response.function_call_arguments.delta, response.completed etc.
  • ResponsesRequestToChatCompletionsRequest: generic converter with function_call/function_call_output merging, content type handling, response format conversion — works for any adaptor, not just Claude
  • Any future adaptor (Gemini, Deepseek, etc.) gets Responses API support by adding the same 3-line pattern
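
For reference, a minimal sketch of that 3-line pattern. The names ConvertOpenAIResponsesRequest, ResponsesRequestToChatCompletionsRequest, and ConvertOpenAIRequest come from this PR; the parameter and return types are assumptions about the surrounding adaptor interface, not a verbatim excerpt:

// Hedged sketch of the adaptor-side chain: Responses -> Chat Completions,
// then hand off to the existing Chat Completions -> Claude conversion.
// Signature details (gin.Context, RelayInfo, dto types) are assumptions.
func (a *Adaptor) ConvertOpenAIResponsesRequest(c *gin.Context, info *relaycommon.RelayInfo, request *dto.OpenAIResponsesRequest) (any, error) {
    chatReq, err := service.ResponsesRequestToChatCompletionsRequest(request)
    if err != nil {
        return nil, err
    }
    return a.ConvertOpenAIRequest(c, info, chatReq)
}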

Related issues

Files changed

  • relay/channel/claude/adaptor.go: Implement ConvertOpenAIResponsesRequest (+5 lines)
  • relay/channel/claude/relay-claude.go: Add RelayFormatOpenAIResponses to stream/non-stream handlers
  • service/openaicompat/responses_to_chat.go: Add converters, bug fix from #2892
  • service/openaicompat/chat_stream_to_responses_stream.go: New reusable stream state machine
  • service/openai_chat_responses_compat.go: Service wrapper functions
  • dto/openai_response.go: Add fields to ResponsesStreamResponse

Test plan

  • Streaming /v1/responses through Claude channel — verify SSE events
  • Non-streaming /v1/responses through Claude channel
  • Tool calls with text content (both appear in output)
  • Multi-turn with function_call + function_call_output in input (see the example payload below)
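
For the multi-turn item above, a request body along these lines should exercise the function_call/function_call_output merging path. The model name is a placeholder; the item shapes follow the public Responses API schema:

{
  "model": "claude-example",
  "stream": true,
  "input": [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"type": "function_call", "call_id": "call_abc", "name": "get_weather",
     "arguments": "{\"city\": \"Paris\"}"},
    {"type": "function_call_output", "call_id": "call_abc",
     "output": "{\"temp_c\": 18}"}
  ]
}

The converter is expected to merge the function_call item into an assistant message with tool_calls and turn the function_call_output into a tool-role message before the request reaches the Claude adaptor.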

Summary by CodeRabbit

  • New Features
    • Added OpenAI Responses API support with full streaming integration
    • Enabled automatic conversion between Chat Completions and Responses API formats
    • Extended Claude relay compatibility to support the Responses API format


coderabbitai bot commented Feb 18, 2026

Walkthrough

Adds OpenAI Responses API support, including streaming, for the Claude channel: updates DTOs, introduces a chat->responses streaming state machine, adds bidirectional conversion functions between Responses and Chat Completions formats, and integrates the conversion into the Claude relay adaptor and streaming handlers.

Changes

  • DTO Updates (dto/openai_response.go): Modified ResponsesStreamResponse: added ResponseID, Text, Arguments; changed Part to *ResponsesOutputContent; added top-level GetOpenAIError(errorField any) *types.OpenAIError.
  • Claude Relay (relay/channel/claude/adaptor.go, relay/channel/claude/relay-claude.go): Added conversion of OpenAI Responses requests to chat requests in the adaptor; integrated the Responses streaming path in relay-claude (new handling for RelayFormatOpenAIResponses, new ResponsesStreamState field and streaming handlers).
  • Service wrappers (service/openai_chat_responses_compat.go): Exported wrapper functions ResponsesRequestToChatCompletionsRequest and ChatCompletionsResponseToResponsesResponse delegating to the openaicompat implementations.
  • Chat->Responses streaming (service/openaicompat/chat_stream_to_responses_stream.go): New ChatToResponsesStreamState type and methods to convert chat-completions stream chunks into OpenAI Responses SSE events; handles lifecycle, content parts, tool calls, indexing, and final assembly.
  • Bidirectional conversion logic (service/openaicompat/responses_to_chat.go): Implemented ResponsesRequestToChatCompletionsRequest, ChatCompletionsResponseToResponsesResponse, content/response-format converters, and helpers to translate between Responses API and Chat Completions formats (supports mixed content and tool-call merging).
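
To make the streaming cohort concrete, here is a rough usage sketch of how a relay stream handler can drive the state machine. HandleChatChunk and FinalEvents appear in this PR (see the sequence diagram below); the constructor arguments, the chunk type, and the writeSSE helper are assumptions for illustration only:

// Hedged usage sketch: feed each upstream chat-completions chunk through the
// state machine and forward the resulting Responses SSE events to the client.
// The chunk type and writeSSE helper are hypothetical, not this PR's exact API.
func streamResponses(w http.ResponseWriter, chatChunks <-chan *dto.ChatCompletionsStreamResponse, responseID, model string) {
    state := openaicompat.NewChatToResponsesStreamState(responseID, model) // constructor args assumed
    for chunk := range chatChunks {
        for _, ev := range state.HandleChatChunk(chunk) {
            writeSSE(w, ev) // hypothetical helper: marshal ev and write a "data: ..." SSE frame
        }
    }
    var finalUsage *dto.Usage // usage parsed from the upstream usage chunk, if any
    for _, ev := range state.FinalEvents(finalUsage) {
        writeSSE(w, ev) // the last event emitted is response.completed
    }
}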

Sequence Diagram

sequenceDiagram
    actor Client
    participant RelayAdaptor as Relay Adaptor
    participant ConversionSvc as Conversion Service
    participant Claude as Claude Channel
    participant RelayStream as Relay Stream Handler
    participant StateMgr as Responses Stream State

    Client->>RelayAdaptor: OpenAI Responses request
    RelayAdaptor->>ConversionSvc: ResponsesRequestToChatCompletionsRequest
    ConversionSvc-->>RelayAdaptor: Chat Completions request
    RelayAdaptor->>Claude: Forward chat request

    Claude->>RelayStream: Streamed chat-completions chunks
    RelayStream->>StateMgr: HandleChatChunk(chunk)
    StateMgr-->>RelayStream: []ResponsesStreamResponse events
    RelayStream-->>Client: Stream response chunks

    Note over Claude,RelayStream: Repeat for streaming chunks

    Claude->>RelayStream: Final chat response
    RelayStream->>ConversionSvc: ChatCompletionsResponseToResponsesResponse
    ConversionSvc-->>RelayStream: Final Responses payload
    RelayStream-->>Client: Final response.completed event

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • creamlike1024
  • Calcium-Ion

Poem

🐇 I hop through streams both near and far,
Turning chat to responses, bit by star,
IDs aligned and chunks set free,
Claude and OpenAI dance with me,
A rabbit's cheer for code that sings 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 25.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): the title accurately and specifically describes the main objective: routing /v1/responses requests through non-OpenAI adaptors by converting to/from chat completions format.


coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
service/openaicompat/chat_stream_to_responses_stream.go (1)

83-96: Reasoning summary events use hardcoded indices without a corresponding output item.

The output_index: 0 and summary_index: 0 are hardcoded, and no response.output_item.added event is emitted for a reasoning item. If a text message is already at output index 0, clients may find the overlap confusing. This is acceptable as a first iteration for the compatibility layer, but consider tracking reasoning as a proper output item in a follow-up if the OpenAI Responses API spec requires it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/openaicompat/chat_stream_to_responses_stream.go` around lines 83 -
96, The reasoning summary currently uses hardcoded OutputIndex/SummaryIndex (0)
in the dto.ResponsesStreamResponse built from reasoningContent; instead emit a
corresponding response.output_item.added event for the reasoning item before
appending the "response.reasoning_summary_text.delta" event, compute a proper
outputIndex (e.g. nextOutputIndex based on current outputs or a counter) and use
that index for both OutputIndex and SummaryIndex when constructing the
dto.ResponsesStreamResponse, and ensure the ItemID (currently
"rs_"+strings.TrimPrefix(s.ResponseID,"resp_")) and s.ResponseID are consistent
with the added output item so clients won't see an index collision.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@relay/channel/claude/relay-claude.go`:
- Line 856: Replace the direct use of encoding/json by calling the project's
JSON wrapper: change the call that marshals responsesResp into responseData from
json.Marshal(responsesResp) to common.Marshal(responsesResp), and update imports
to remove encoding/json and import the common package (the wrapper defined in
common/json.go); keep existing error handling around the returned err and ensure
the variable names responseData and responsesResp remain unchanged.


Comment on lines 367 to 381
case "input_audio":
chatParts = append(chatParts, dto.MediaContent{
Type: dto.ContentTypeInputAudio,
InputAudio: partMap["input_audio"],
})
case "input_file":
chatParts = append(chatParts, dto.MediaContent{
Type: dto.ContentTypeFile,
File: partMap["file"],
})
case "input_video":
chatParts = append(chatParts, dto.MediaContent{
Type: dto.ContentTypeVideoUrl,
VideoUrl: partMap["video_url"],
})

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n service/openaicompat/responses_to_chat.go | sed -n '360,390p'

Repository: QuantumNous/new-api

Length of output: 1052


🏁 Script executed:

cat -n service/openaicompat/responses_to_chat.go | sed -n '330,400p'

Repository: QuantumNous/new-api

Length of output: 2532


🏁 Script executed:

head -50 service/openaicompat/responses_to_chat.go

Repository: QuantumNous/new-api

Length of output: 1574


🌐 Web query:

OpenAI Responses API input_audio input_file content part schema specification

💡 Result:

input_audio content part (Chat Completions API)

In a messages[].content[] array, an audio input is:

{
  "type": "input_audio",
  "input_audio": {
    "data": "<base64-encoded-audio-bytes>",
    "format": "wav"  // or "mp3"
  }
}
  • type: must be "input_audio"
  • input_audio.data: base64-encoded audio
  • input_audio.format: "wav" or "mp3"[1]

input_file content part (Responses API)

In input: [{ "role": "...", "content": [ ... ] }], a file input item is:

{
  "type": "input_file",
  "file_id": "file_..."          // OR
  // "file_url": "https://..."   // OR
  // "filename": "doc.pdf",
  // "file_data": "<base64-encoded-bytes>"
}
  • type: must be "input_file"
  • Provide one of:
    • file_id (previously uploaded file), or
    • file_url, or
    • filename + file_data (base64)[2][3]

Note: In Chat Completions, the analogous file content part is {"type":"file","file":{...}} (not input_file).[1]

Sources: [1] Chat Completions create schema (content parts, incl. input_audio, file) • [2] Responses create schema (ResponseInputFile) • [3] File inputs guide (examples using type: "input_file")


input_file and input_video content mappings read non-existent keys, silently dropping media content.

The Responses API input_file type uses file_id, file_url, or file_data (not "file"), and input_video is not a standard Responses API content type. Line 375 reads partMap["file"] which returns nil, and line 380 reads partMap["video_url"] for a non-standard type, both resulting in incomplete media content conversion.
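
A defensive rewrite of the "input_file" branch consistent with the schema above could look like the following; note that the dto.MediaContent field used here (File) is an assumption, and the real struct may expose separate FileID/FileData fields:

// Hedged sketch: accept any of the three valid Responses API keys for
// input_file, and skip the part entirely when none is present.
case "input_file":
	for _, key := range []string{"file_id", "file_url", "file_data"} {
		if v, ok := partMap[key]; ok && v != nil {
			chatParts = append(chatParts, dto.MediaContent{
				Type: dto.ContentTypeFile,
				File: v, // field name assumed; adjust to the dto's actual schema
			})
			break
		}
	}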

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/openaicompat/responses_to_chat.go` around lines 367 - 381, In the
switch handling media parts in responses_to_chat.go, change the incorrect key
lookups so media isn't dropped: in the "input_file" case populate
dto.MediaContent.File/FileData/FileID from partMap["file_id"],
partMap["file_url"] or partMap["file_data"] (pick available field) and set Type
dto.ContentTypeFile; for the non-standard "input_video" case either remove it or
map it defensively by checking for partMap["file_url"] or partMap["file_id"] and
only create dto.MediaContent with Type dto.ContentTypeVideoUrl when a valid
URL/ID is present; ensure you reference the same variables used there (partMap,
chatParts, dto.MediaContent, dto.ContentTypeFile, dto.ContentTypeVideoUrl) and
skip/return no media if none of the expected keys exist.

feat(responses): route /v1/responses through non-OpenAI adaptors via chat completions conversion

Enables channels that don't natively support the OpenAI Responses API
(e.g. Claude/Anthropic) to serve /v1/responses requests by converting
them to /v1/chat/completions format, routing through the existing
adaptor pipeline, and converting the response back.

Changes:
- Implement ConvertOpenAIResponsesRequest in Claude adaptor using
  Responses→Chat→Claude conversion chain (reuses existing code)
- Add ChatToResponsesStreamState for converting chat completion
  stream chunks into Responses API SSE events (handles text deltas,
  tool calls, reasoning content, usage)
- Add RelayFormatOpenAIResponses support to Claude stream/non-stream
  handlers in relay-claude.go
- Add ResponsesRequestToChatCompletionsRequest converter with:
  - GetJsonType() for robust input type detection
  - developer role → system mapping
  - PromptCacheKey/PromptCacheRetention passthrough
  - Non-function tool passthrough via Custom field
  - function_call/function_call_output → tool message merging
- Add ChatCompletionsResponseToResponsesResponse for non-streaming
- Fix: tool calls no longer dropped when text content exists in
  ResponsesResponseToChatCompletionsResponse
- Update ResponsesStreamResponse DTO with ResponseID, Text, Arguments,
  ContentIndex, Part fields

Combines and improves on approaches from:
- QuantumNous#2817 (adaptor-level conversion, converter improvements)
- QuantumNous#2892 (tool calls + text coexistence bug fix)

Related: QuantumNous#2941, QuantumNous#2043, QuantumNous#1527
0-don force-pushed the feat/responses-via-chat-completions branch from 924c334 to 5c69030 on February 19, 2026 at 21:32

coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
service/openaicompat/responses_to_chat.go (1)

383-394: default case silently produces incomplete content parts for unknown types.

For any unrecognised Responses API content part type, the current code appends a MediaContent{Type: partType} with no payload (no Text, no ImageUrl, no InputAudio, no File). When serialised this becomes {"type":"<unknown>"}, which will either confuse downstream models or be silently ignored — either way the original content is lost without any indication.

Consider skipping unknown parts instead of emitting an empty stub:

♻️ Proposed change
 		default:
-			chatParts = append(chatParts, dto.MediaContent{
-				Type: partType,
-			})
+			// Unknown Responses content type — skip rather than emit an empty stub
+			common.SysLog(fmt.Sprintf("convertResponsesContentToChat: skipping unknown part type %q", partType))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/openaicompat/responses_to_chat.go` around lines 383 - 394, The
default switch branch currently appends an empty dto.MediaContent{Type:
partType} which yields {"type":"<unknown>"} and loses content; change the
default to skip unknown part types (do not append to chatParts) and optionally
log a warning; update the switch in the function that builds chatParts (the code
that references chatParts, dto.MediaContent and partType) to only append when a
known payload is set and simply continue/ignore on unknown types instead of
emitting an empty stub.

Comment on lines +83 to +96
	// Reasoning content (for models that emit reasoning_content)
	reasoningContent := delta.GetReasoningContent()
	if reasoningContent != "" {
		outIndex := 0
		summaryIndex := 0
		events = append(events, dto.ResponsesStreamResponse{
			Type:         "response.reasoning_summary_text.delta",
			ResponseID:   s.ResponseID,
			ItemID:       "rs_" + strings.TrimPrefix(s.ResponseID, "resp_"),
			OutputIndex:  &outIndex,
			SummaryIndex: &summaryIndex,
			Delta:        reasoningContent,
		})
	}

⚠️ Potential issue | 🟡 Minor

Hardcoded outIndex = 0 conflicts with the output-index allocator for reasoning content.

When a response contains both reasoning content and text, outIndex for reasoning is always 0 as a local literal, never allocated via allocOutputIndex. If ensureMessageItemEvents fires later (also allocating MessageOutputIndex = s.NextOutputIndex, which starts at 0), both the reasoning events and the message item events share OutputIndex 0, producing conflicting SSE events for clients that parse OutputIndex strictly.

The fix is to allocate a dedicated index for the reasoning item through the same NextOutputIndex counter:

💡 Proposed fix – allocate a reasoning output index
+	// Reasoning content
 	reasoningContent := delta.GetReasoningContent()
 	if reasoningContent != "" {
-		outIndex := 0
+		if s.ReasoningOutputIndex < 0 {
+			s.ReasoningOutputIndex = s.NextOutputIndex
+			s.NextOutputIndex++
+		}
+		outIndex := s.ReasoningOutputIndex
 		summaryIndex := 0
 		events = append(events, dto.ResponsesStreamResponse{

And add to the struct:

+	ReasoningOutputIndex int  // -1 until allocated

initialised to -1 in NewChatToResponsesStreamState.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/openaicompat/chat_stream_to_responses_stream.go` around lines 83 -
96, The reasoning delta block uses a hardcoded outIndex = 0 which conflicts with
the stream's output-index allocator; replace the literal with a proper allocated
index (use the same allocator/field used elsewhere, e.g. call
s.allocOutputIndex() or increment s.NextOutputIndex to produce a unique index)
and set that allocated value as OutputIndex for the reasoning event (keep
SummaryIndex separate). Also add a dedicated reasoning output index field to the
ChatToResponsesStreamState struct (initialise it to -1 in
NewChatToResponsesStreamState) so the reasoning index is tracked consistently
and not duplicated by ensureMessageItemEvents or other allocators.

Comment on lines +158 to +222
// FinalEvents emits the closing events: content done, tool calls done, and
// response.completed.
func (s *ChatToResponsesStreamState) FinalEvents(usage *dto.Usage) []dto.ResponsesStreamResponse {
	events := s.baseEvents()

	// Finalize message item
	if s.MessageItemAdded {
		text := s.OutputText.String()
		if s.MessageContentAdded {
			events = append(events, s.outputTextDoneEvent(text))
			events = append(events, s.contentPartDoneEvent(text))
		}
		events = append(events, s.messageItemDoneEvent(text))
	}

	// Finalize tool calls
	for _, callID := range s.ToolCallOrder {
		outIndex := s.outputIndexPtr(callID)
		args := s.ToolCallArgs[callID]
		if args != "" {
			events = append(events, dto.ResponsesStreamResponse{
				Type:        "response.function_call_arguments.done",
				ResponseID:  s.ResponseID,
				ItemID:      callID,
				OutputIndex: outIndex,
				Arguments:   args,
			})
		}
		events = append(events, dto.ResponsesStreamResponse{
			Type:        "response.output_item.done",
			ResponseID:  s.ResponseID,
			ItemID:      callID,
			OutputIndex: outIndex,
			Item: &dto.ResponsesOutput{
				Type:      "function_call",
				ID:        callID,
				Status:    "completed",
				CallId:    callID,
				Name:      s.ToolCallName[callID],
				Arguments: args,
			},
		})
	}

	// Build final output and usage
	output := s.buildFinalOutput()
	finalUsage := s.buildFinalUsage(usage)

	resp := &dto.OpenAIResponsesResponse{
		ID:        s.ResponseID,
		Object:    "response",
		CreatedAt: int(s.CreatedAt),
		Status:    "completed",
		Model:     s.Model,
		Output:    output,
		Usage:     finalUsage,
	}
	events = append(events, dto.ResponsesStreamResponse{
		Type:       "response.completed",
		ResponseID: s.ResponseID,
		Response:   resp,
	})

	return events
}

⚠️ Potential issue | 🟡 Minor

FinalEvents emits no done event for reasoning content.

When reasoning deltas were emitted during streaming, FinalEvents finalises text and tool calls but never emits response.reasoning_summary_text.done (or response.reasoning_summary_part.done). Clients that wait for a paired done event for every delta will be left with an incomplete SSE sequence.

💡 Suggested addition in FinalEvents
+	// Finalize reasoning content
+	if s.ReasoningOutputIndex >= 0 {
+		summaryIndex := 0
+		events = append(events, dto.ResponsesStreamResponse{
+			Type:         "response.reasoning_summary_text.done",
+			ResponseID:   s.ResponseID,
+			ItemID:       "rs_" + strings.TrimPrefix(s.ResponseID, "resp_"),
+			OutputIndex:  func() *int { i := s.ReasoningOutputIndex; return &i }(),
+			SummaryIndex: &summaryIndex,
+			Text:         s.ReasoningText.String(),  // accumulate in HandleChatChunk
+		})
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/openaicompat/chat_stream_to_responses_stream.go` around lines 158 -
222, FinalEvents currently finalizes message and tool-call events but never
emits the paired reasoning done events; add a reasoning finalization block in
FinalEvents (similar to the "Finalize message item" block) that checks the state
fields used while streaming reasoning deltas (e.g., s.ReasoningText /
s.ReasoningPartAdded / s.ReasoningContentAdded or whatever names are present in
the struct) and appends the corresponding dto.ResponsesStreamResponse entries
for "response.reasoning_summary_text.done" and/or
"response.reasoning_summary_part.done" (or call existing helper methods like
s.reasoningTextDoneEvent/s.reasoningPartDoneEvent if present) before appending
the message/tool completions and the final response so clients receive matching
done events for each reasoning delta.
