fix(server): plain-text call:verb spans must survive emit_finish malformed-parse + responses .done #1
…ormed-parse + responses .done

Commit acf718b ("detect call:verb{} in streaming emitter") added Pattern B to find_tool_start() so plain-text `call:<verb>{` openers route into StreamMode::TOOL_BUFFER, the same path that XML envelopes (<tool_call>, <function=, <tool_code>) use. Two downstream behaviors in emit_finish() were never updated for the plain-text case, surfacing as 2 unit-test failures once CI re-enabled the server_unit suite:

- test_emitter_content_mode_malformed_call_dropped: an unbalanced `call:get_weather{location: "unclosed` should remain visible in accumulated_text() once parsing fails.
- test_emitter_content_mode_responses_done_uses_pre_strip_text: the Responses-format `.done` payload (response.output_text.done / content_part.done / completed) must include the raw `call:` text so streaming-client buffers agree with the server's final claim.

Pattern A vs Pattern B distinction
==================================

find_tool_start() now reports `is_plain_text` (out-param). The CONTENT state machine records it on the TOOL_BUFFER transition as `tool_open_is_plain_text_`. At emit_finish():

Pattern A (XML envelope opener)
- malformed parse → drop buffer (protocol artifact, not prose; test_emitter_does_not_leak_malformed_tool_xml stays green).
- responses_streamed_text excludes tool_buffer_.

Pattern B (plain-text `call:` opener)
- malformed parse → flush tool_buffer_ back to accumulated_content_ AND emit it as a content delta. The literal `call:foo{...` span stays caller-visible as a signal that the model produced garbage.
- responses_streamed_text = accumulated_content_ + tool_buffer_ so the `.done` payload carries the raw call span the model emitted.

Intentional divergence: accumulated_text() vs responses_streamed_text
=====================================================================

For a successful Pattern B hoist these two accessors now diverge by design:

- accumulated_text() → stripped (call span replaced by cleaned_text). Consumed by OpenAI Chat / Anthropic non-streaming response builders that would otherwise duplicate the call as both literal text AND a tool_use block.
- responses_streamed_text → pre-strip (includes the raw `call:` text). Used only inside the RESPONSES branch of emit_finish for the .done / .completed events so streaming clients' accumulated `.delta` buffer matches the server's final payload.

test_emitter_content_mode_strips_call_span_from_accumulated_text continues to require the stripped form from accumulated_text().

Files: sse_emitter.h adds the `tool_open_is_plain_text_` member with a docstring; sse_emitter.cpp extends find_tool_start()'s signature, sets the flag at the TOOL_BUFFER transition, splits the malformed-parse branch on the flag, and conditionally appends tool_buffer_ to responses_streamed_text.

Test results: server_unit 2001/2001 (was 1999/2001).
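The Pattern A / Pattern B split described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in sse_emitter.cpp: `find_tool_start_sketch`, `FinishResult`, and `finish_malformed_sketch` are hypothetical names, the verb grammar (letters, digits, underscore) is an assumption, and only the malformed-parse branch of the finish step is modeled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for find_tool_start().
// Returns the offset of the earliest tool opener, or std::string::npos.
// is_plain_text is set to true only when that opener is Pattern B.
std::size_t find_tool_start_sketch(const std::string& text, bool& is_plain_text) {
    is_plain_text = false;
    std::size_t best = std::string::npos;

    // Pattern A: the XML envelope openers named in the commit message.
    const char* xml_openers[] = {"<tool_call>", "<function=", "<tool_code>"};
    for (const char* opener : xml_openers) {
        const std::size_t pos = text.find(opener);
        if (pos < best) best = pos;  // npos is the largest value, so misses never win
    }

    // Pattern B: "call:" + verb + "{". The verb charset is an assumption.
    std::size_t from = 0;
    while (true) {
        const std::size_t pos = text.find("call:", from);
        if (pos == std::string::npos) break;
        std::size_t i = pos + 5;  // skip past "call:"
        while (i < text.size() &&
               (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_')) {
            ++i;
        }
        const bool has_verb = i > pos + 5;
        if (has_verb && i < text.size() && text[i] == '{') {
            if (pos < best) {
                best = pos;
                is_plain_text = true;
            }
            break;  // the first valid Pattern B opener is the earliest one
        }
        from = pos + 1;
    }
    return best;
}

// What the caller ends up seeing after a malformed parse at finish time.
struct FinishResult {
    std::string accumulated_content;  // models what accumulated_text() would expose
    std::string emitted_delta;        // extra content delta sent at finish, if any
};

// Malformed-parse branch only: Pattern A drops the buffer,
// Pattern B flushes it back to content and emits it as a delta.
FinishResult finish_malformed_sketch(std::string accumulated_content,
                                     const std::string& tool_buffer,
                                     bool tool_open_is_plain_text) {
    FinishResult result{std::move(accumulated_content), std::string{}};
    if (tool_open_is_plain_text) {
        result.accumulated_content += tool_buffer;
        result.emitted_delta = tool_buffer;
    }
    return result;
}
```

In this sketch the flag is captured once, at detection time, and consulted only at finish. That mirrors the role the commit message gives `tool_open_is_plain_text_`: the buffer contents alone do not need to be re-inspected to decide whether a malformed span is a protocol artifact or visible prose.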
💡 Codex Review
Reviewed commit: c85c3b5034
    if (mode_ == StreamMode::TOOL_BUFFER && tool_open_is_plain_text_) {
        responses_streamed_text += tool_buffer_;
Avoid adding unstreamed tool buffer to Responses text
When a plain-text `call:` opener is detected before it has been flushed past the holdback (for example, `Looking up: ` + `call:get_weather{...}`), the emitter switches to TOOL_BUFFER, so the raw call span is never sent as `response.output_text.delta`. On a successful parse, only `parsed.cleaned_text` is emitted as content before the final events. Appending `tool_buffer_` here therefore makes `response.output_text.done` and `response.completed.output_text` include text that streaming clients never received in deltas, recreating the mismatch this snapshot is meant to avoid for the TOOL_BUFFER path.
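The invariant behind this comment is that the concatenation of all streamed deltas should equal the text in the final `.done` payload. A minimal sketch of that check follows; `join_deltas` and `done_matches_deltas` are hypothetical helpers written for illustration, not functions from the emitter or its test suite.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: rebuilds the text a streaming client holds
// after applying every output_text delta it received, in order.
std::string join_deltas(const std::vector<std::string>& deltas) {
    std::string out;
    for (const std::string& delta : deltas) {
        out += delta;
    }
    return out;
}

// Hypothetical check: true when the final payload text agrees with
// what the client accumulated from deltas.
bool done_matches_deltas(const std::vector<std::string>& deltas,
                         const std::string& done_text) {
    return join_deltas(deltas) == done_text;
}
```

Applied to the scenario in the comment, and assuming the client received only `Looking up: ` as a delta, a `.done` text of `Looking up: call:get_weather{...}` fails this check because the raw span was buffered rather than streamed.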
Summary
Two `test_server_unit` failures (PR Luce-Org#285's `feat/lucebox-docker`) traced back to commit acf718b ("detect call:verb{} in streaming emitter"). That commit added Pattern B to `find_tool_start()` so plain-text `call:<verb>{` openers route into `StreamMode::TOOL_BUFFER`, but two downstream behaviors in `emit_finish()` were never updated for the plain-text case. The failures surfaced once CI re-enabled the suite: a pre-existing feat bug, not a recent-merge regression.

Failing tests:
- `test_emitter_content_mode_malformed_call_dropped` (test_server_unit.cpp:1029)
- `test_emitter_content_mode_responses_done_uses_pre_strip_text` (test_server_unit.cpp:1157)

Pattern A vs Pattern B distinction
`find_tool_start()` now reports `is_plain_text` (out-param). The CONTENT state machine records it on the TOOL_BUFFER transition as `tool_open_is_plain_text_`. At `emit_finish()`:
- Pattern A (XML envelope opener: `<tool_call>` / `<function=` / `<tool_code>`):
  - malformed parse → drop buffer (protocol artifact, not prose; `test_emitter_does_not_leak_malformed_tool_xml` stays green).
  - `responses_streamed_text` excludes `tool_buffer_`.
- Pattern B (plain-text `call:<verb>{` opener):
  - malformed parse → flush `tool_buffer_` back to `accumulated_content_` AND emit it as a content delta. The literal span stays caller-visible.
  - `responses_streamed_text = accumulated_content_ + tool_buffer_` so `.done` payloads carry the raw call span.

Intentional accessor divergence
For a successful Pattern B hoist, two accessors now diverge by design:
- `accumulated_text()` → stripped (call span replaced by cleaned_text). Used by OpenAI Chat / Anthropic non-streaming response builders that would otherwise duplicate the call as both literal text AND a `tool_use` block. `test_emitter_content_mode_strips_call_span_from_accumulated_text` requires this stripped form.
- `responses_streamed_text` (local to `emit_finish`) → pre-strip (includes the raw `call:` text). Used only inside the RESPONSES branch for `.done` / `.completed` events so streaming clients' accumulated `.delta` buffer matches the server's final payload.

Files changed
- `server/src/server/sse_emitter.h`: adds `tool_open_is_plain_text_` member (+15 lines, mostly docstring)
- `server/src/server/sse_emitter.cpp`: extends `find_tool_start()` signature, sets the flag at the TOOL_BUFFER transition, splits the malformed-parse branch on the flag, conditionally appends `tool_buffer_` to `responses_streamed_text` (+68/-16)

Test plan
- Baseline: `./test_server_unit` reports 2001 assertions / 2 failures on `feat/lucebox-docker` (e63f4e0).
- With this change: `./test_server_unit` reports 2001 assertions / 0 failures.
- `test_emitter_does_not_leak_malformed_tool_xml` passes.
- `accumulated_text()`: `test_emitter_content_mode_strips_call_span_from_accumulated_text` passes.
- `test_emitter_content_mode_malformed_call_dropped` passes.
- `.done` carries raw call span: `test_emitter_content_mode_responses_done_uses_pre_strip_text` passes.

Origin of regression: acf718b.

Built with the `lucebox-hub:build-env` container (matches PR Luce-Org#329/Luce-Org#331/Luce-Org#326 pattern); CPU-only, no GPU needed to exercise the host-side emitter.