fix(server): plain-text call:verb spans must survive emit_finish malformed-parse + responses .done by easel · Pull Request #1 · easel/lucebox-hub

easel · 2026-06-03T17:51:16Z

Summary

Two test_server_unit failures (PR Luce-Org#285's feat/lucebox-docker) traced back to commit acf718b ("detect call:verb{} in streaming emitter"). That commit added Pattern B to find_tool_start() so plain-text call:<verb>{ openers route into StreamMode::TOOL_BUFFER, but two downstream behaviors in emit_finish() were never updated for the plain-text case. The failures surfaced once CI re-enabled the suite — pre-existing feat bug, not a recent-merge regression.

Failing tests:

test_emitter_content_mode_malformed_call_dropped (test_server_unit.cpp:1029)
test_emitter_content_mode_responses_done_uses_pre_strip_text (test_server_unit.cpp:1157)

Pattern A vs Pattern B distinction

find_tool_start() now reports is_plain_text (out-param). The CONTENT state machine records it on the TOOL_BUFFER transition as tool_open_is_plain_text_. At emit_finish():

Pattern A (XML envelope: <tool_call> / <function= / <tool_code>):
- malformed parse → drop buffer (protocol artifact, not prose; test_emitter_does_not_leak_malformed_tool_xml stays green).
- responses_streamed_text excludes tool_buffer_.
Pattern B (plain-text call:<verb>{ opener):
- malformed parse → flush tool_buffer_ back to accumulated_content_ AND emit it as a content delta. The literal span stays caller-visible.
- responses_streamed_text = accumulated_content_ + tool_buffer_ so .done payloads carry the raw call span.

Intentional accessor divergence

For a successful Pattern B hoist, two accessors now diverge by design:

accumulated_text() → stripped (call span replaced by cleaned_text). Used by OpenAI Chat / Anthropic non-streaming response builders that would otherwise duplicate the call as both literal text AND a tool_use block. test_emitter_content_mode_strips_call_span_from_accumulated_text requires this stripped form.
responses_streamed_text (local to emit_finish) → pre-strip (includes the raw call: text). Used only inside the RESPONSES branch for .done / .completed events so streaming clients' accumulated .delta buffer matches the server's final payload.

Files changed

server/src/server/sse_emitter.h: adds tool_open_is_plain_text_ member (+15 lines, mostly docstring)
server/src/server/sse_emitter.cpp: extends find_tool_start() signature, sets the flag at the TOOL_BUFFER transition, splits the malformed-parse branch on the flag, conditionally appends tool_buffer_ to responses_streamed_text (+68/-16)

Test plan

Reproduce: ./test_server_unit reports 2001 assertions / 2 failures on feat/lucebox-docker (e63f4e0).
After fix: ./test_server_unit reports 2001 assertions / 0 failures.
Pattern A malformed XML still suppressed: test_emitter_does_not_leak_malformed_tool_xml passes.
Pattern B success path still strips accumulated_text(): test_emitter_content_mode_strips_call_span_from_accumulated_text passes.
Pattern B malformed flush: test_emitter_content_mode_malformed_call_dropped passes.
Responses .done carries raw call span: test_emitter_content_mode_responses_done_uses_pre_strip_text passes.

Origin of regression: acf718b.

Built with lucebox-hub:build-env container (matches PR Luce-Org#329/Luce-Org#331/Luce-Org#326 pattern); CPU-only — no GPU needed to exercise the host-side emitter.

…ormed-parse + responses .done Commit acf718b ("detect call:verb{} in streaming emitter") added Pattern B to find_tool_start() so plain-text `call:<verb>{` openers route into StreamMode::TOOL_BUFFER, the same path that XML envelopes (<tool_call>, <function=, <tool_code>) use. Two downstream behaviors in emit_finish() were never updated for the plain-text case, surfacing as 2 unit-test failures once CI re-enabled the server_unit suite: - test_emitter_content_mode_malformed_call_dropped: an unbalanced `call:get_weather{location: "unclosed` should remain visible in accumulated_text() once parsing fails. - test_emitter_content_mode_responses_done_uses_pre_strip_text: the Responses-format `.done` payload (response.output_text.done / content_part.done / completed) must include the raw `call:` text so streaming-client buffers agree with the server's final claim. Pattern A vs Pattern B distinction ================================== find_tool_start() now reports `is_plain_text` (out-param). The CONTENT state machine records it on the TOOL_BUFFER transition as `tool_open_is_plain_text_`. At emit_finish(): Pattern A (XML envelope opener) - malformed parse → drop buffer (protocol artifact, not prose; test_emitter_does_not_leak_malformed_tool_xml stays green). - responses_streamed_text excludes tool_buffer_. Pattern B (plain-text `call:` opener) - malformed parse → flush tool_buffer_ back to accumulated_content_ AND emit it as a content delta. The literal `call:foo{...` span stays caller-visible as a signal that the model produced garbage. - responses_streamed_text = accumulated_content_ + tool_buffer_ so the `.done` payload carries the raw call span the model emitted. Intentional divergence: accumulated_text() vs responses_streamed_text ===================================================================== For a successful Pattern B hoist these two accessors now diverge by design: accumulated_text() → stripped (call span replaced by cleaned_text). Consumed by OpenAI Chat / Anthropic non-streaming response builders that would otherwise duplicate the call as both literal text AND a tool_use block. responses_streamed_text → pre-strip (includes the raw `call:` text). Used only inside the RESPONSES branch of emit_finish for the .done / .completed events so streaming clients' accumulated `.delta` buffer matches the server's final payload. test_emitter_content_mode_strips_ call_span_from_accumulated_text continues to require the stripped form from accumulated_text(). Files: sse_emitter.h adds the `tool_open_is_plain_text_` member with a docstring; sse_emitter.cpp extends find_tool_start()'s signature, sets the flag at the TOOL_BUFFER transition, splits the malformed-parse branch on the flag, and conditionally appends tool_buffer_ to responses_streamed_text. Test results: server_unit 2001/2001 (was 1999/2001).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c85c3b5034

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-03T17:54:24Z

+    if (mode_ == StreamMode::TOOL_BUFFER && tool_open_is_plain_text_) {
+        responses_streamed_text += tool_buffer_;


Avoid adding unstreamed tool buffer to Responses text

When a plain-text call: opener is detected before it has been flushed past the holdback (for example Looking up: + call:get_weather{...}), the emitter switches to TOOL_BUFFER, so the raw call span is never sent as response.output_text.delta; on successful parse, only parsed.cleaned_text is emitted as content before the final events. Appending tool_buffer_ here therefore makes response.output_text.done / response.completed.output_text include text that streaming clients never received in deltas, recreating the mismatch this snapshot is meant to avoid for the TOOL_BUFFER path.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed Jun 3, 2026

View reviewed changes

easel merged commit 15a7e03 into feat/lucebox-docker Jun 3, 2026
1 check passed

easel deleted the fix/pr285-emitter-call-verb-tests branch June 3, 2026 19:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(server): plain-text call:verb spans must survive emit_finish malformed-parse + responses .done#1

fix(server): plain-text call:verb spans must survive emit_finish malformed-parse + responses .done#1
easel merged 1 commit into
feat/lucebox-dockerfrom
fix/pr285-emitter-call-verb-tests

easel commented Jun 3, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if (mode_ == StreamMode::TOOL_BUFFER && tool_open_is_plain_text_) {
		responses_streamed_text += tool_buffer_;

Conversation

easel commented Jun 3, 2026

Summary

Pattern A vs Pattern B distinction

Intentional accessor divergence

Files changed

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant