Skip to content

fix(server): plain-text call:verb spans must survive emit_finish malformed-parse + responses .done#1

Merged
easel merged 1 commit into
feat/lucebox-dockerfrom
fix/pr285-emitter-call-verb-tests
Jun 3, 2026
Merged

fix(server): plain-text call:verb spans must survive emit_finish malformed-parse + responses .done#1
easel merged 1 commit into
feat/lucebox-dockerfrom
fix/pr285-emitter-call-verb-tests

Conversation

@easel
Copy link
Copy Markdown
Owner

@easel easel commented Jun 3, 2026

Summary

Two test_server_unit failures (PR Luce-Org#285's feat/lucebox-docker) traced back to commit acf718b ("detect call:verb{} in streaming emitter"). That commit added Pattern B to find_tool_start() so plain-text call:<verb>{ openers route into StreamMode::TOOL_BUFFER, but two downstream behaviors in emit_finish() were never updated for the plain-text case. The failures surfaced once CI re-enabled the suite — pre-existing feat bug, not a recent-merge regression.

Failing tests:

  • test_emitter_content_mode_malformed_call_dropped (test_server_unit.cpp:1029)
  • test_emitter_content_mode_responses_done_uses_pre_strip_text (test_server_unit.cpp:1157)

Pattern A vs Pattern B distinction

find_tool_start() now reports is_plain_text (out-param). The CONTENT state machine records it on the TOOL_BUFFER transition as tool_open_is_plain_text_. At emit_finish():

  • Pattern A (XML envelope: <tool_call> / <function= / <tool_code>):
    • malformed parse → drop buffer (protocol artifact, not prose; test_emitter_does_not_leak_malformed_tool_xml stays green).
    • responses_streamed_text excludes tool_buffer_.
  • Pattern B (plain-text call:<verb>{ opener):
    • malformed parse → flush tool_buffer_ back to accumulated_content_ AND emit it as a content delta. The literal span stays caller-visible.
    • responses_streamed_text = accumulated_content_ + tool_buffer_ so .done payloads carry the raw call span.

Intentional accessor divergence

For a successful Pattern B hoist, two accessors now diverge by design:

  • accumulated_text()stripped (call span replaced by cleaned_text). Used by OpenAI Chat / Anthropic non-streaming response builders that would otherwise duplicate the call as both literal text AND a tool_use block. test_emitter_content_mode_strips_call_span_from_accumulated_text requires this stripped form.
  • responses_streamed_text (local to emit_finish) → pre-strip (includes the raw call: text). Used only inside the RESPONSES branch for .done / .completed events so streaming clients' accumulated .delta buffer matches the server's final payload.

Files changed

  • server/src/server/sse_emitter.h: adds tool_open_is_plain_text_ member (+15 lines, mostly docstring)
  • server/src/server/sse_emitter.cpp: extends find_tool_start() signature, sets the flag at the TOOL_BUFFER transition, splits the malformed-parse branch on the flag, conditionally appends tool_buffer_ to responses_streamed_text (+68/-16)

Test plan

  • Reproduce: ./test_server_unit reports 2001 assertions / 2 failures on feat/lucebox-docker (e63f4e0).
  • After fix: ./test_server_unit reports 2001 assertions / 0 failures.
  • Pattern A malformed XML still suppressed: test_emitter_does_not_leak_malformed_tool_xml passes.
  • Pattern B success path still strips accumulated_text(): test_emitter_content_mode_strips_call_span_from_accumulated_text passes.
  • Pattern B malformed flush: test_emitter_content_mode_malformed_call_dropped passes.
  • Responses .done carries raw call span: test_emitter_content_mode_responses_done_uses_pre_strip_text passes.

Origin of regression: acf718b.

Built with lucebox-hub:build-env container (matches PR Luce-Org#329/Luce-Org#331/Luce-Org#326 pattern); CPU-only — no GPU needed to exercise the host-side emitter.

…ormed-parse + responses .done

Commit acf718b ("detect call:verb{} in streaming emitter") added Pattern B
to find_tool_start() so plain-text `call:<verb>{` openers route into
StreamMode::TOOL_BUFFER, the same path that XML envelopes (<tool_call>,
<function=, <tool_code>) use. Two downstream behaviors in emit_finish()
were never updated for the plain-text case, surfacing as 2 unit-test
failures once CI re-enabled the server_unit suite:

  - test_emitter_content_mode_malformed_call_dropped: an unbalanced
    `call:get_weather{location: "unclosed` should remain visible in
    accumulated_text() once parsing fails.
  - test_emitter_content_mode_responses_done_uses_pre_strip_text: the
    Responses-format `.done` payload (response.output_text.done /
    content_part.done / completed) must include the raw `call:` text
    so streaming-client buffers agree with the server's final claim.

Pattern A vs Pattern B distinction
==================================
find_tool_start() now reports `is_plain_text` (out-param). The CONTENT
state machine records it on the TOOL_BUFFER transition as
`tool_open_is_plain_text_`. At emit_finish():

  Pattern A (XML envelope opener)
    - malformed parse → drop buffer (protocol artifact, not prose;
      test_emitter_does_not_leak_malformed_tool_xml stays green).
    - responses_streamed_text excludes tool_buffer_.
  Pattern B (plain-text `call:` opener)
    - malformed parse → flush tool_buffer_ back to accumulated_content_
      AND emit it as a content delta. The literal `call:foo{...` span
      stays caller-visible as a signal that the model produced garbage.
    - responses_streamed_text = accumulated_content_ + tool_buffer_
      so the `.done` payload carries the raw call span the model
      emitted.

Intentional divergence: accumulated_text() vs responses_streamed_text
=====================================================================
For a successful Pattern B hoist these two accessors now diverge by
design:

  accumulated_text()        → stripped (call span replaced by
                              cleaned_text). Consumed by OpenAI Chat /
                              Anthropic non-streaming response builders
                              that would otherwise duplicate the call
                              as both literal text AND a tool_use block.
  responses_streamed_text   → pre-strip (includes the raw `call:` text).
                              Used only inside the RESPONSES branch of
                              emit_finish for the .done / .completed
                              events so streaming clients' accumulated
                              `.delta` buffer matches the server's final
                              payload. test_emitter_content_mode_strips_
                              call_span_from_accumulated_text continues
                              to require the stripped form from
                              accumulated_text().

Files: sse_emitter.h adds the `tool_open_is_plain_text_` member with a
docstring; sse_emitter.cpp extends find_tool_start()'s signature,
sets the flag at the TOOL_BUFFER transition, splits the malformed-parse
branch on the flag, and conditionally appends tool_buffer_ to
responses_streamed_text.

Test results: server_unit 2001/2001 (was 1999/2001).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c85c3b5034

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +632 to +633
if (mode_ == StreamMode::TOOL_BUFFER && tool_open_is_plain_text_) {
responses_streamed_text += tool_buffer_;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid adding unstreamed tool buffer to Responses text

When a plain-text call: opener is detected before it has been flushed past the holdback (for example Looking up: + call:get_weather{...}), the emitter switches to TOOL_BUFFER, so the raw call span is never sent as response.output_text.delta; on successful parse, only parsed.cleaned_text is emitted as content before the final events. Appending tool_buffer_ here therefore makes response.output_text.done / response.completed.output_text include text that streaming clients never received in deltas, recreating the mismatch this snapshot is meant to avoid for the TOOL_BUFFER path.

Useful? React with 👍 / 👎.

@easel easel merged commit 15a7e03 into feat/lucebox-docker Jun 3, 2026
1 check passed
@easel easel deleted the fix/pr285-emitter-call-verb-tests branch June 3, 2026 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant