fix(server): plain-text call:<verb>{} tool-call detection (parser + emitter wiring) by easel · Pull Request #329 · Luce-Org/lucebox-hub

easel · 2026-06-01T13:42:20Z

Summary

Closes the loop on Gemma-4-style plain-text tool emissions reaching /v1/messages clients as proper Anthropic tool_use content blocks. Combines the parser delta from the earlier #323 (which was closed and folded into PR #285) with the previously-missing emitter wiring discovered on 2026-06-01.

Both layers are required for the fix to work end-to-end:

Parser (cherry-picked from fix(server): parse gemma's call:<verb>{} plain-text tool emissions #323 + the underscore-prefix follow-up): adds pattern 6 — call:<ns>?<verb>{relaxed-JSON args} — to tool_parser.cpp so plain-text Gemma emissions like call:get_country_info{country: "France"} or _call:... (SentencePiece tokenizer artifact) parse to structured ToolCalls.
Emitter wiring: the SSE emitter only entered TOOL_BUFFER mode when the model emitted the literal <tool_call> XML opener. For plain-text emissions it stayed in CONTENT mode and never invoked parse_tool_calls. Added a CONTENT-mode finalize branch that runs the parser on accumulated text when a call: substring is present, hoists matches into tool_calls_, strips the spans from accumulated_text, and flips finish_reason to tool_calls so the existing Anthropic/OpenAI serialization paths emit tool_use content blocks.

Empirical signal

Live smoke test on 2026-05-31 against lucebox-hub:fac7e0f-cuda12 (easel/feat/lucebox-docker tip, which had the parser merged but not yet the emitter wiring):

POST /v1/messages with tools=[get_country_info] →
  stop_reason: "end_turn"
  content[0]: {"type": "text", "text": "_call:get_country_info{country: \"France\"}..."}

Expected (and after this fix):

  stop_reason: "tool_use"
  content[N]: {"type": "tool_use", "name": "get_country_info", "input": {"country": "France"}}

The original PR #323 didn't catch this because its tests called parse_tool_calls directly rather than going through the emitter; the parser was correct but unreachable in the live path.

History

fix(server): parse gemma's call:<verb>{} plain-text tool emissions #323 (closed, folded into feat(lucebox): docker stack + CLI + bench/profile + harness + luce-bench in-tree #285): the parser layer landed on feat/lucebox-docker.
This PR brings BOTH layers to main together — they're inseparable for functional correctness, and the emitter wiring on its own has no parser to invoke against Luce-Org/main.
feat(lucebox): docker stack + CLI + bench/profile + harness + luce-bench in-tree #285 (feat/lucebox-docker) carries the equivalent changes via separate commits; whichever lands first wins and the other auto-resolves.

Out of scope

Gemma4 / Laguna soft-close ports (separate work in PR feat(server): soft-close thinking termination via logit-ratio peek #326)
Docker image rebuild + live-service e2e validation (reviewer / operator, post-merge)
Streaming SSE: scope decision documented in the plan doc; non-streaming path is the priority

Files

Plan: docs/experiments/sse-emitter-content-mode-tool-parse-plan.md (includes codex review verbatim)
Plan (parser): docs/experiments/server-call-verb-tool-parser-plan.md (from fix(server): parse gemma's call:<verb>{} plain-text tool emissions #323 cherry-pick)
Parser: server/src/server/tool_parser.{cpp,h}
Emitter wiring: server/src/server/sse_emitter.cpp
Tests: server/test/test_server_unit.cpp (+14 parser, +9 emitter)

Tests

23 new test cases across parser + emitter layers. All passing in a standalone driver build (the sub-agent's CMake build was interrupted by harness timeout before completing; reviewer should confirm via cmake --build server/build --target test_server_unit && server/build/test_server_unit).

🤖 Generated with Claude Code

Plan + Codex review for running parse_tool_calls on accumulated CONTENT-mode text so plain-text `call:<verb>{...}` invocations (Gemma4) actually produce tool_use blocks instead of stop=end_turn.

Codex verdict: REVISE (residue hazard) → integrated as a new emitter-level test guarding accumulated_text() span strip. Q4 rebuttal: tool_allowed enforcement is already inside parse_tool_calls.

Captures the diagnosis (gemma forge 0/30 on 2026-05-30), the proposed sixth detection pattern, the relaxed-JSON arg parser sketch, the unit-test matrix, and codex's review (which forced reordering the new pattern to slot #5 ahead of the bare-JSON sweep to avoid interception of nested name/arguments-shaped args).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a sixth detection pattern to `parse_tool_calls` that recognizes the plain-text tool invocations gemma emits in chat-completion content (`call:get_country_info{country: "France"}` / `call:execute-bead:read-file{path: "..."}` / etc).

The 2026-05-30 gemma full bench scored forge 0/30 because every row's output carried these `call:<verb>{...}` invocations as text rather than structured `tool_use` content blocks. None of the existing five envelope-shaped detectors (`<tool_call>`, `<function=...>`, `<tool_code>`, bare JSON) match the bare `call:` shape.

The new pattern:

- Anchors on a sentinel character (whitespace, comma, semicolon, open/close bracket, etc.) before `call:` so narrative usages like `narrative.call:foo` don't match.
- Supports namespaced verbs (`execute-bead:read-file`, `default_api:fetch_sales_data`) and strips the namespace before using the verb as the ToolCall name.
- Extracts the args block via a quote- and escape-aware balanced-brace scanner that tolerates `"`, `'`, and `` ` `` string literals and tracks `[]` depth alongside `{}`.
- Parses the args as strict JSON first, then falls back to a relaxed rewrite that quotes bare identifier keys and normalizes single/backtick quoted strings to double-quoted before retrying. Malformed args drop the single invocation without crashing or polluting other calls.
- Runs *before* the bare-JSON sweep so that inner args of the form `call:outer{"name": "inner", "arguments": {}}` aren't hijacked into a spurious `inner` ToolCall by pattern #6.

Downstream the existing wiring takes over: SseEmitter::accumulate already calls parse_tool_calls; a non-empty ToolCall list flips finish_reason to `tool_calls`, which the Anthropic /v1/messages branch maps to `stop_reason="tool_use"` with `tool_use` content blocks (http_server.cpp:2030-2090) and the OpenAI branch maps to `choices[].message.tool_calls`.

The forge client-side workaround `_parse_plain_text_tool_calls` shipping on feat/lucebox-docker (commit deba2fd) becomes redundant once a server with this fix is deployed. It stays in place as defense-in-depth for older deployed servers.

Test plan: 14 new C++ unit cases in test_server_unit.cpp covering single / back-to-back / namespaced / snake- and kebab-case verbs; tool-allowed filtering; mid-prose rejection vs. whitespace-led acceptance; malformed args drop; inner `{}` inside string literals; strict-JSON and relaxed-keys arg parsing; cleaned_text scrubbing; the codex-requested inner `name`/`arguments` interception case; and multi-line nested-array args mirroring the snapshot data. All pass in a standalone driver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
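The quote- and escape-aware balanced-brace scan described in this commit message can be sketched roughly as below. This is a minimal illustration, not the code in tool_parser.cpp: the function name `find_balanced_close` and its return convention are assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: given the index of an opening '{', return the index of
// its matching '}' or std::string::npos if the block is unbalanced.
// Braces and brackets inside "...", '...' or `...` literals are ignored, and
// a backslash inside a literal skips the character that follows it.
std::size_t find_balanced_close(const std::string& text, std::size_t open) {
    if (open >= text.size() || text[open] != '{') return std::string::npos;
    int braces = 0;
    int brackets = 0;
    char quote = 0;  // 0 means "not inside a string literal"
    for (std::size_t i = open; i < text.size(); ++i) {
        const char c = text[i];
        if (quote != 0) {
            if (c == '\\') {
                ++i;  // skip the escaped character
            } else if (c == quote) {
                quote = 0;  // literal closed
            }
            continue;
        }
        switch (c) {
            case '"': case '\'': case '`': quote = c; break;
            case '{': ++braces; break;
            case '[': ++brackets; break;
            case ']':
                if (--brackets < 0) return std::string::npos;
                break;
            case '}':
                if (--braces == 0) {
                    // Outermost brace closed; reject if a '[' is still open.
                    return brackets == 0 ? i : std::string::npos;
                }
                break;
            default: break;
        }
    }
    return std::string::npos;  // ran off the end: unterminated block
}
```

With this sketch, the `}` inside the string literal in `call:f{path: "a}b"} tail` is skipped and the scan stops at the final brace, while an unterminated literal yields `npos` so the caller can drop that single invocation.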

Smoke-testing the post-PR-Luce-Org#323 image (lucebox-hub:cuda12 @ 8039911) on sindri's gemma-4-26b revealed a new emission pattern: the model sometimes outputs ``_call:get_country_info{...}`` with a leading underscore. This is a SentencePiece / chat-template tokenizer artifact that became visible after bragi's channel-token routing fix (commit 4b757d1): the underscore is residual from the tokenizer's internal serialization that earlier handling stripped.

Both parsers missed these invocations:

* Server-side (tool_parser.cpp:182): the sentinel character class ``[\s,;:\(\[\{\}\)\]\>]`` did not include ``_``. Added.
* Client-side (forge.py:32): ``\bcall:`` requires a word boundary before ``call``, but ``_`` is a word char so ``\b`` doesn't fire between ``_`` and ``c``. Replaced with explicit lookbehind on the same sentinel set (including ``_``).

Net result: ``_call:foo{...}`` now parses to a tool_use the same way ``call:foo{...}`` does.

Tradeoff: ``my_call:foo{}`` mid-identifier would also match, but real model outputs don't emit free-form ``my_call:`` text (tool names come from request tool defs).

Tests: +2 cases in test_forge_grader.py (underscore alone, mixed back-to-back with both prefixed and bare). 16 → 18 forge_grader tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
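To make the sentinel behaviour concrete, here is a small sketch using `std::regex`. The pattern is an assumption written for this example and is not copied from tool_parser.cpp; `find_call_verb` is a hypothetical helper. Because the ECMAScript grammar used by `std::regex` has no lookbehind, the sentinel is matched as a capture group (or start of input) rather than asserted.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Illustrative only. Group 1: sentinel or start of input (includes '_').
// Group 2: zero or more "namespace:" prefixes. Group 3: the verb itself.
// Returns the verb with namespaces stripped, or "" when nothing matches.
std::string find_call_verb(const std::string& text) {
    static const std::regex re(
        R"re((^|[\s,;:(\[{})\]>_])call:((?:[A-Za-z0-9_-]+:)*)([A-Za-z0-9_-]+)\{)re");
    std::smatch m;
    if (!std::regex_search(text, m, re)) return std::string();
    return m[3].str();
}
```

Under these assumptions, `_call:get_country_info{...}` and `call:execute-bead:read-file{...}` yield `get_country_info` and `read-file`, while `narrative.call:foo{}` is rejected because `.` is not in the sentinel set. As the commit message notes, admitting `_` means `my_call:foo{}` matches too.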

PR Luce-Org#323's parser added pattern 6 (call:<verb>{...}) but the SSE emitter only invokes parse_tool_calls when mode_ == TOOL_BUFFER, which fires only on the literal <tool_call> XML opener. For models like gemma-4 that emit tool calls as plain text, the emitter stays in CONTENT mode and the parser is never called, so no tool_use blocks land in the response (finish_reason="stop" / stop_reason="end_turn").

Add a CONTENT-mode finalize branch that runs parse_tool_calls when the accumulated text contains a plausible `call:<verb>{` opener (checked via a cheap O(N) substring scan to avoid regex cost on no-tool responses). Matches are hoisted into tool_calls_, the covering spans are stripped from accumulated_text, and finish_reason flips to "tool_calls" so the existing Anthropic/OpenAI serialization paths emit proper tool_use content blocks.

Pre-check accepts `_call:foo{` (SentencePiece underscore artifact) since `find("call:")` lands inside the `_call:` window; full validation is delegated to parse_tool_calls (tool_parser.cpp).

Tests: +9 unit cases covering parsed/skipped/underscore/no-substring/multi-call/malformed/no-double-fire-on-XML/empty-tools/preserving prior content paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
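The control flow of the CONTENT-mode finalize branch can be sketched as follows. All types and names here (`ToolCall`, `ParseResult`, `EmitterState`, `finalize_content_mode`, `stub_parse`) are hypothetical stand-ins chosen for the example; the real emitter in sse_emitter.cpp keeps this state in class members and calls the real `parse_tool_calls`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the server's types.
struct ToolCall { std::string name; std::string arguments_json; };
struct ParseResult { std::vector<ToolCall> calls; std::string cleaned_text; };
using Parser = std::function<ParseResult(const std::string&)>;

struct EmitterState {
    std::string accumulated_content;
    std::vector<ToolCall> tool_calls;
    std::string finish_reason;
};

// Returns the state the response builders would see after finalize.
EmitterState finalize_content_mode(EmitterState st, const Parser& parse) {
    if (!st.tool_calls.empty()) return st;  // another path already produced calls
    if (st.accumulated_content.find("call:") == std::string::npos) return st;
    ParseResult parsed = parse(st.accumulated_content);
    if (parsed.calls.empty()) return st;    // nothing valid: leave the text as-is
    st.tool_calls = std::move(parsed.calls);                  // hoist matches
    st.accumulated_content = std::move(parsed.cleaned_text);  // strip call spans
    st.finish_reason = "tool_calls";
    return st;
}

// Toy parser used only to exercise the branch: recognizes one fixed emission.
ParseResult stub_parse(const std::string& text) {
    const std::string needle = "_call:get_country_info{country: \"France\"}";
    ParseResult r;
    const std::size_t at = text.find(needle);
    if (at == std::string::npos) { r.cleaned_text = text; return r; }
    r.calls.push_back({"get_country_info", "{\"country\":\"France\"}"});
    r.cleaned_text = text.substr(0, at) + text.substr(at + needle.size());
    return r;
}
```

In this sketch a response with no `call:` substring never reaches the parser, and a response whose `call:` text fails to parse keeps its original content and finish reason.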

cubic-dev-ai

3 issues found across 6 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="server/src/server/tool_parser.cpp">

<violation number="1" location="server/src/server/tool_parser.cpp:248">
P2: String rewrite misses escaping inner double quotes. Relaxed JSON like single-quoted text with `"` inside can fail parse and silently lose a tool call. Escape `"` when normalizing non-double-quoted strings.</violation>
</file>

<file name="server/src/server/sse_emitter.cpp">

<violation number="1" location="server/src/server/sse_emitter.cpp:49">
P2: Pre-check blocks digit-start verbs. Parser accepts them, so valid plain-text calls can be skipped. Allow digits in first verb char.</violation>

<violation number="2" location="server/src/server/sse_emitter.cpp:703">
P2: Stripping accumulated_content_ here can desync Responses stream. Deltas already sent raw call text, but done/completed now send stripped text.</violation>
</file>


cubic-dev-ai · 2026-06-01T13:50:48Z

+        if (in_str) {
+            // Inside a string we already opened. Mirror escapes verbatim.
+            if (c == '\\' && i + 1 < payload.size()) {
+                rewritten += c;


P2: String rewrite misses escaping inner double quotes. Relaxed JSON like single-quoted text with " inside can fail parse and silently lose a tool call. Escape " when normalizing non-double-quoted strings.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/tool_parser.cpp, line 248: <comment>String rewrite misses escaping inner double quotes. Relaxed JSON like single-quoted text with `"` inside can fail parse and silently lose a tool call. Escape `"` when normalizing non-double-quoted strings.</comment> <file context> @@ -161,6 +168,134 @@ static const std::regex & re_tool_code() { + if (in_str) { + // Inside a string we already opened. Mirror escapes verbatim. + if (c == '\\' && i + 1 < payload.size()) { + rewritten += c; + rewritten += payload[i + 1]; + i += 2; </file context>

Suggested change

-                rewritten += c;
+            if (in_str != '"' && c == '"') {
+                rewritten += "\\\"";
+            } else {
+                rewritten += c;
+            }
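The effect of the suggested escaping can be seen in a standalone quote normalizer. This is a sketch under assumptions: `normalize_quotes` is a hypothetical function, not the rewrite loop in tool_parser.cpp, and its handling of `\'` inside a single-quoted string (dropping the backslash, since `\'` is not a valid JSON escape) goes beyond what the review comment asks for.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: rewrite '...' and `...` literals as "..." so that a
// strict JSON parser can be retried on the result.
std::string normalize_quotes(const std::string& payload) {
    std::string out;
    char in_str = 0;  // active quote character, or 0 outside a literal
    for (std::size_t i = 0; i < payload.size(); ++i) {
        const char c = payload[i];
        if (in_str == 0) {
            if (c == '"' || c == '\'' || c == '`') {
                in_str = c;
                out += '"';  // every literal opens with a double quote
            } else {
                out += c;
            }
            continue;
        }
        if (c == '\\' && i + 1 < payload.size()) {
            const char next = payload[++i];
            if (in_str != '"' && next == in_str) {
                out += next;  // \' or \` needs no escape once re-quoted
            } else {
                out += c;     // mirror other escapes verbatim
                out += next;
            }
            continue;
        }
        if (c == in_str) {
            in_str = 0;
            out += '"';       // every literal closes with a double quote
        } else if (c == '"') {
            out += "\\\"";    // inner " in a '...' or `...` literal: escape it
        } else {
            out += c;
        }
    }
    return out;
}
```

Without the `"` escape, an argument such as `{'q': 'say "hi"'}` would be rewritten into a string that terminates early and fails the strict-JSON retry, which is the silent loss the comment describes.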

cubic-dev-ai · 2026-06-01T13:50:49Z

+    while ((pos = text.find("call:", pos)) != std::string::npos) {
+        size_t v = pos + 5;  // step past "call:"
+        if (v < text.size() &&
+            (std::isalpha((unsigned char)text[v]) || text[v] == '_')) {


P2: Pre-check blocks digit-start verbs. Parser accepts them, so valid plain-text calls can be skipped. Allow digits in first verb char.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/sse_emitter.cpp, line 49: <comment>Pre-check blocks digit-start verbs. Parser accepts them, so valid plain-text calls can be skipped. Allow digits in first verb char.</comment> <file context> @@ -23,6 +24,47 @@ static bool has_request_tools(const json & tools) { + while ((pos = text.find("call:", pos)) != std::string::npos) { + size_t v = pos + 5; // step past "call:" + if (v < text.size() && + (std::isalpha((unsigned char)text[v]) || text[v] == '_')) { + size_t w = v; + while (w < text.size() && </file context>

Suggested change

-            (std::isalpha((unsigned char)text[v]) || text[v] == '_')) {
+            (std::isalnum((unsigned char)text[v]) || text[v] == '_')) {
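For context, a complete version of such a pre-check with the suggested `isalnum` change might look like the sketch below. Only the first four lines of the loop appear in the review excerpt; the verb character set (`is_verb_char`), the requirement of a following `{`, and the function name are assumptions made to complete the example.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Assumed verb alphabet: alphanumerics plus '_', '-' and the ':' used by
// namespaced verbs such as execute-bead:read-file.
static bool is_verb_char(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) ||
           ch == '_' || ch == '-' || ch == ':';
}

// Cheap O(N) scan: true if the text contains something shaped like
// call:<verb>{ . Full validation is left to the real parser.
bool has_plausible_call_opener(const std::string& text) {
    std::size_t pos = 0;
    while ((pos = text.find("call:", pos)) != std::string::npos) {
        const std::size_t v = pos + 5;  // step past "call:"
        if (v < text.size() &&
            (std::isalnum(static_cast<unsigned char>(text[v])) || text[v] == '_')) {
            std::size_t w = v;
            while (w < text.size() && is_verb_char(text[w])) ++w;
            if (w < text.size() && text[w] == '{') return true;
        }
        pos = v;  // keep scanning after this occurrence
    }
    return false;
}
```

Since the scan looks for the bare substring `call:`, the `_call:` form is accepted without special handling, and a digit-led verb passes the first-character test once `isalnum` is used.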

cubic-dev-ai · 2026-06-01T13:50:49Z

+            // the pre-strip text in earlier deltas; this only affects
+            // the final accumulated_text() consumed by the response
+            // builders in http_server.cpp.
+            accumulated_content_ = parsed.cleaned_text;


P2: Stripping accumulated_content_ here can desync Responses stream. Deltas already sent raw call text, but done/completed now send stripped text.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/sse_emitter.cpp, line 703: <comment>Stripping accumulated_content_ here can desync Responses stream. Deltas already sent raw call text, but done/completed now send stripped text.</comment> <file context> @@ -608,6 +650,134 @@ std::vector<std::string> SseEmitter::emit_finish(int completion_tokens, + // the pre-strip text in earlier deltas; this only affects + // the final accumulated_text() consumed by the response + // builders in http_server.cpp. + accumulated_content_ = parsed.cleaned_text; + + fr = "tool_calls"; </file context>

# Conflicts: # server/src/server/tool_parser.cpp

easel and others added 5 commits May 31, 2026 22:59

cubic-dev-ai Bot reviewed Jun 1, 2026


easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026

fix(server): integrate PR Luce-Org#329 tool-call detection

24b9e1e

# Conflicts: # server/src/server/tool_parser.cpp

easel wants to merge 5 commits into Luce-Org:main from easel:fix/sse-emitter-content-mode-tool-parse
