Skip to content

fix(server): plain-text call:<verb>{} tool-call detection (parser + emitter wiring)#329

Open
easel wants to merge 5 commits into
Luce-Org:mainfrom
easel:fix/sse-emitter-content-mode-tool-parse
Open

fix(server): plain-text call:<verb>{} tool-call detection (parser + emitter wiring)#329
easel wants to merge 5 commits into
Luce-Org:mainfrom
easel:fix/sse-emitter-content-mode-tool-parse

Conversation

@easel
Copy link
Copy Markdown
Collaborator

@easel easel commented Jun 1, 2026

Summary

Closes the loop on Gemma-4-style plain-text tool emissions reaching /v1/messages clients as proper Anthropic tool_use content blocks. Combines the parser delta from the earlier #323 (which was closed and folded into PR #285) with the previously-missing emitter wiring discovered on 2026-06-01.

Both layers are required for the fix to work end-to-end:

  • Parser (cherry-picked from fix(server): parse gemma's call:<verb>{} plain-text tool emissions #323 + the underscore-prefix follow-up): adds pattern 6 — call:<ns>?<verb>{relaxed-JSON args} — to tool_parser.cpp so plain-text Gemma emissions like call:get_country_info{country: "France"} or _call:... (SentencePiece tokenizer artifact) parse to structured ToolCalls.
  • Emitter wiring: the SSE emitter only entered TOOL_BUFFER mode when the model emitted the literal <tool_call> XML opener. For plain-text emissions it stayed in CONTENT mode and never invoked parse_tool_calls. Added a CONTENT-mode finalize branch that runs the parser on accumulated text when a call: substring is present, hoists matches into tool_calls_, strips the spans from accumulated_text, and flips finish_reason to tool_calls so the existing Anthropic/OpenAI serialization paths emit tool_use content blocks.

Empirical signal

Live smoke test on 2026-05-31 against lucebox-hub:fac7e0f-cuda12 (easel/feat/lucebox-docker tip, which had the parser merged but not yet the emitter wiring):

POST /v1/messages with tools=[get_country_info] →
  stop_reason: "end_turn"
  content[0]: {"type": "text", "text": "_call:get_country_info{country: \"France\"}..."}

Expected (and after this fix):

  stop_reason: "tool_use"
  content[N]: {"type": "tool_use", "name": "get_country_info", "input": {"country": "France"}}

The original PR #323 didn't catch this because its tests called parse_tool_calls directly rather than going through the emitter; the parser was correct but unreachable in the live path.

History

Out of scope

Files

  • Plan: docs/experiments/sse-emitter-content-mode-tool-parse-plan.md (includes codex review verbatim)
  • Plan (parser): docs/experiments/server-call-verb-tool-parser-plan.md (from fix(server): parse gemma's call:<verb>{} plain-text tool emissions #323 cherry-pick)
  • Parser: server/src/server/tool_parser.{cpp,h}
  • Emitter wiring: server/src/server/sse_emitter.cpp
  • Tests: server/test/test_server_unit.cpp (+14 parser, +9 emitter)

Tests

23 new test cases across parser + emitter layers. All passing in a standalone driver build (the sub-agent's CMake build was interrupted by harness timeout before completing; reviewer should confirm via cmake --build server/build --target test_server_unit && server/build/test_server_unit).

🤖 Generated with Claude Code

easel and others added 5 commits May 31, 2026 22:59
Plan + Codex review for running parse_tool_calls on accumulated
CONTENT-mode text so plain-text `call:<verb>{...}` invocations
(Gemma4) actually produce tool_use blocks instead of stop=end_turn.

Codex verdict: REVISE (residue hazard) → integrated as a new emitter-
level test guarding accumulated_text() span strip. Q4 rebuttal:
tool_allowed enforcement is already inside parse_tool_calls.
Captures the diagnosis (gemma forge 0/30 on 2026-05-30), the proposed
sixth detection pattern, the relaxed-JSON arg parser sketch, the
unit-test matrix, and codex's review (which forced reordering the new
pattern to slot Luce-Org#5 ahead of the bare-JSON sweep to avoid interception
of nested name/arguments-shaped args).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a sixth detection pattern to `parse_tool_calls` that recognizes
the plain-text tool invocations gemma emits in chat-completion content
(`call:get_country_info{country: "France"}` /
`call:execute-bead:read-file{path: "..."}` / etc).

The 2026-05-30 gemma full bench scored forge 0/30 because every row's
output carried these `call:<verb>{...}` invocations as text rather
than structured `tool_use` content blocks. None of the existing five
envelope-shaped detectors (`<tool_call>`, `<function=...>`,
`<tool_code>`, bare JSON) match the bare `call:` shape.

The new pattern:
- Anchors on a sentinel character (whitespace, comma, semicolon,
  open/close bracket, etc.) before `call:` so narrative usages like
  `narrative.call:foo` don't match.
- Supports namespaced verbs (`execute-bead:read-file`,
  `default_api:fetch_sales_data`) and strips the namespace before
  using the verb as the ToolCall name.
- Extracts the args block via a quote- and escape-aware balanced-brace
  scanner that tolerates `"`, `'`, and `` ` `` string literals and
  tracks `[]` depth alongside `{}`.
- Parses the args as strict JSON first, then falls back to a relaxed
  rewrite that quotes bare identifier keys and normalizes single/
  backtick quoted strings to double-quoted before retrying. Malformed
  args drop the single invocation without crashing or polluting other
  calls.
- Runs *before* the bare-JSON sweep so that inner args of the form
  `call:outer{"name": "inner", "arguments": {}}` aren't hijacked into
  a spurious `inner` ToolCall by pattern Luce-Org#6.

Downstream the existing wiring takes over: SseEmitter::accumulate
already calls parse_tool_calls; a non-empty ToolCall list flips
finish_reason to `tool_calls`, which the Anthropic /v1/messages
branch maps to `stop_reason="tool_use"` with `tool_use` content
blocks (http_server.cpp:2030-2090) and the OpenAI branch maps to
`choices[].message.tool_calls`.

The forge client-side workaround `_parse_plain_text_tool_calls`
shipping on feat/lucebox-docker (commit deba2fd) becomes redundant
once a server with this fix is deployed. It stays in place as
defense-in-depth for older deployed servers.

Test plan: 14 new C++ unit cases in test_server_unit.cpp covering
single / back-to-back / namespaced / snake- and kebab-case verbs;
tool-allowed filtering; mid-prose rejection vs. whitespace-led
acceptance; malformed args drop; inner `{}` inside string literals;
strict-JSON and relaxed-keys arg parsing; cleaned_text scrubbing;
the codex-requested inner `name`/`arguments` interception case; and
multi-line nested-array args mirroring the snapshot data. All pass
in a standalone driver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smoke-testing the post-PR-Luce-Org#323 image (lucebox-hub:cuda12 @ 8039911) on
sindri's gemma-4-26b revealed a new emission pattern: the model
sometimes outputs ``_call:get_country_info{...}`` with a leading
underscore. This is a SentencePiece / chat-template tokenizer artifact
that became visible after bragi's channel-token routing fix
(commit 4b757d1) — the underscore is residual from the tokenizer's
internal serialization that earlier handling stripped.

Both parsers missed these invocations:

* Server-side (tool_parser.cpp:182): the sentinel character class
  ``[\s,;:\(\[\{\}\)\]\>]`` did not include ``_``. Added.
* Client-side (forge.py:32): ``\bcall:`` requires a word boundary
  before ``call``, but ``_`` is a word char so ``\b`` doesn't fire
  between ``_`` and ``c``. Replaced with explicit lookbehind on the
  same sentinel set (including ``_``).

Net result: ``_call:foo{...}`` now parses to a tool_use the same way
``call:foo{...}`` does. Tradeoff: ``my_call:foo{}`` mid-identifier
would also match, but real model outputs don't emit free-form
``my_call:`` text (tool names come from request tool defs).

Tests: +2 cases in test_forge_grader.py (underscore alone, mixed
back-to-back with both prefixed and bare). 16 → 18 forge_grader
tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR Luce-Org#323's parser added pattern 6 (call:<verb>{...}) but the SSE emitter
only invokes parse_tool_calls when mode_ == TOOL_BUFFER, which fires
only on the literal <tool_call> XML opener. For models like gemma-4
that emit tool calls as plain text, the emitter stays in CONTENT mode
and the parser is never called, so no tool_use blocks land in the
response (finish_reason="stop" / stop_reason="end_turn").

Add a CONTENT-mode finalize branch that runs parse_tool_calls when
the accumulated text contains a plausible `call:<verb>{` opener
(checked via a cheap O(N) substring scan to avoid regex cost on
no-tool responses). Matches are hoisted into tool_calls_, the
covering spans are stripped from accumulated_text, and finish_reason
flips to "tool_calls" so the existing Anthropic/OpenAI serialization
paths emit proper tool_use content blocks.

Pre-check accepts `_call:foo{` (SentencePiece underscore artifact)
since `find("call:")` lands inside the `_call:` window — full
validation is delegated to parse_tool_calls (tool_parser.cpp).

Tests: +9 unit cases covering parsed/skipped/underscore/no-substring/
multi-call/malformed/no-double-fire-on-XML/empty-tools/preserving
prior content paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="server/src/server/tool_parser.cpp">

<violation number="1" location="server/src/server/tool_parser.cpp:248">
P2: String rewrite misses escaping inner double quotes. Relaxed JSON like single-quoted text with `"` inside can fail parse and silently lose a tool call. Escape `"` when normalizing non-double-quoted strings.</violation>
</file>

<file name="server/src/server/sse_emitter.cpp">

<violation number="1" location="server/src/server/sse_emitter.cpp:49">
P2: Pre-check blocks digit-start verbs. Parser accepts them, so valid plain-text calls can be skipped. Allow digits in first verb char.</violation>

<violation number="2" location="server/src/server/sse_emitter.cpp:703">
P2: Stripping accumulated_content_ here can desync Responses stream. Deltas already sent raw call text, but done/completed now send stripped text.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

if (in_str) {
// Inside a string we already opened. Mirror escapes verbatim.
if (c == '\\' && i + 1 < payload.size()) {
rewritten += c;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: String rewrite misses escaping inner double quotes. Relaxed JSON like single-quoted text with " inside can fail parse and silently lose a tool call. Escape " when normalizing non-double-quoted strings.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/tool_parser.cpp, line 248:

<comment>String rewrite misses escaping inner double quotes. Relaxed JSON like single-quoted text with `"` inside can fail parse and silently lose a tool call. Escape `"` when normalizing non-double-quoted strings.</comment>

<file context>
@@ -161,6 +168,134 @@ static const std::regex & re_tool_code() {
+        if (in_str) {
+            // Inside a string we already opened. Mirror escapes verbatim.
+            if (c == '\\' && i + 1 < payload.size()) {
+                rewritten += c;
+                rewritten += payload[i + 1];
+                i += 2;
</file context>
Suggested change
rewritten += c;
if (in_str != '"' && c == '"') {
rewritten += "\\\"";
} else {
rewritten += c;
}

while ((pos = text.find("call:", pos)) != std::string::npos) {
size_t v = pos + 5; // step past "call:"
if (v < text.size() &&
(std::isalpha((unsigned char)text[v]) || text[v] == '_')) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Pre-check blocks digit-start verbs. Parser accepts them, so valid plain-text calls can be skipped. Allow digits in first verb char.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/sse_emitter.cpp, line 49:

<comment>Pre-check blocks digit-start verbs. Parser accepts them, so valid plain-text calls can be skipped. Allow digits in first verb char.</comment>

<file context>
@@ -23,6 +24,47 @@ static bool has_request_tools(const json & tools) {
+    while ((pos = text.find("call:", pos)) != std::string::npos) {
+        size_t v = pos + 5;  // step past "call:"
+        if (v < text.size() &&
+            (std::isalpha((unsigned char)text[v]) || text[v] == '_')) {
+            size_t w = v;
+            while (w < text.size() &&
</file context>
Suggested change
(std::isalpha((unsigned char)text[v]) || text[v] == '_')) {
(std::isalnum((unsigned char)text[v]) || text[v] == '_')) {

// the pre-strip text in earlier deltas; this only affects
// the final accumulated_text() consumed by the response
// builders in http_server.cpp.
accumulated_content_ = parsed.cleaned_text;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Stripping accumulated_content_ here can desync Responses stream. Deltas already sent raw call text, but done/completed now send stripped text.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/sse_emitter.cpp, line 703:

<comment>Stripping accumulated_content_ here can desync Responses stream. Deltas already sent raw call text, but done/completed now send stripped text.</comment>

<file context>
@@ -608,6 +650,134 @@ std::vector<std::string> SseEmitter::emit_finish(int completion_tokens,
+            // the pre-strip text in earlier deltas; this only affects
+            // the final accumulated_text() consumed by the response
+            // builders in http_server.cpp.
+            accumulated_content_ = parsed.cleaned_text;
+
+            fr = "tool_calls";
</file context>

easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026
# Conflicts:
#	server/src/server/tool_parser.cpp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant