fix(server): plain-text call:<verb>{} tool-call detection (parser + emitter wiring)#329
Open
easel wants to merge 5 commits into
Open
fix(server): plain-text call:<verb>{} tool-call detection (parser + emitter wiring)#329easel wants to merge 5 commits into
easel wants to merge 5 commits into
Conversation
Plan + Codex review for running parse_tool_calls on accumulated
CONTENT-mode text so plain-text `call:<verb>{...}` invocations
(Gemma4) actually produce tool_use blocks instead of stop=end_turn.
Codex verdict: REVISE (residue hazard) → integrated as a new emitter-
level test guarding accumulated_text() span strip. Q4 rebuttal:
tool_allowed enforcement is already inside parse_tool_calls.
Captures the diagnosis (gemma forge 0/30 on 2026-05-30), the proposed sixth detection pattern, the relaxed-JSON arg parser sketch, the unit-test matrix, and codex's review (which forced reordering the new pattern to slot Luce-Org#5 ahead of the bare-JSON sweep to avoid interception of nested name/arguments-shaped args). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a sixth detection pattern to `parse_tool_calls` that recognizes
the plain-text tool invocations gemma emits in chat-completion content
(`call:get_country_info{country: "France"}` /
`call:execute-bead:read-file{path: "..."}` / etc).
The 2026-05-30 gemma full bench scored forge 0/30 because every row's
output carried these `call:<verb>{...}` invocations as text rather
than structured `tool_use` content blocks. None of the existing five
envelope-shaped detectors (`<tool_call>`, `<function=...>`,
`<tool_code>`, bare JSON) match the bare `call:` shape.
The new pattern:
- Anchors on a sentinel character (whitespace, comma, semicolon,
open/close bracket, etc.) before `call:` so narrative usages like
`narrative.call:foo` don't match.
- Supports namespaced verbs (`execute-bead:read-file`,
`default_api:fetch_sales_data`) and strips the namespace before
using the verb as the ToolCall name.
- Extracts the args block via a quote- and escape-aware balanced-brace
scanner that tolerates `"`, `'`, and `` ` `` string literals and
tracks `[]` depth alongside `{}`.
- Parses the args as strict JSON first, then falls back to a relaxed
rewrite that quotes bare identifier keys and normalizes single/
backtick quoted strings to double-quoted before retrying. Malformed
args drop the single invocation without crashing or polluting other
calls.
- Runs *before* the bare-JSON sweep so that inner args of the form
`call:outer{"name": "inner", "arguments": {}}` aren't hijacked into
a spurious `inner` ToolCall by pattern Luce-Org#6.
Downstream the existing wiring takes over: SseEmitter::accumulate
already calls parse_tool_calls; a non-empty ToolCall list flips
finish_reason to `tool_calls`, which the Anthropic /v1/messages
branch maps to `stop_reason="tool_use"` with `tool_use` content
blocks (http_server.cpp:2030-2090) and the OpenAI branch maps to
`choices[].message.tool_calls`.
The forge client-side workaround `_parse_plain_text_tool_calls`
shipping on feat/lucebox-docker (commit deba2fd) becomes redundant
once a server with this fix is deployed. It stays in place as
defense-in-depth for older deployed servers.
Test plan: 14 new C++ unit cases in test_server_unit.cpp covering
single / back-to-back / namespaced / snake- and kebab-case verbs;
tool-allowed filtering; mid-prose rejection vs. whitespace-led
acceptance; malformed args drop; inner `{}` inside string literals;
strict-JSON and relaxed-keys arg parsing; cleaned_text scrubbing;
the codex-requested inner `name`/`arguments` interception case; and
multi-line nested-array args mirroring the snapshot data. All pass
in a standalone driver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smoke-testing the post-PR-Luce-Org#323 image (lucebox-hub:cuda12 @ 8039911) on sindri's gemma-4-26b revealed a new emission pattern: the model sometimes outputs ``_call:get_country_info{...}`` with a leading underscore. This is a SentencePiece / chat-template tokenizer artifact that became visible after bragi's channel-token routing fix (commit 4b757d1) — the underscore is residual from the tokenizer's internal serialization that earlier handling stripped. Both parsers missed these invocations: * Server-side (tool_parser.cpp:182): the sentinel character class ``[\s,;:\(\[\{\}\)\]\>]`` did not include ``_``. Added. * Client-side (forge.py:32): ``\bcall:`` requires a word boundary before ``call``, but ``_`` is a word char so ``\b`` doesn't fire between ``_`` and ``c``. Replaced with explicit lookbehind on the same sentinel set (including ``_``). Net result: ``_call:foo{...}`` now parses to a tool_use the same way ``call:foo{...}`` does. Tradeoff: ``my_call:foo{}`` mid-identifier would also match, but real model outputs don't emit free-form ``my_call:`` text (tool names come from request tool defs). Tests: +2 cases in test_forge_grader.py (underscore alone, mixed back-to-back with both prefixed and bare). 16 → 18 forge_grader tests, all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR Luce-Org#323's parser added pattern 6 (call:<verb>{...}) but the SSE emitter only invokes parse_tool_calls when mode_ == TOOL_BUFFER, which fires only on the literal <tool_call> XML opener. For models like gemma-4 that emit tool calls as plain text, the emitter stays in CONTENT mode and the parser is never called, so no tool_use blocks land in the response (finish_reason="stop" / stop_reason="end_turn"). Add a CONTENT-mode finalize branch that runs parse_tool_calls when the accumulated text contains a plausible `call:<verb>{` opener (checked via a cheap O(N) substring scan to avoid regex cost on no-tool responses). Matches are hoisted into tool_calls_, the covering spans are stripped from accumulated_text, and finish_reason flips to "tool_calls" so the existing Anthropic/OpenAI serialization paths emit proper tool_use content blocks. Pre-check accepts `_call:foo{` (SentencePiece underscore artifact) since `find("call:")` lands inside the `_call:` window — full validation is delegated to parse_tool_calls (tool_parser.cpp). Tests: +9 unit cases covering parsed/skipped/underscore/no-substring/ multi-call/malformed/no-double-fire-on-XML/empty-tools/preserving prior content paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
3 issues found across 6 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="server/src/server/tool_parser.cpp">
<violation number="1" location="server/src/server/tool_parser.cpp:248">
P2: String rewrite misses escaping inner double quotes. Relaxed JSON like single-quoted text with `"` inside can fail parse and silently lose a tool call. Escape `"` when normalizing non-double-quoted strings.</violation>
</file>
<file name="server/src/server/sse_emitter.cpp">
<violation number="1" location="server/src/server/sse_emitter.cpp:49">
P2: Pre-check blocks digit-start verbs. Parser accepts them, so valid plain-text calls can be skipped. Allow digits in first verb char.</violation>
<violation number="2" location="server/src/server/sse_emitter.cpp:703">
P2: Stripping accumulated_content_ here can desync Responses stream. Deltas already sent raw call text, but done/completed now send stripped text.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| if (in_str) { | ||
| // Inside a string we already opened. Mirror escapes verbatim. | ||
| if (c == '\\' && i + 1 < payload.size()) { | ||
| rewritten += c; |
Contributor
There was a problem hiding this comment.
P2: String rewrite misses escaping inner double quotes. Relaxed JSON like single-quoted text with " inside can fail parse and silently lose a tool call. Escape " when normalizing non-double-quoted strings.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/tool_parser.cpp, line 248:
<comment>String rewrite misses escaping inner double quotes. Relaxed JSON like single-quoted text with `"` inside can fail parse and silently lose a tool call. Escape `"` when normalizing non-double-quoted strings.</comment>
<file context>
@@ -161,6 +168,134 @@ static const std::regex & re_tool_code() {
+ if (in_str) {
+ // Inside a string we already opened. Mirror escapes verbatim.
+ if (c == '\\' && i + 1 < payload.size()) {
+ rewritten += c;
+ rewritten += payload[i + 1];
+ i += 2;
</file context>
Suggested change
| rewritten += c; | |
| if (in_str != '"' && c == '"') { | |
| rewritten += "\\\""; | |
| } else { | |
| rewritten += c; | |
| } |
| while ((pos = text.find("call:", pos)) != std::string::npos) { | ||
| size_t v = pos + 5; // step past "call:" | ||
| if (v < text.size() && | ||
| (std::isalpha((unsigned char)text[v]) || text[v] == '_')) { |
Contributor
There was a problem hiding this comment.
P2: Pre-check blocks digit-start verbs. Parser accepts them, so valid plain-text calls can be skipped. Allow digits in first verb char.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/sse_emitter.cpp, line 49:
<comment>Pre-check blocks digit-start verbs. Parser accepts them, so valid plain-text calls can be skipped. Allow digits in first verb char.</comment>
<file context>
@@ -23,6 +24,47 @@ static bool has_request_tools(const json & tools) {
+ while ((pos = text.find("call:", pos)) != std::string::npos) {
+ size_t v = pos + 5; // step past "call:"
+ if (v < text.size() &&
+ (std::isalpha((unsigned char)text[v]) || text[v] == '_')) {
+ size_t w = v;
+ while (w < text.size() &&
</file context>
Suggested change
| (std::isalpha((unsigned char)text[v]) || text[v] == '_')) { | |
| (std::isalnum((unsigned char)text[v]) || text[v] == '_')) { |
| // the pre-strip text in earlier deltas; this only affects | ||
| // the final accumulated_text() consumed by the response | ||
| // builders in http_server.cpp. | ||
| accumulated_content_ = parsed.cleaned_text; |
Contributor
There was a problem hiding this comment.
P2: Stripping accumulated_content_ here can desync Responses stream. Deltas already sent raw call text, but done/completed now send stripped text.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/server/sse_emitter.cpp, line 703:
<comment>Stripping accumulated_content_ here can desync Responses stream. Deltas already sent raw call text, but done/completed now send stripped text.</comment>
<file context>
@@ -608,6 +650,134 @@ std::vector<std::string> SseEmitter::emit_finish(int completion_tokens,
+ // the pre-strip text in earlier deltas; this only affects
+ // the final accumulated_text() consumed by the response
+ // builders in http_server.cpp.
+ accumulated_content_ = parsed.cleaned_text;
+
+ fr = "tool_calls";
</file context>
easel
pushed a commit
to easel/lucebox-hub
that referenced
this pull request
Jun 1, 2026
# Conflicts: # server/src/server/tool_parser.cpp
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the loop on Gemma-4-style plain-text tool emissions reaching
/v1/messagesclients as proper Anthropictool_usecontent blocks. Combines the parser delta from the earlier #323 (which was closed and folded into PR #285) with the previously-missing emitter wiring discovered on 2026-06-01.Both layers are required for the fix to work end-to-end:
call:<ns>?<verb>{relaxed-JSON args}— totool_parser.cppso plain-text Gemma emissions likecall:get_country_info{country: "France"}or_call:...(SentencePiece tokenizer artifact) parse to structured ToolCalls.TOOL_BUFFERmode when the model emitted the literal<tool_call>XML opener. For plain-text emissions it stayed inCONTENTmode and never invokedparse_tool_calls. Added aCONTENT-mode finalize branch that runs the parser on accumulated text when acall:substring is present, hoists matches intotool_calls_, strips the spans fromaccumulated_text, and flipsfinish_reasontotool_callsso the existing Anthropic/OpenAI serialization paths emittool_usecontent blocks.Empirical signal
Live smoke test on 2026-05-31 against
lucebox-hub:fac7e0f-cuda12(easel/feat/lucebox-dockertip, which had the parser merged but not yet the emitter wiring):Expected (and after this fix):
The original PR #323 didn't catch this because its tests called
parse_tool_callsdirectly rather than going through the emitter; the parser was correct but unreachable in the live path.History
feat/lucebox-docker.maintogether — they're inseparable for functional correctness, and the emitter wiring on its own has no parser to invoke againstLuce-Org/main.feat/lucebox-docker) carries the equivalent changes via separate commits; whichever lands first wins and the other auto-resolves.Out of scope
Files
docs/experiments/sse-emitter-content-mode-tool-parse-plan.md(includes codex review verbatim)docs/experiments/server-call-verb-tool-parser-plan.md(from fix(server): parse gemma's call:<verb>{} plain-text tool emissions #323 cherry-pick)server/src/server/tool_parser.{cpp,h}server/src/server/sse_emitter.cppserver/test/test_server_unit.cpp(+14 parser, +9 emitter)Tests
23 new test cases across parser + emitter layers. All passing in a standalone driver build (the sub-agent's CMake build was interrupted by harness timeout before completing; reviewer should confirm via
cmake --build server/build --target test_server_unit && server/build/test_server_unit).🤖 Generated with Claude Code