fix(server): parse gemma's call:<verb>{} plain-text tool emissions#323
Closed
easel wants to merge 2 commits into
Closed
fix(server): parse gemma's call:<verb>{} plain-text tool emissions#323easel wants to merge 2 commits into
easel wants to merge 2 commits into
Conversation
Captures the diagnosis (gemma forge 0/30 on 2026-05-30), the proposed sixth detection pattern, the relaxed-JSON arg parser sketch, the unit-test matrix, and codex's review (which forced reordering the new pattern to slot Luce-Org#5 ahead of the bare-JSON sweep to avoid interception of nested name/arguments-shaped args). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a sixth detection pattern to `parse_tool_calls` that recognizes
the plain-text tool invocations gemma emits in chat-completion content
(`call:get_country_info{country: "France"}` /
`call:execute-bead:read-file{path: "..."}` / etc).
The 2026-05-30 gemma full bench scored forge 0/30 because every row's
output carried these `call:<verb>{...}` invocations as text rather
than structured `tool_use` content blocks. None of the existing five
envelope-shaped detectors (`<tool_call>`, `<function=...>`,
`<tool_code>`, bare JSON) match the bare `call:` shape.
The new pattern:
- Anchors on a sentinel character (whitespace, comma, semicolon,
open/close bracket, etc.) before `call:` so narrative usages like
`narrative.call:foo` don't match.
- Supports namespaced verbs (`execute-bead:read-file`,
`default_api:fetch_sales_data`) and strips the namespace before
using the verb as the ToolCall name.
- Extracts the args block via a quote- and escape-aware balanced-brace
scanner that tolerates `"`, `'`, and `` ` `` string literals and
tracks `[]` depth alongside `{}`.
- Parses the args as strict JSON first, then falls back to a relaxed
rewrite that quotes bare identifier keys and normalizes single/
backtick quoted strings to double-quoted before retrying. Malformed
args drop the single invocation without crashing or polluting other
calls.
- Runs *before* the bare-JSON sweep so that inner args of the form
`call:outer{"name": "inner", "arguments": {}}` aren't hijacked into
a spurious `inner` ToolCall by pattern Luce-Org#6.
Downstream the existing wiring takes over: SseEmitter::accumulate
already calls parse_tool_calls; a non-empty ToolCall list flips
finish_reason to `tool_calls`, which the Anthropic /v1/messages
branch maps to `stop_reason="tool_use"` with `tool_use` content
blocks (http_server.cpp:2030-2090) and the OpenAI branch maps to
`choices[].message.tool_calls`.
The forge client-side workaround `_parse_plain_text_tool_calls`
shipping on feat/lucebox-docker (commit deba2fd) becomes redundant
once a server with this fix is deployed. It stays in place as
defense-in-depth for older deployed servers.
Test plan: 14 new C++ unit cases in test_server_unit.cpp covering
single / back-to-back / namespaced / snake- and kebab-case verbs;
tool-allowed filtering; mid-prose rejection vs. whitespace-led
acceptance; malformed args drop; inner `{}` inside string literals;
strict-JSON and relaxed-keys arg parsing; cleaned_text scrubbing;
the codex-requested inner `name`/`arguments` interception case; and
multi-line nested-array args mirroring the snapshot data. All pass
in a standalone driver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
easel
added a commit
to easel/lucebox-hub
that referenced
this pull request
May 31, 2026
easel
added a commit
to easel/lucebox-hub
that referenced
this pull request
May 31, 2026
Documents the server-side call:<verb>{} tool parser fix (PR Luce-Org#323) and
the C++17 compatibility fix for starts_with. Benchmarks running.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
easel
added a commit
to easel/lucebox-hub
that referenced
this pull request
Jun 1, 2026
Smoke-testing the post-PR-Luce-Org#323 image (lucebox-hub:cuda12 @ 8039911) on sindri's gemma-4-26b revealed a new emission pattern: the model sometimes outputs ``_call:get_country_info{...}`` with a leading underscore. This is a SentencePiece / chat-template tokenizer artifact that became visible after bragi's channel-token routing fix (commit 4b757d1) — the underscore is residual from the tokenizer's internal serialization that earlier handling stripped. Both parsers missed these invocations: * Server-side (tool_parser.cpp:182): the sentinel character class ``[\s,;:\(\[\{\}\)\]\>]`` did not include ``_``. Added. * Client-side (forge.py:32): ``\bcall:`` requires a word boundary before ``call``, but ``_`` is a word char so ``\b`` doesn't fire between ``_`` and ``c``. Replaced with explicit lookbehind on the same sentinel set (including ``_``). Net result: ``_call:foo{...}`` now parses to a tool_use the same way ``call:foo{...}`` does. Tradeoff: ``my_call:foo{}`` mid-identifier would also match, but real model outputs don't emit free-form ``my_call:`` text (tool names come from request tool defs). Tests: +2 cases in test_forge_grader.py (underscore alone, mixed back-to-back with both prefixed and bare). 16 → 18 forge_grader tests, all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a sixth detection pattern to
server/src/server/tool_parser.cpp::parse_tool_callsthat recognizes the plain-text tool invocations gemma emits in chat-completion content — e.g.call:get_country_info{country: "France"},call:execute-bead:read-file{path: "..."},call:default_api:analyze_data{data: [...]}. None of the existing five envelope-shaped patterns (<tool_call>, bare<function=...>,<function=...(args)>,<tool_code>, bare JSON) match this shape, so a non-trivial slice of gemma's tool-using runs surfaces text-only to clients that expect structuredtool_useblocks.Why this matters
The 2026-05-30 gemma full bench (
d9ecba6cc105-nvidia-geforce-rtx-3090-ti-gemma-full-2026-05-30-67f4) scored forge 0/30 because every row's iterations[0].output carried thesecall:<verb>{...}invocations as text content rather than structured Anthropictool_useblocks. forge's WorkflowRunner expects aToolCallpayload and the row dies withValidationError.The downstream wiring is already in place:
SseEmitter::accumulatecallsparse_tool_calls, a non-empty result flipsfinish_reasontotool_calls, the Anthropic/v1/messagesbranch maps that tostop_reason="tool_use"withtool_usecontent blocks (server/src/server/http_server.cpp:2030-2090), and the OpenAI branch maps tochoices[].message.tool_calls. The only gap was the parser.What's in the new pattern
(^|[\s,;:\(\[\{\}\)\]\>])call:([A-Za-z0-9_.:\-]+)\s*\{— narrativeI'll call:foodoes match (whitespace beforecall:) butnarrative.call:foodoesn't (no sentinel).}is in the sentinel set so back-to-back invocationscall:a{...}call:b{...}both fire.{}/[]depth, skips over"..."/'...'/`...`string literals, honours\escapes. Handles the multi-line nested-array shape from the snapshot.json::parsefirst; on failure, rewrites the buffer to quote bare identifier keys and normalize single/backtick quoted strings to double-quoted, then retries. Malformed args drop only that one invocation — no crash, no false positives, surrounding calls keep working.call:execute-bead:read-file→ ToolCall nameread-file(strips everything up to and including the last:).call:outer{"name": "inner", "arguments": {}}where the bare-JSON sweep would otherwise pluck out the inner{"name": ...}as a spuriousinnerToolCall. The brace span pattern I can’t reproduce the prefill performance on the RTX 3090 #5 records inremovalsshadows the inner JSON from pattern feat: add DFlash for Windows #6's view via the existingoverlaps()check.tool_allowed(tools, verb)path so callers that pass a constrained tool list (forge) only get back tools they declared.The full implementation plan, including the codex review and adjustments, lives at
docs/experiments/server-call-verb-tool-parser-plan.md. Codex flagged the pattern-ordering hazard during review and the plan was revised before any code was written.Test plan
14 new C++ unit cases in
server/test/test_server_unit.cppcovering:execute-bead:read-file→read-file)narrative.call:foo{...}rejected (no sentinel char)Sure, I'll call:foo{...}accepted (whitespace sentinel){...}and}inside string values doesn't break the scannercleaned_textis properly scrubbed of the matched spancall:outer{"name": "inner", "arguments": {}}produces exactly one ToolCall namedouter, not twodefault_api:analyze_datafrom the snapshotAll 14 pass in a standalone driver. CI will run them via the existing
test_server_unittarget wired inserver/CMakeLists.txt.Reverse-compat with the client-side fix
The forge client-side workaround
_parse_plain_text_tool_callsshipping onfeat/lucebox-docker(commitdeba2fd) becomes redundant once a server with this fix is deployed. It stays in place as defense-in-depth for older deployed servers — the synthesizedToolCalls match what the server now produces, so the only cost is a few extra cycles in the client. Don't strip the Python fallback in this PR.Out of scope
🤖 Generated with Claude Code