fix(server): parse gemma's call:<verb>{} plain-text tool emissions by easel · Pull Request #323 · Luce-Org/lucebox-hub

easel · 2026-05-31T15:38:57Z

Summary

Adds a sixth detection pattern to server/src/server/tool_parser.cpp::parse_tool_calls that recognizes the plain-text tool invocations gemma emits in chat-completion content — e.g. call:get_country_info{country: "France"}, call:execute-bead:read-file{path: "..."}, call:default_api:analyze_data{data: [...]}. None of the existing five envelope-shaped patterns (<tool_call>, bare <function=...>, <function=...(args)>, <tool_code>, bare JSON) match this shape, so a non-trivial slice of gemma's tool-using runs surfaces text-only to clients that expect structured tool_use blocks.

Why this matters

The 2026-05-30 gemma full bench (d9ecba6cc105-nvidia-geforce-rtx-3090-ti-gemma-full-2026-05-30-67f4) scored forge 0/30 because every row's iterations[0].output carried these call:<verb>{...} invocations as text content rather than structured Anthropic tool_use blocks. forge's WorkflowRunner expects a ToolCall payload and the row dies with ValidationError.

The downstream wiring is already in place: SseEmitter::accumulate calls parse_tool_calls, a non-empty result flips finish_reason to tool_calls, the Anthropic /v1/messages branch maps that to stop_reason="tool_use" with tool_use content blocks (server/src/server/http_server.cpp:2030-2090), and the OpenAI branch maps to choices[].message.tool_calls. The only gap was the parser.

What's in the new pattern

Regex with sentinel anchor. (^|[\s,;:\(\[\{\}\)\]\>])call:([A-Za-z0-9_.:\-]+)\s*\{ — narrative I'll call:foo does match (whitespace before call:) but narrative.call:foo doesn't (no sentinel). } is in the sentinel set so back-to-back invocations call:a{...}call:b{...} both fire.
Quote- and escape-aware brace scanner. Tracks {}/[] depth, skips over "..." / '...' / `...` string literals, honours \ escapes. Handles the multi-line nested-array shape from the snapshot.
Relaxed-JSON arg parser. Strict json::parse first; on failure, rewrites the buffer to quote bare identifier keys and normalize single/backtick quoted strings to double-quoted, then retries. Malformed args drop only that one invocation — no crash, no false positives, surrounding calls keep working.
Verb normalization. call:execute-bead:read-file → ToolCall name read-file (strips everything up to and including the last :).
Pattern ordering. Inserted as pattern I can’t reproduce the prefill performance on the RTX 3090 #5, ahead of the bare-JSON sweep (now feat: add DFlash for Windows #6). This defends against call:outer{"name": "inner", "arguments": {}} where the bare-JSON sweep would otherwise pluck out the inner {"name": ...} as a spurious inner ToolCall. The brace span pattern I can’t reproduce the prefill performance on the RTX 3090 #5 records in removals shadows the inner JSON from pattern feat: add DFlash for Windows #6's view via the existing overlaps() check.
Tool-allowed filter. Re-uses the existing tool_allowed(tools, verb) path so callers that pass a constrained tool list (forge) only get back tools they declared.

The full implementation plan, including the codex review and adjustments, lives at docs/experiments/server-call-verb-tool-parser-plan.md. Codex flagged the pattern-ordering hazard during review and the plan was revised before any code was written.

Test plan

14 new C++ unit cases in server/test/test_server_unit.cpp covering:

All 14 pass in a standalone driver. CI will run them via the existing test_server_unit target wired in server/CMakeLists.txt.

Reverse-compat with the client-side fix

The forge client-side workaround _parse_plain_text_tool_calls shipping on feat/lucebox-docker (commit deba2fd) becomes redundant once a server with this fix is deployed. It stays in place as defense-in-depth for older deployed servers — the synthesized ToolCalls match what the server now produces, so the only cost is a few extra cycles in the client. Don't strip the Python fallback in this PR.

Out of scope

Docker image rebuild + e2e bench re-run. The operator should rebuild the cuda12 image and confirm forge passes non-trivially without the client-side workaround once this merges.

🤖 Generated with Claude Code

Captures the diagnosis (gemma forge 0/30 on 2026-05-30), the proposed sixth detection pattern, the relaxed-JSON arg parser sketch, the unit-test matrix, and codex's review (which forced reordering the new pattern to slot Luce-Org#5 ahead of the bare-JSON sweep to avoid interception of nested name/arguments-shaped args). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a sixth detection pattern to `parse_tool_calls` that recognizes the plain-text tool invocations gemma emits in chat-completion content (`call:get_country_info{country: "France"}` / `call:execute-bead:read-file{path: "..."}` / etc). The 2026-05-30 gemma full bench scored forge 0/30 because every row's output carried these `call:<verb>{...}` invocations as text rather than structured `tool_use` content blocks. None of the existing five envelope-shaped detectors (`<tool_call>`, `<function=...>`, `<tool_code>`, bare JSON) match the bare `call:` shape. The new pattern: - Anchors on a sentinel character (whitespace, comma, semicolon, open/close bracket, etc.) before `call:` so narrative usages like `narrative.call:foo` don't match. - Supports namespaced verbs (`execute-bead:read-file`, `default_api:fetch_sales_data`) and strips the namespace before using the verb as the ToolCall name. - Extracts the args block via a quote- and escape-aware balanced-brace scanner that tolerates `"`, `'`, and `` ` `` string literals and tracks `[]` depth alongside `{}`. - Parses the args as strict JSON first, then falls back to a relaxed rewrite that quotes bare identifier keys and normalizes single/ backtick quoted strings to double-quoted before retrying. Malformed args drop the single invocation without crashing or polluting other calls. - Runs *before* the bare-JSON sweep so that inner args of the form `call:outer{"name": "inner", "arguments": {}}` aren't hijacked into a spurious `inner` ToolCall by pattern Luce-Org#6. Downstream the existing wiring takes over: SseEmitter::accumulate already calls parse_tool_calls; a non-empty ToolCall list flips finish_reason to `tool_calls`, which the Anthropic /v1/messages branch maps to `stop_reason="tool_use"` with `tool_use` content blocks (http_server.cpp:2030-2090) and the OpenAI branch maps to `choices[].message.tool_calls`. The forge client-side workaround `_parse_plain_text_tool_calls` shipping on feat/lucebox-docker (commit deba2fd) becomes redundant once a server with this fix is deployed. It stays in place as defense-in-depth for older deployed servers. Test plan: 14 new C++ unit cases in test_server_unit.cpp covering single / back-to-back / namespaced / snake- and kebab-case verbs; tool-allowed filtering; mid-prose rejection vs. whitespace-led acceptance; malformed args drop; inner `{}` inside string literals; strict-JSON and relaxed-keys arg parsing; cleaned_text scrubbing; the codex-requested inner `name`/`arguments` interception case; and multi-line nested-array args mirroring the snapshot data. All pass in a standalone driver. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

easel · 2026-05-31T15:43:20Z

Folded into PR #285 — commits 12c50c0 (plan + codex review) and cdb8b9c (implementation) are now part of feat/lucebox-docker (tip 329f611). The C++ fix lands via #285 instead of as a separate PR against main.

Documents the server-side call:<verb>{} tool parser fix (PR Luce-Org#323) and the C++17 compatibility fix for starts_with. Benchmarks running. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Smoke-testing the post-PR-Luce-Org#323 image (lucebox-hub:cuda12 @ 8039911) on sindri's gemma-4-26b revealed a new emission pattern: the model sometimes outputs ``_call:get_country_info{...}`` with a leading underscore. This is a SentencePiece / chat-template tokenizer artifact that became visible after bragi's channel-token routing fix (commit 4b757d1) — the underscore is residual from the tokenizer's internal serialization that earlier handling stripped. Both parsers missed these invocations: * Server-side (tool_parser.cpp:182): the sentinel character class ``[\s,;:\(\[\{\}\)\]\>]`` did not include ``_``. Added. * Client-side (forge.py:32): ``\bcall:`` requires a word boundary before ``call``, but ``_`` is a word char so ``\b`` doesn't fire between ``_`` and ``c``. Replaced with explicit lookbehind on the same sentinel set (including ``_``). Net result: ``_call:foo{...}`` now parses to a tool_use the same way ``call:foo{...}`` does. Tradeoff: ``my_call:foo{}`` mid-identifier would also match, but real model outputs don't emit free-form ``my_call:`` text (tool names come from request tool defs). Tests: +2 cases in test_forge_grader.py (underscore alone, mixed back-to-back with both prefixed and bare). 16 → 18 forge_grader tests, all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

easel and others added 2 commits May 31, 2026 11:34

easel added a commit to easel/lucebox-hub that referenced this pull request May 31, 2026

Merge PR Luce-Org#323: server-side call:<verb>{} tool parser

5ca695c

easel closed this May 31, 2026

easel mentioned this pull request Jun 1, 2026

fix(server): plain-text call:<verb>{} tool-call detection (parser + emitter wiring) #329

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(server): parse gemma's call:<verb>{} plain-text tool emissions#323

fix(server): parse gemma's call:<verb>{} plain-text tool emissions#323
easel wants to merge 2 commits into
Luce-Org:mainfrom
easel:fix/server-call-verb-tool-parser

easel commented May 31, 2026

Uh oh!

easel commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

easel commented May 31, 2026

Summary

Why this matters

What's in the new pattern

Test plan

Reverse-compat with the client-side fix

Out of scope

Uh oh!

easel commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant