Skip to content

fix(examples): update multi-turn examples to current renderer API#68

Merged
hallerite merged 1 commit into
mainfrom
worktree-examples-current-api
May 27, 2026
Merged

fix(examples): update multi-turn examples to current renderer API#68
hallerite merged 1 commit into
mainfrom
worktree-examples-current-api

Conversation

@hallerite
Copy link
Copy Markdown
Member

@hallerite hallerite commented May 27, 2026

Why

The five multi-turn examples were written against renderers <= 0.1.x and no longer run against current main — they fail before reaching any inference engine.

Three API drifts (fixed in all five scripts)

Was (old API) Now (current API)
Qwen35Renderer(tokenizer, enable_thinking=...) Qwen35Renderer(tokenizer, Qwen35RendererConfig(enable_thinking=...)) — the kwarg was removed by the typed-config refactor (#60)
tool_call.get("function") / .get("arguments") parse_response().tool_calls are ParsedToolCall dataclasses → attribute access (tc.name / tc.arguments / tc.id)
bridged_ids = bridge_to_next_turn(...) then sliced as a list bridge_to_next_turn() returns RenderedTokens → read .token_ids

Also replaced json.dumps(parsed.tool_calls) in print_parsed (throws on dataclasses) with a readable per-call line.

Validation (actually run on GPU, current renderer code)

Example Qwen3.5-4B (think on/off) gpt-oss-20b
transformers
vllm
sglang (offline) ⚠️ host-blocked (see below)

Each ✅ is the full loop: render → generate → multiply tool-call parsed [ok]bridge_to_next_turn → tool result → final answer "391".

Notes:

  • vllm on this Blackwell box needed env flags TORCH_CUDA_ARCH_LIST=12.0 and VLLM_USE_FLASHINFER_SAMPLER=0 (environment config, not example changes).
  • sglang requires the harmony floor relaxation in fix(deps): lower openai-harmony floor to >=0.0.4 for SGLang compatibility #69 (merged) to install alongside renderers — every sglang through 0.5.12.post1 hard-pins openai-harmony==0.0.4. The Qwen path is validated end-to-end. The gpt-oss-via-sglang target fails only because this host's CUDA toolkit is 11.5 (no nvcc new enough to JIT-compile the sm_120/c++20 CUDA-graph kernels); gpt-oss itself is validated via transformers + vllm.
  • tinker not run (needs the hosted Tinker API); its renderer-side code is identical to the others.

🤖 Generated with Claude Code

The examples were written against renderers <=0.1.x and no longer run
against current main. Three API drifts, fixed across all five scripts:

- Constructor: `Qwen35Renderer(tokenizer, enable_thinking=...)` — the
  `enable_thinking` kwarg was removed by the typed-config refactor (#60).
  Pass `Qwen35RendererConfig(enable_thinking=...)` instead.
- Tool calls: `parse_response().tool_calls` are `ParsedToolCall`
  dataclasses, not dicts. Use attribute access (`tc.name` / `tc.arguments`
  / `tc.id`) instead of `tool_call.get(...)`, and build OpenAI-format
  tool_calls explicitly when echoing the assistant turn.
- Bridge: `bridge_to_next_turn()` returns `RenderedTokens` (not
  `list[int]`); read the extended id stream from `.token_ids`.

Also replaced the `json.dumps(parsed.tool_calls)` print (which now fails
on dataclasses) with a readable per-call line.

Validated the transformers example end-to-end on GPU (Qwen3.5-4B, both
thinking modes): render -> generate -> tool-call parse -> bridge -> tool
result -> final answer "17 x 23 = 391". The other four scripts share
identical renderer-side logic (only the engine transport differs) and
pass compile + ruff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@macroscopeapp
Copy link
Copy Markdown

macroscopeapp Bot commented May 27, 2026

Approvability

Verdict: Approved

All changes are confined to example files in the examples/ directory, mechanically updating them to match current API signatures. No production library code is modified.

You can customize Macroscope's approvability policy. Learn more.

@hallerite hallerite merged commit 74425da into main May 27, 2026
11 checks passed
@hallerite hallerite deleted the worktree-examples-current-api branch May 27, 2026 17:07
hallerite added a commit that referenced this pull request May 27, 2026
…e from Tokenizer

Brings in #68 (examples), #69 (harmony floor), #71 (qwen3.5 hard-coded
enable_thinking). The only qwen35.py conflict is resolved by keeping #71's
hard-coded `_ENABLE_THINKING_DEFAULTS` table (no `apply_chat_template`
probe) on top of #31's `Tokenizer`/`Processor` type hints.

Now that #71 removed the last hand-coded-renderer call to
`apply_chat_template`, drop it from the `Tokenizer` protocol so a plain
`tokenizers.Tokenizer` wrapper satisfies it. `apply_chat_template` moves to
a new `ChatTemplateTokenizer(Tokenizer, Protocol)` subtype, required only by
`DefaultRenderer` (the generic chat-template fallback).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant