fix(examples): update multi-turn examples to current renderer API#68
Merged
Conversation
The examples were written against renderers <=0.1.x and no longer run against current main. Three API drifts, fixed across all five scripts: - Constructor: `Qwen35Renderer(tokenizer, enable_thinking=...)` — the `enable_thinking` kwarg was removed by the typed-config refactor (#60). Pass `Qwen35RendererConfig(enable_thinking=...)` instead. - Tool calls: `parse_response().tool_calls` are `ParsedToolCall` dataclasses, not dicts. Use attribute access (`tc.name` / `tc.arguments` / `tc.id`) instead of `tool_call.get(...)`, and build OpenAI-format tool_calls explicitly when echoing the assistant turn. - Bridge: `bridge_to_next_turn()` returns `RenderedTokens` (not `list[int]`); read the extended id stream from `.token_ids`. Also replaced the `json.dumps(parsed.tool_calls)` print (which now fails on dataclasses) with a readable per-call line. Validated the transformers example end-to-end on GPU (Qwen3.5-4B, both thinking modes): render -> generate -> tool-call parse -> bridge -> tool result -> final answer "17 x 23 = 391". The other four scripts share identical renderer-side logic (only the engine transport differs) and pass compile + ruff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ApprovabilityVerdict: Approved All changes are confined to example files in the You can customize Macroscope's approvability policy. Learn more. |
hallerite
added a commit
that referenced
this pull request
May 27, 2026
…e from Tokenizer Brings in #68 (examples), #69 (harmony floor), #71 (qwen3.5 hard-coded enable_thinking). The only qwen35.py conflict is resolved by keeping #71's hard-coded `_ENABLE_THINKING_DEFAULTS` table (no `apply_chat_template` probe) on top of #31's `Tokenizer`/`Processor` type hints. Now that #71 removed the last hand-coded-renderer call to `apply_chat_template`, drop it from the `Tokenizer` protocol so a plain `tokenizers.Tokenizer` wrapper satisfies it. `apply_chat_template` moves to a new `ChatTemplateTokenizer(Tokenizer, Protocol)` subtype, required only by `DefaultRenderer` (the generic chat-template fallback).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The five multi-turn examples were written against
renderers <= 0.1.xand no longer run against currentmain— they fail before reaching any inference engine.Three API drifts (fixed in all five scripts)
Qwen35Renderer(tokenizer, enable_thinking=...)Qwen35Renderer(tokenizer, Qwen35RendererConfig(enable_thinking=...))— the kwarg was removed by the typed-config refactor (#60)tool_call.get("function")/.get("arguments")parse_response().tool_callsareParsedToolCalldataclasses → attribute access (tc.name/tc.arguments/tc.id)bridged_ids = bridge_to_next_turn(...)then sliced as a listbridge_to_next_turn()returnsRenderedTokens→ read.token_idsAlso replaced
json.dumps(parsed.tool_calls)inprint_parsed(throws on dataclasses) with a readable per-call line.Validation (actually run on GPU, current renderer code)
Each ✅ is the full loop: render → generate →
multiplytool-call parsed[ok]→bridge_to_next_turn→ tool result → final answer "391".Notes:
TORCH_CUDA_ARCH_LIST=12.0andVLLM_USE_FLASHINFER_SAMPLER=0(environment config, not example changes).openai-harmony==0.0.4. The Qwen path is validated end-to-end. The gpt-oss-via-sglang target fails only because this host's CUDA toolkit is 11.5 (nonvccnew enough to JIT-compile thesm_120/c++20CUDA-graph kernels); gpt-oss itself is validated via transformers + vllm.🤖 Generated with Claude Code