Typed renderer configs#60
Conversation
ApprovabilityVerdict: Needs human review Major refactor replacing the renderer construction API with typed pydantic configs. Introduces breaking changes to You can customize Macroscope's approvability policy. Learn more. |
hallerite
left a comment
There was a problem hiding this comment.
We should also implement the different chat template options in the renderers
Auto-derives a parity matrix from each renderer's CHAT_TEMPLATE_KWARGS frozenset crossed with per-kwarg values, asserting render_ids == apply_chat_template (or openai-harmony for gpt-oss) so the kwarg surface stays a promise the renderer keeps. Surfaced three renderers whose exposed kwarg didn't actually round-trip through the upstream template: - DeepSeek-V3's chat template has no thinking variable; cleared CHAT_TEMPLATE_KWARGS so users can't pass a kwarg the template silently drops. Constructor kwarg stays for the R1-distill prefill. - GLM-5.1's empty_think_on_last_assistant wrap was unconditional; template emits a lone </think> when enable_thinking=False, so the branch is now gated on _enable_thinking. - Kimi K2.5 / K2.6's template uses ``thinking`` (not ``enable_thinking``); renamed the constructor kwarg and frozenset entry to match so chat_template_kwargs flows straight through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audited each renderer's CHAT_TEMPLATE_KWARGS frozenset against the variables its upstream chat template (or harmony preamble, for gpt-oss) actually reads. Missing kwargs left users a passthrough they couldn't reach; this commit wires each one through and extends the parity matrix to assert byte-equality with apply_chat_template. Simple wires (frozenset entry + constructor kwarg + render gate): - GLM-5 / GLM-5.1: clear_thinking preserves the think wrap on past-cycle assistants. Composes with preserve_all_thinking via OR. - Nemotron-3: truncate_history_thinking, same shape, different name. - gpt-oss: conversation_start_date already a constructor kwarg, added to the frozenset so chat_template_kwargs flows through. - Kimi-K2: declared frozenset() explicitly. Template has zero honored kwargs, so the empty set is the audit-correct surface. Renderer features (kwarg plus new behavior in the rendering path): - MiniMax-M2: model_identity replaces the hard-coded default-system fallback. Constructor kwarg renamed from default_system. - Laguna-XS.2: render_assistant_messages_raw adds a passthrough branch. - Qwen3.5 / Qwen3.6 / Qwen3-VL: add_vision_id with per-render image and video counters threaded into emit_image and the bridge variant. Parity test extensions: - _KWARG_VALUES gains entries for every new kwarg. - New shapes: no_system_user_gen and historical_reasoning. - New image-bearing add_vision_id parity test in test_multimodal.py. - test_glm5_constructor_rejects_clear_thinking replaced with positive version; the audit reinstated the kwarg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit c0384a2. Configure here.
When ``add_vision_id=True``, the renderer prefixes image / video placeholders with ``Picture N:`` / ``Video N:`` where N is a counter running across the whole conversation. The bridge seeds that counter from ``previous_multi_modal_data``; raw prior token ids can't recover it (``<|vision_start|>`` is shared between image and video placeholders so a token-walk can't classify them). If a caller passes ``add_vision_id=True`` but omits ``previous_multi_modal_data`` on a conversation that already contains images, the bridge would silently emit ``Picture 1:`` again — diverging from ``apply_chat_template`` and a full re-render. Refuse the bridge in that case (return None) so the caller falls back to a full re-render, which has the full message list and counts correctly. Adds a regression test that exercises the refusal path and confirms the bridge still proceeds when previous_multi_modal_data IS threaded through. Reported by Cursor Bugbot on c0384a2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure formatting pass via ``uvx ruff format`` over the files modified on this branch. No semantic changes; full test suite still passes (468 tests across the chat_template_kwargs / multimodal / preserve_thinking modules). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the free-form ``chat_template_kwargs: dict[str, Any]`` parameter on ``create_renderer`` with a pydantic discriminated union of per-renderer config classes (``renderers/configs.py``). Each renderer now declares its template knobs as typed fields with ``extra="forbid"``, eliminating the ``CHAT_TEMPLATE_KWARGS`` allowlists and the ad-hoc validation in ``renderers/base.py``. ``DefaultRendererConfig`` keeps ``extra="allow"`` so unknown kwargs flow through to ``apply_chat_template`` for arbitrary HF templates. Renderers store their config on ``self.config`` (no more field shadowing). Auto-resolution carries ``preserve_*_thinking`` flags only — template kwargs require an explicit renderer choice. Includes a design doc at ``docs/renderer-config.md`` covering motivation, OR-composition semantics, and the prime-rl/verifiers integration path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes API examples that survived the typed-config refactor:
- README's top-of-page snippets used ``create_renderer(tok, renderer="auto")``,
which is no longer a valid signature. Replace with the implicit-auto form.
- KimiK25Renderer's docstring referenced ``chat_template_kwargs={"thinking": False}``;
rewrite to ``KimiK25RendererConfig(thinking=False)``.
- ``tests/test_multimodal.py`` referenced the deleted ``CHAT_TEMPLATE_KWARGS``
frozenset in a code comment; update to point at ``KimiK25RendererConfig``.
Also drops dead ``_carry_preserve_flags`` (never imported — ``_resolve_auto``
builds the dict inline) and adds the public ``config_for_name`` to
``configs.py`` ``__all__``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After merge, references to ``the current PR``, ``chat_template_kwargs``,
``CHAT_TEMPLATE_KWARGS``, ``no longer exposed``, and a "before / after"
migration framing all read as ghosts of a state that never existed in main.
Reframes:
- ``docs/renderer-config.md`` rewritten as a current-state design doc
(discriminated union, auto-resolution, OR-composition, ``_internal_fields``,
tradeoffs). No migration narrative, no PR references.
- ``renderers/qwen36.py`` module docstring drops the "no longer exposed as a
constructor kwarg" framing and describes today's surface directly.
- Test files renamed: ``test_chat_template_kwargs{,_parity}.py`` →
``test_renderer_config{,_parity}.py``. The new names describe what they
test (the typed renderer config and its parity with ``apply_chat_template``)
rather than a kwarg shape that doesn't exist in main.
- Docstrings in the renamed files, ``test_multimodal.py``,
``test_preserve_thinking.py``, and ``test_parse_response_robustness.py``
drop "replaces the old …" / "exercise the chat_template_kwargs flag"
framing.
- Cross-refs in ``configs.py`` and ``test_preserve_thinking.py`` updated to
point at the renamed parity file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the tradeoffs / rationale framing and reorganises around what a downstream consumer actually needs: - What ``RendererConfig`` is and how to construct one - Per-renderer config table mapping each variant to its template fields - Auto-resolution rules (carries ``preserve_*`` only) - ``preserve_*`` OR-composition with template toggles - ``DefaultRendererConfig``'s ``extra="allow"`` + Jinja-kwarg passthrough - Downstream integration (single ``RendererConfig`` field in pydantic configs, TOML / YAML deserialisation, ``config_for_name`` helper) - One-line note on the renaming-as-breaking-change constraint No "tradeoffs", "motivation", or "design" sections — informational, not narrative. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aligns with Python's idiomatic ``from_<source>`` constructor naming (``datetime.fromisoformat``, ``Path.from_uri``, ``dict.fromkeys``). The helper builds a default-valued ``RendererConfig`` from a name string, so ``from_name`` reads as "construct from this representation" where ``for_name`` read as a lookup. Updates the public ``renderers`` export, the ``renderers.configs`` ``__all__``, the design doc snippet, the four test fixture sites that use it, and adds ``.claude/`` to ``.gitignore`` so agent-harness state stays out of the index. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b970c92 to
0769548
Compare
Aligns the typed-config base with prime-rl's and verifiers' config hierarchies so downstream wrappers (e.g. prime-rl's outer ``RendererConfig(BaseConfig)`` composing ``renderers.RendererConfig`` as ``settings``) share a uniform base. ``BaseConfig`` contributes ``extra="forbid"`` (already what we wanted) and two ``mode="before"`` validators (``"None"`` → ``None`` and stringified-dict coercion) that fire on the outer CLI-parsed config; they no-op on our nested fields. Also drops the leading underscore — ``BaseRendererConfig`` is a reasonable thing to reference for type narrowing in user code, and the discriminated union still gates which variants ``create_renderer`` will accept. Adds ``prime-pydantic-config>=0.3.0.dev0`` to deps (PyPI's latest is ``0.3.0.dev83``) and ``exclude-newer-package`` opt-out so the lockfile picks up the recent dev release. The PyPI build declares only ``pydantic>=2.0.0`` as a runtime requirement — no new transitive deps beyond what we already have. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| "pydantic>=2", | ||
| # ``BaseRendererConfig`` inherits from ``pydantic_config.BaseConfig`` so | ||
| # the typed-config surface stays uniform with prime-rl / verifiers config | ||
| # bases. Pulls in tyro transitively (CLI-parsing helpers used by the |
| # the typed-config surface stays uniform with prime-rl / verifiers config | ||
| # bases. Pulls in tyro transitively (CLI-parsing helpers used by the | ||
| # outer configs in those repos; harmless here). | ||
| "prime-pydantic-config>=0.3.0.dev0", |
There was a problem hiding this comment.
this also seems out out date. @samsja should we do a non-dev release with all the recent fixes
| # union (see ``renderers.configs``). Already transitively present via | ||
| # ``openai-harmony`` / ``transformers``; declared directly because we | ||
| # import it. | ||
| "pydantic>=2", |
There was a problem hiding this comment.
this will be transitive, no?
- pydantic was framed as "transitively present via openai-harmony / transformers" with the direct dep justified as "we import it". With prime-pydantic-config also requiring pydantic, the hedge is doubly redundant — drop it and just say what we use it for. - prime-pydantic-config comment claimed it "pulls in tyro transitively". False against the PyPI release (``0.3.0.dev83``), whose only runtime dep is ``pydantic>=2.0.0``. Drop the line.
``prime-pydantic-config>=0.3.0.dev0`` already pins ``pydantic>=2.0.0`` as its only runtime requirement, which matches the floor we'd declare ourselves. The previous direct declaration was redundant — we don't need a tighter floor than the wrapper enforces, and the wrapper can't function without pydantic so dropping it later isn't realistic.
Makes the intent explicit — we want the newest dev release rather than silently floating on whatever happens to match ``>=0.3.0.dev0``. Latest on PyPI confirmed via the JSON index.

Summary
Replaces the free-form
chat_template_kwargs: dict[str, Any]parameter oncreate_renderer/create_renderer_poolwith a pydantic discriminated union of per-renderer configs (renderers.configs.RendererConfig).extra="forbid", eliminating the per-rendererCHAT_TEMPLATE_KWARGSallowlists and the runtime validation inbase.py.DefaultRendererConfigkeepsextra="allow"so arbitrary Jinja kwargs flow throughmodel_extraintoapply_chat_template.self.config— no field shadowing.AutoRendererConfigresolves viaMODEL_RENDERER_MAPand carries only the sharedpreserve_*flags; template kwargs require an explicit renderer choice.tests/test_renderer_config_parity.pyis auto-derived from each config'stemplate_field_names()× per-field value list, with a coverage assertion that new fields can't be added without a value list.Design rationale:
docs/renderer-config.md— discriminated union, OR-composition ofpreserve_*with template-level toggles (clear_thinking,truncate_history_thinking),_internal_fieldsseparation, tradeoffs.API
Before:
After:
Auto-resolve (typed equivalent of the old
renderer="auto"):Downstream pydantic configs (prime-rl, verifiers) hold a single field typed as
RendererConfig; the discriminator onnameexposes exactly the kwargs that renderer supports and rejects the rest at config-load time.Companion PRs
Both need rebase against the typed config shape — design doc covers the migration path:
Validation
uv run pytest— 1804 passed, 53 skipped, 1 xfailed at HEAD of the refactor commit; targeted re-runs (typed config + preserve thinking + parity coverage) green after the doc-rot sweepuv run ruff check .— cleanNote
Changes since #60 opened
renderers.base.create_rendererandrenderers.base.create_renderer_poolto accept typedRendererConfigobjects instead of string-based renderer names and keyword arguments [d2bcf7e]self.configinstead of individual kwargs [d2bcf7e]config_for_name, renderer-specific config classes, orcreate_rendererdefault auto-resolution [d2bcf7e]RendererConfigwithcreate_rendererandcreate_renderer_poolusage [8c514e0]AutoRendererConfig[8c514e0]preserve_*flags semantics and config immutability [8c514e0]RendererConfigembedding patterns [8c514e0]config_for_namefunction toconfig_from_namein therenderers.configsmodule [a47a0a2]pydantic.BaseModeltopydantic_config.BaseConfig[3dab877]BaseRendererConfiga publicly exported class from therendererspackage [3dab877]prime-pydantic-configdependency to project requirements [3dab877]pydantic>=2as a direct dependency in favor of transitive dependency resolution throughprime-pydantic-config[3e07d7a]prime-pydantic-configdependency [4c9099d]Note
Medium Risk
Breaking public factory API for downstream packages, though behavior is heavily regression-tested; mis-typed configs now fail at load time instead of silently ignoring kwargs.
Overview
This PR replaces string-based
create_renderer/create_renderer_poolarguments (renderer=,tool_parser,preserve_*, loose template kwargs) with a Pydantic discriminated union (RendererConfiginrenderers/configs.py), backed byprime-pydantic-config’sBaseConfig.API: Callers pass one typed config (e.g.
Qwen35RendererConfig(enable_thinking=False)); omitting config is equivalent toAutoRendererConfig(), which still resolves the renderer fromMODEL_RENDERER_MAPand only forwards sharedpreserve_*flags. Per-renderer fields are validated at construction (extra="forbid");DefaultRendererConfigstill allows arbitrary Jinja kwargs viamodel_extra.config_from_name()helps tests/CLIs build default configs from a name string.Implementation: Every hand-coded renderer now takes
configand readsself.configinstead of constructor kwargs. Template behavior that was missing or wrong is wired through the typed fields—e.g. GLM-5clear_thinking, Nemotron-3truncate_history_thinking, Qwen VLadd_vision_id(with bridge guards when prior multimodal metadata is missing), Lagunarender_assistant_messages_raw, and GLM-5.1 empty-think gating withenable_thinking.Docs/tests:
docs/renderer-config.mdand README examples are updated; new parity tests derive coverage fromtemplate_field_names()×_KWARG_VALUES..gitignoreadds.claude/.Reviewed by Cursor Bugbot for commit 4c9099d. Bugbot is set up for automated code reviews on this repo. Configure here.