
feat: make transformers and the vLLM client optional dependencies (#31) #70

Open
hallerite wants to merge 5 commits into main from optional-transformers

hallerite (Member) commented May 27, 2026

Closes #31.

Why

transformers is a heavy dependency, and downstreams that keep training deps lightweight (e.g. TorchTitan/TorchTune, which load tokenizers via tokenizers) shouldn't have to pull it in just to use a renderer. This makes the heavy/engine-specific pieces opt-in, so the base install is lightweight and text-only renderers work with a bring-your-own tokenizer.

What changed

Tokenizer + Processor protocols (renderers/base.py) — structural types replacing transformers.PreTrainedTokenizer in every renderer's tokenizer/processor annotations. The module-level from transformers... import PreTrainedTokenizer is gone from all 13 renderer modules, so import renderers.<model> no longer drags in transformers.
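As a rough illustration of what a structural tokenizer type can look like, here is a minimal sketch using `typing.Protocol`. The member names are taken from the caveats in this description (minus `apply_chat_template`, which a later commit in this PR moves to a subtype); the signatures and the `WhitespaceTokenizer` toy class are assumptions for illustration, not the definitions in `renderers/base.py`.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class Tokenizer(Protocol):
    """Hypothetical structural type; signatures are guesses, not the real ones."""

    name_or_path: str
    unk_token_id: int | None
    eos_token_id: int | None

    def encode(self, text: str) -> list[int]: ...
    def decode(self, ids: list[int]) -> str: ...
    def convert_tokens_to_ids(self, token: str) -> int | None: ...


class WhitespaceTokenizer:
    """Toy bring-your-own tokenizer. Note that it does not inherit from Tokenizer."""

    def __init__(self) -> None:
        self.name_or_path = "toy-whitespace"
        self.vocab = {"<unk>": 0, "<eos>": 1}
        self.unk_token_id = 0
        self.eos_token_id = 1

    def encode(self, text: str) -> list[int]:
        # Grow the vocabulary on the fly so every word gets a stable id.
        return [self.vocab.setdefault(word, len(self.vocab)) for word in text.split()]

    def decode(self, ids: list[int]) -> str:
        inverse = {idx: tok for tok, idx in self.vocab.items()}
        return " ".join(inverse[i] for i in ids)

    def convert_tokens_to_ids(self, token: str) -> int | None:
        return self.vocab.get(token, self.unk_token_id)


def count_tokens(tokenizer: Tokenizer, text: str) -> int:
    # Any object with the right members type-checks here; no transformers import needed.
    return len(tokenizer.encode(text))
```

Because the check is structural, a type checker accepts `WhitespaceTokenizer` wherever `Tokenizer` is annotated, and `isinstance(tok, Tokenizer)` only verifies that the members are present.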

transformers + fastokens move to the [transformers] extra. They are needed only by the convenience helpers (load_tokenizer, create_renderer*), the offset-attribution fallback in attribute_text_segments, and the VLM renderers (image processors). When transformers is missing, _require_transformers() raises a clear pip install 'renderers[transformers]' error on those lazy paths.
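The lazy-import guard described above can be sketched roughly as follows. `require_optional` is a hypothetical, generalized stand-in; the actual `_require_transformers()` helper may have a different signature and message.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str = "renderers") -> ModuleType:
    """Import an optional dependency on first use, or fail with an install hint.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with the extra name so the user knows exactly what to install.
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from exc
```

Calling the guard inside the function that needs the dependency, rather than at module import time, is what keeps a plain `import renderers` free of the heavy package.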

renderers.client is behind the [vllm] extra. The vLLM /inference/v1/generate client is the only thing that needs openai + httpx, and it is no longer imported by renderers/__init__, so import renderers stays free of HTTP/engine deps. OverlongPromptError is now imported from renderers.client (no top-level re-export).

Result: the core deps of a base pip install renderers are just numpy, tiktoken, jinja2, openai-harmony, and prime-pydantic-config. The heavy bits are renderers[transformers] and renderers[vllm], which are composable.

Caveats (documented in the README)

A bring-your-own tokenizer must satisfy the Tokenizer protocol (encode/decode/convert_tokens_to_ids/apply_chat_template + name_or_path/unk_token_id/eos_token_id). Per-token training attribution additionally needs tokenizer(..., return_offsets_mapping=True); without it, attribution falls back to a vanilla HF tokenizer, which requires the [transformers] extra.
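To show why offsets matter for attribution: an offsets mapping gives each token a (start, end) character span, which can then be matched against labeled character ranges of the rendered text. The sketch below assigns each token to the segment it overlaps most. It is an illustrative stand-in under that assumption, not the logic of attribute_text_segments, and `attribute_tokens` plus the segment labels are hypothetical names.

```python
from __future__ import annotations


def attribute_tokens(
    offsets: list[tuple[int, int]],
    segments: list[tuple[int, int, str]],
) -> list[str | None]:
    """Label each token with the segment sharing the most characters with it.

    offsets:  per-token (start, end) character spans, end exclusive.
    segments: (start, end, label) character ranges of the rendered text.
    Tokens with an empty span (e.g. special tokens) get None.
    """
    labels: list[str | None] = []
    for tok_start, tok_end in offsets:
        best_label, best_overlap = None, 0
        for seg_start, seg_end, label in segments:
            # Length of the intersection of the two half-open intervals.
            overlap = min(tok_end, seg_end) - max(tok_start, seg_start)
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
        labels.append(best_label)
    return labels


# "Hello world": the prompt covers "Hello ", the completion covers "world".
example_segments = [(0, 6, "prompt"), (6, 11, "completion")]
# A special token with an empty span, then "Hello" and " world".
example_offsets = [(0, 0), (0, 5), (5, 11)]
example_labels = attribute_tokens(example_offsets, example_segments)
```

Here the token " world" straddles the segment boundary by one character and still lands in the completion, which is the kind of decision that is impossible without character offsets.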

Tests

  • New tests/test_no_transformers.py: subprocess-blocks transformers/fastokens/openai/httpx, then asserts import renderers + a text renderer's render/parse work, that no blocked module leaks into sys.modules, and that load_tokenizer errors with the install hint.
  • Full suite green; ruff + ty clean.
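The import-blocking approach in the first bullet can be sketched as below. This is one possible way to do it (a `sys.meta_path` finder installed in a child interpreter), not necessarily how tests/test_no_transformers.py is written; `import json` stands in for `import renderers` so the sketch runs without the package installed.

```python
import subprocess
import sys
import textwrap

BLOCKED = ("transformers", "fastokens", "openai", "httpx")

# Source for the child interpreter. Blocked names arrive via sys.argv.
CHILD = textwrap.dedent(
    """
    import importlib.abc
    import sys

    BLOCKED = set(sys.argv[1:])

    class Blocker(importlib.abc.MetaPathFinder):
        def find_spec(self, fullname, path=None, target=None):
            if fullname.split(".")[0] in BLOCKED:
                raise ModuleNotFoundError(f"blocked for test: {fullname}", name=fullname)
            return None  # defer to the normal finders

    sys.meta_path.insert(0, Blocker())

    import json  # stand-in for `import renderers`

    for name in sorted(BLOCKED):
        try:
            __import__(name)
        except ImportError:
            pass
        else:
            raise SystemExit(f"{name} was importable")

    leaked = sorted(m for m in sys.modules if m.split(".")[0] in BLOCKED)
    print("leaked:", leaked)
    """
)


def run_blocked(blocked: tuple[str, ...] = BLOCKED) -> subprocess.CompletedProcess:
    """Run the child script in a fresh interpreter with the given modules blocked."""
    return subprocess.run(
        [sys.executable, "-c", CHILD, *blocked],
        capture_output=True,
        text=True,
        check=False,
    )
```

Running the check in a subprocess keeps the blocker out of the test runner's own import state, so other tests that do need the extras are unaffected.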

🤖 Generated with Claude Code


Note

Medium Risk
Breaking for consumers that imported OverlongPromptError from renderers or assumed transformers/openai on base install; behavior is documented and guarded with new boundary tests.

Overview
This PR makes the base renderers install lightweight by moving heavy deps behind optional extras and letting text renderers run with a bring-your-own tokenizer.

Packaging: transformers and fastokens are no longer core dependencies; they install via renderers[transformers] (used by load_tokenizer / create_renderer*, offset attribution fallback, and VLMs). openai and httpx move to renderers[vllm] for renderers.client. Dev deps mirror both extras so CI still exercises those paths.

Typing / imports: New Tokenizer, ChatTemplateTokenizer, and Processor protocols in renderers/base.py replace PreTrainedTokenizer on all renderer modules, so importing renderers no longer pulls in transformers. VLMs load AutoProcessor through _require_transformers(), which raises a clear install hint if the extra is missing.

Public API: OverlongPromptError is dropped from renderers top-level exports; use from renderers.client import OverlongPromptError. Tokenizer, ChatTemplateTokenizer, and Processor are exported from the package root.

Tests / docs: tests/test_no_transformers.py subprocess-blocks optional deps and checks text render/parse, no leaked imports, and load_tokenizer errors. README documents extras, protocol expectations, and attribution caveats.

Reviewed by Cursor Bugbot for commit 5062105.

Note

Make transformers and vLLM client optional dependencies in renderers

  • Moves transformers, fastokens, openai, and httpx out of core dependencies into optional extras: renderers[transformers] and renderers[vllm].
  • Introduces Tokenizer, ChatTemplateTokenizer, and Processor protocols in renderers/base.py so renderer classes no longer import from transformers directly.
  • Adds a _require_transformers() helper that raises a clear ImportError with an install hint when transformers is not present and a code path needs it.
  • import renderers no longer pulls openai/httpx into sys.modules; the vLLM client is only loaded on explicit import from renderers.client.
  • Behavioral Change: OverlongPromptError is no longer exported as renderers.OverlongPromptError; callers must import it from renderers.client directly.

Macroscope summarized f19becd.

hallerite and others added 2 commits May 27, 2026 17:09
`transformers` (+ `fastokens`) and the `openai`/`httpx`-based vLLM generate
client are no longer core dependencies. Text-only renderers now work with a
bring-your-own tokenizer and none of the heavy deps installed.

- Add `Tokenizer` + `Processor` structural protocols in `base.py`; type the
  renderer `tokenizer`/`processor` params against them instead of
  `transformers.PreTrainedTokenizer`, so importing a renderer no longer drags
  in `transformers`.
- Move `transformers` + `fastokens` to the `[transformers]` extra and
  `openai` + `httpx` to the `[vllm]` extra. `_require_transformers()` raises a
  clear "install renderers[transformers]" error on the lazy paths
  (`load_tokenizer`, offset attribution, VLM processors).
- `renderers.client` is opt-in: no longer imported by `renderers/__init__`,
  and `OverlongPromptError` moves with it (importable from `renderers.client`).
- Add `tests/test_no_transformers.py` proving text render/parse and
  `import renderers` work with `transformers`/`fastokens`/`openai`/`httpx`
  import-blocked, and that `load_tokenizer` errors clearly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
macroscopeapp Bot previously approved these changes May 27, 2026

macroscopeapp Bot commented May 27, 2026

Approvability

Verdict: Approved

Mechanical refactoring to make transformers and vLLM client optional dependencies. Type hints are changed from concrete types to protocols, and lazy imports provide clear error messages when extras are missing. Runtime behavior is unchanged when dependencies are installed.


…e from Tokenizer

Brings in #68 (examples), #69 (harmony floor), #71 (qwen3.5 hard-coded
enable_thinking). The only qwen35.py conflict is resolved by keeping #71's
hard-coded `_ENABLE_THINKING_DEFAULTS` table (no `apply_chat_template`
probe) on top of #31's `Tokenizer`/`Processor` type hints.

Now that #71 removed the last hand-coded-renderer call to
`apply_chat_template`, drop it from the `Tokenizer` protocol so a plain
`tokenizers.Tokenizer` wrapper satisfies it. `apply_chat_template` moves to
a new `ChatTemplateTokenizer(Tokenizer, Protocol)` subtype, required only by
`DefaultRenderer` (the generic chat-template fallback).
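The protocol split in this commit message can be illustrated with a minimal sketch. The two protocol names come from the commit message; the reduced member set, the signatures, and the `ByteTokenizer` / `TemplatedByteTokenizer` toy classes are assumptions for illustration only.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Tokenizer(Protocol):
    # Reduced to two methods to keep the sketch short.
    def encode(self, text: str) -> list[int]: ...
    def decode(self, ids: list[int]) -> str: ...


@runtime_checkable
class ChatTemplateTokenizer(Tokenizer, Protocol):
    # Only the generic chat-template fallback would need this extra method.
    def apply_chat_template(self, messages: list[dict[str, Any]], **kwargs: Any) -> Any: ...


class ByteTokenizer:
    """Toy tokenizer with no chat-template support."""

    def encode(self, text: str) -> list[int]:
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        return bytes(ids).decode("utf-8")


class TemplatedByteTokenizer(ByteTokenizer):
    """Same toy tokenizer plus a trivial, made-up chat template."""

    def apply_chat_template(self, messages: list[dict[str, Any]], **kwargs: Any) -> str:
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


plain = ByteTokenizer()
templated = TemplatedByteTokenizer()
```

With this layout, `plain` satisfies `Tokenizer` but not `ChatTemplateTokenizer`, while `templated` satisfies both, so only code that actually calls `apply_chat_template` has to ask for the narrower type.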
macroscopeapp Bot previously approved these changes May 27, 2026
@hallerite hallerite marked this pull request as draft May 27, 2026 21:42
hallerite (Member, Author) commented

I'm not really satisfied with how attribute_text_segments handles character-offset attribution, so I'm iterating further.

@hallerite hallerite marked this pull request as ready for review June 1, 2026 14:01

Development

Successfully merging this pull request may close these issues.

Is transformers necessary or tokenizers is enough?
