codegen(python): drop ReflectapiOption, use model_fields_set for partial fields#151
Conversation
…ial fields
`reflectapi::Option<T>` is a three-state protocol type:
`Some(value)` / explicit `None` / `Undefined` (key absent on the
wire). TypeScript clients express that natively
(`field?: T | null`); Rust clients use the `reflectapi::Option<T>`
enum. The Python client previously shipped a hand-rolled
`ReflectapiOption[T]` wrapper class to carry the same distinction —
but Pydantic already tracks "was this field on the wire?" via
`__pydantic_fields_set__`. The wrapper was duplicating Pydantic's
own bookkeeping and forcing users to learn a parallel
`.unwrap` / `.is_undefined` API.
This rip-out replaces the wrapper with a `ReflectapiPartialModel`
mixin (one `@model_serializer` that filters by `model_fields_set`).
Generated classes that contain at least one `reflectapi::Option<T>`
field inherit from it; field types collapse to plain `T | None`.
API for callers:
# Field omitted on the wire if you don't pass it
Item(name="rex").model_dump_json() == '{"name":"rex"}'
# Field is `null` on the wire if you pass `None` explicitly
Item(name="rex", kind=None).model_dump_json() == '{"name":"rex","kind":null}'
# Was this field on the wire?
"kind" in item.model_fields_set # exact analogue of TS
# `obj.kind !== undefined`
Runtime changes
- New: `reflectapi_runtime.partial.ReflectapiPartialModel`. Exposes
one `@model_serializer(mode="wrap")` that drops keys not in
`model_fields_set`, honouring `by_alias=True`.
- Deleted: `reflectapi_runtime.option` (the entire
`ReflectapiOption` / `Undefined` / `some` / `none` / `undefined` /
`serialize_option_dict` surface).
- `client._serialize_request_body` and `streaming` no longer need
the option-dict dance — they just call
`model.model_dump_json(by_alias=True)` and trust the model.
Codegen changes
- `reflectapi::Option` now maps to `T | None` (was
`ReflectapiOption[T]`); default is `None`.
- `Field.is_partial` and `DataClass.is_partial` track whether any
field originated from `reflectapi::Option<T>`; the renderer picks
`ReflectapiPartialModel` vs `BaseModel` and adds
`validate_assignment=True` to the ConfigDict on partial classes
so post-construction assignments land on the wire.
- New helper `schema_has_partial_field` triggers the
`ReflectapiPartialModel` runtime import.
Tests
- New: `tests/test_partial.py` (21 tests pinning the wire-format
contract — absent vs null vs value, nested partial, container of
models, JSON round-trip, aliases, defaults).
- `tests/test_option.py` deleted.
- `tests/test_edge_cases.py` and `tests/test_enhanced_features.py`
stripped of dead `ReflectapiOption` test classes; the integration
test was rewritten against `ReflectapiPartialModel`.
- Total runtime suite: 347 pass, 0 fail.
- Demo client regenerated; workspace tests + clippy clean.
BREAKING CHANGE: This is a 0.18.x change. Code importing
`ReflectapiOption`, `Undefined`, `some`, `none`, `serialize_option_dict`
from `reflectapi_runtime` will need to migrate to plain `T | None`
fields and query `model_fields_set` for presence.
There was a problem hiding this comment.
Code review is billed via overage credits. To resume reviews, an organization admin can raise the monthly limit at claude.ai/admin-settings/claude-code.
Once credits are available, reopen this pull request to trigger a review.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b2d16c7ce7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
📖 Documentation Preview: https://reflectapi-docs-preview-pr-151.partly.workers.dev Updated automatically from commit 5767f17 |
Four user-reported codegen bugs all share one root cause: the codegen
emitted Python type annotations referencing symbols it didn't actually
define. They went undetected because the generated `_rebuild_models()`
helper wrapped `model_rebuild()` in `try: ... except Exception: pass`,
so dangling references only surfaced when users instantiated affected
models. Fixes are below; the central protection is the strict-rebuild
contract — the entire class of "codegen emitted a dangling reference"
bugs now fails fast at import time and a new CI job pins it.
## The four bugs
1. Namespace alias mismatch (570 model_rebuild failures in core-server)
`myapi/order/__init__.py` aliased classes with the namespace cap
stripped (`InsertData = MyapiOrderInsertData`) while field
annotations elsewhere referenced the un-stripped Rust leaf
(`order.OrderInsertData`). The latter never resolved.
Root cause: `module_aliases()` and `type_name_to_python_ref()`
stripped names with two different rules. Fix: namespaces now expose
*both* the stripped form (ergonomic public name) and the Rust leaf
(matches field annotations). `ModuleType` carries the original Rust
path so the alias generator can compute the leaf.
2. `std.tuple.TupleN[A, B]` with no `std.tuple` class (6 occurrences)
Rust tuples `(A, B)` were rendered as `std.tuple.Tuple2[A, B]` but
no `std.tuple` submodule was ever generated.
Root cause: only `Tuple0` had a mapping; arities 1..16 fell through
to the default dotted-path renderer. Fix: map every `std::tuple::TupleN`
to Python's built-in `tuple` — Pydantic v2 validates
`tuple[A, B, …]` as a fixed-arity heterogeneous tuple, exactly
matching serde's JSON-array wire shape.
3. `std::time::Duration` rendered as `timedelta` (9 occurrences)
serde emits `Duration` as `{"secs": <u64>, "nanos": <u32>}`.
Pydantic's bare `timedelta` validator rejects the dict shape.
Root cause: codegen mapped `std::time::Duration` to `timedelta`
based on the Rust type name without considering serde's actual JSON
shape. Fix: new `ReflectapiDuration` runtime type — an `Annotated`
wrapper around `timedelta` with `BeforeValidator` (accepts `{secs,
nanos}` and existing `timedelta` instances) and `PlainSerializer`
(emits `{secs, nanos}` so the wire round-trips). Microsecond
precision; documented caveat for sub-µs `nanos`.
4. `std.marker.PhantomData[T]` with no `std.marker` class (NEW)
serde renders `PhantomData<T>` as JSON `null`, and reflectapi-derive
surfaces the field in the schema. The codegen rendered an annotation
referencing a non-existent `std.marker.PhantomData` class.
Root cause: PhantomData is a type-system-only marker — it carries no
wire data and shouldn't appear in the Python model at all. Fix: new
`strip_phantom_data_fields()` schema preprocessor drops every
`std::marker::PhantomData<_>` field before rendering. Adds `Fields::retain()`
to `reflectapi-schema` for the surgery.
## Validation harness
The deeper problem was the silent error swallowing in `_rebuild.py`:
try:
_model.model_rebuild()
except Exception:
pass
This pattern made the bugs above survive every PR. The generated
`_rebuild_models()` now:
- Collects every `model_rebuild()` failure into one structured report.
- Raises `RuntimeError` with all failures listed at once.
- Skips non-Pydantic rendered types (enums) so they don't false-positive.
The namespace-level `_rebuild_models()` call's bare `except Exception:`
was narrowed to `except ImportError:` so partial-load races are still
tolerated but rebuild failures propagate.
End-to-end CI guard:
- New `python-codegen-smoke` job in `.github/workflows/ci.yaml`
regenerates the demo Python client and runs `python -c "import
api_client"`. The strict `_rebuild_models()` runs at import time, so
any future dangling reference fails CI immediately. The job also
checks the demo snapshots are committed (no codegen drift).
- New runtime unit tests in `tests/test_codegen_regressions.py` pin
the `ReflectapiDuration` contract (8 tests).
- The demo Rust crate now includes a `codegen_regression` endpoint
whose types exercise all four bug patterns; the schema and demo
client snapshots are regenerated.
## Test plan
- `cargo fmt`, `cargo clippy -p reflectapi --all-features --all-targets
-D warnings`, `cargo test --workspace` — all green.
- `pytest` in `reflectapi-python-runtime` — 355 pass (8 new
regression tests, 0 fail).
- `import api_client` against the regenerated demo — no exceptions,
`RateLimit.model_validate({'retry_after': {'secs': 30, 'nanos': 0}})`
produces `timedelta(seconds=30)`, model_dump_json round-trips back to
`{"secs": 30, "nanos": 0}`.
Adds coverage fixtures to the demo crate that exercise Python codegen edge cases the smoke test would otherwise miss, plus the fixes those fixtures uncovered. Python codegen fixes - Field names exact-matching Pydantic v2 methods (`model_dump`, `model_validate`, `model_config`, etc.) are renamed with a trailing underscore so the methods aren't shadowed at the instance level. - `protected_namespaces=()` is set on every generated `ConfigDict`, letting any user field starting with `model_` exist without Pydantic raising at class construction. - `Vec<u8>` renders as `list[int]` instead of `bytes`. serde's default JSON shape for `Vec<u8>` is a JSON array of integers, not a base64 string; the old mapping made Pydantic's `bytes` validator reject every valid wire payload. OpenAPI codegen fix - Add an in-progress set keyed by full monomorphization so generic self-referential types no longer recurse forever and blow the stack. Non-generic recursion was already handled by the stub-on-insert into `components.schemas`. Coverage fixtures The demo's `coverage` module exercises: - Python keyword field names (`class`, `from`, `import`, `lambda`, `return`, `yield`, `None`, `True`). - Pydantic-reserved field names (`model_config`, `model_fields_set`, `model_dump_json`). - Self-referential and mutually recursive structs. - Generic recursive types instantiated with concrete arguments. - Enums with variants whose names are Python keywords. - HashMap with non-string keys. - `#[serde(transparent)]` newtypes. - Three-state Option wrapping a regular Option. - Field names colliding with imports brought in by the generated module (`BaseModel`, `Field`, `Annotated`, `Generic`, `TypeVar`). - Empty structs. - Docstrings containing quotes, apostrophes, backslashes, and triple quotes. - Type names that shadow Pydantic-imported symbols (`BaseModel`). - The same generic used with multiple concrete type arguments. - Fields with non-`None` serde defaults. The CI smoke test exercises every fixture by regenerating the demo client and importing it; the strict `_rebuild_models()` raises on any dangling type reference. Demo client snapshots, OpenAPI spec, Rust/TS generated clients and the affected Python demo tests are regenerated and updated for the new module names.
The Python codegen shells out to `ruff format` for the generated client, so the in-tree snapshot has line-wrapping that depends on the ruff version. Without pinning, the smoke test's 'no codegen drift' diff check fails any time CI's ruff differs from the developer's local install — which is just version flap, not a real codegen change.
The action runs `ruff check` over the whole repo by default and fails on pre-existing style issues in the runtime tests. We only want ruff on PATH so the codegen can call `ruff format` on its output; pass `args: --version` to short-circuit the lint phase (matches the Rust job's usage).
`reflectapi::Option<T>` renders as `T | None` at every use site; the three states are carried by `model_fields_set` on the containing `ReflectapiPartialModel`. The schema still declares it as a tagged enum, so without this exclusion the codegen emits a parallel `ReflectapiOption` / `ReflectapiOptionSome` / `ReflectapiOptionNone` / `ReflectapiOptionUndefined` set of classes that nothing references.
…rebuild Field annotations across sibling namespaces (e.g. `auth.AuthError` from inside `upload/__init__.py`) need the referenced namespace bound as a module-level global so Pydantic's `model_rebuild()` can resolve the name. The rebuild was previously only binding each namespace's own root, which worked for schemas with a single shared package prefix (the demo's `myapi.<sub>.<sub>` shape) but left sibling references undefined for schemas where every namespace is its own top-level module (the common case for non-prefixed schemas like core-server).
`mangle_monomorphized_name` concatenated arg names with `::` replaced by `_`, so an arg of `uuid::Uuid` produced the suffix `uuid_Uuid` → `UuidUuid` after PascalCase, leaking the namespace cap into the class name (`IdentityDataUuidUuidX` instead of `IdentityDataUuidX`). Run the arg name through the same destutter the rest of the class-name pipeline uses.
Pydantic v2 emits a UserWarning when a field name shadows one of the deprecated v1 BaseModel methods (`schema`, `json`, `dict`, `parse_obj`, etc.). Extend the existing exact-match rename to cover them so generated classes import without warnings.
… fewest escapes Field descriptions used to be pre-escaped for double-quoted wrapping (\" everywhere), then ruff would reflow them to single quotes — leaving literal backslashes in the formatted output (`'... \"NZ\" ...'`). Pick the wrapping quote per literal: prefer double quotes, fall back to single when the content has " but no ', and only escape when both appear.
`parse_doc_attributes` was converting the doc literal via `to_token_stream().to_string()`, which produced the Rust source representation of the string — preserving every `\"` escape from the original source. Doc comments containing literal double quotes therefore landed in the schema as bytes `\"` instead of `"`, and that quoted form propagated into Python field descriptions. Use `Lit::Str::value()` to read the actual string content.
The TypeScript `__EventSourceParserStream` had no `flush()` method,
so a server that closed the connection immediately after the final
`data:` line — without the spec's trailing blank-line terminator —
silently dropped the last event. Add `flush() { parser.feed('\n\n') }`
to the template so the buffered event dispatches on stream close.
The Python runtime already handled the symmetric case via
`feed_eof()` in both `parse_sse` and `aparse_sse`. Add explicit
tests for the async parser eof-flush and for the end-to-end
streaming client receiving a server response that closes
mid-event.
…ge fixtures out of the demo - Restore the historical wire format for plain Pydantic request models: `model_dump_json(by_alias=True, exclude_none=True)` for non-partial models so unset optional fields stay absent on the wire. `ReflectapiPartialModel` still uses no exclude — explicit `None` encodes the protocol's "field present, value cleared" state and must reach the wire. Same fix applied in `client.py` and `streaming.py`. - Move codegen coverage fixtures (PyKeywordFields, recursive types, Pydantic-reserved field names, etc.) out of the public demo schema into the integration test at `reflectapi-demo/tests/codegen_coverage.rs`. The demo crate's `builder()` now only registers pet-themed routes. The integration test builds a private schema, runs Python codegen to `target/codegen-coverage-client/`, and CI imports the result so strict `model_rebuild()` still catches dangling-reference bugs. - Add four runtime tests pinning the partial-vs-plain wire-format contract.
- Gitignore reflectapi-demo/target/ so codegen coverage-client outputs stay out of the index. - Migrate the demo client pyproject from the deprecated `[tool.uv] dev-dependencies = …` to the standard `[dependency-groups] dev = …` shape; clears the uv warning. - Tighten coverage test + CI comments to describe the current behaviour rather than contrast with prior state.
Summary
Drops the hand-rolled
ReflectapiOption[T]wrapper class in favour ofReflectapiPartialModel(aBaseModelmixin that uses Pydantic's ownmodel_fields_set), then hardens the Python codegen against a broad set of edge cases — validated end-to-end against a real-world schema (hundreds of types, dozens of namespaces).Closes #150. Supersedes #149.
Why drop
ReflectapiOption?reflectapi::Option<T>is a three-state protocol type:Some(value)/ explicitNone/Undefined(key absent on the wire). TypeScript clients express it natively (field?: T | null); Rust clients use thereflectapi::Option<T>enum. The Python client used to ship a runtime class wrapper for the same job — but Pydantic already tracks "was this field on the wire?" via__pydantic_fields_set__. The wrapper duplicated Pydantic's bookkeeping, forced users to learn a parallel.unwrap/.is_undefined/.unwrap_or(None)API, and kept producing fragile-edge bugs.After this PR, partial fields are plain
T | Noneand presence is queried at the model level:Runtime changes
reflectapi_runtime.partial.ReflectapiPartialModel—@model_serializer(mode="wrap")filters keys bymodel_fields_set(alias-aware).reflectapi_runtime.duration.ReflectapiDuration—Annotated[timedelta, BeforeValidator, PlainSerializer]adapter for serde's{secs, nanos}wire shape.reflectapi_runtime.option— entireReflectapiOption/Undefined/some/none/undefined/serialize_option_dictsurface.client.py/streaming.pysimplified:model_dump_json(by_alias=True)and trust the model's serializer (partial or plain).partial.pySSE parser tests pin the existingfeed_eof()behaviour for bothparse_sseandaparse_sse, plus end-to-end streaming throughMockTransport.Codegen changes (Python)
reflectapi::Option<T>→T | None(wasReflectapiOption[T]); defaultNone.Field.is_partial/DataClass.is_partialpropagate through the renderer to pickReflectapiPartialModelvsBaseModeland emitvalidate_assignment=Trueon the ConfigDict._rebuild_models(): collects everymodel_rebuild()failure into oneRuntimeError(replacing thetry: … except Exception: passthat hid the original four bug reports for months). Non-Pydantic rendered types (enums) are skipped.Codegen edges hardened in the same pass
order.OrderInsertDataannotationsstd.tuple.TupleNreferences with no classstd::tuple::TupleN<…>to Pythontuple[…](Pydantic v2 validates fixed-arity heterogeneous)std::time::Durationrejected bytimedeltaReflectapiDuration(round-trips{secs, nanos}↔timedelta)std.marker.PhantomData[T]danglingstrip_phantom_data_fields()preprocessor drops every PhantomData fieldmodel_dump,schema,json,dict, …)model_protected_namespaces=()in every generated ConfigDictVec<u8>→bytes(serde wire is[1,2,3], not base64)list[int]ReflectapiOptionADT classes in generated outputreflectapi::Optiontonon_rendered_typesauth.AuthErrorundefined at rebuilduuid::Uuid→UuidUuidin monomorphized names"carried Rust-source\"escapes through to PythonLitStr::value()inreflectapi-derive'… \\"NZ\\" …'outputspython_string_literalpicks the wrapping quote that needs the fewest escapes\n\nflush()to__EventSourceParserStreamValidation
cargo fmt,cargo clippy --all-features --all-targets -D warnings,cargo test --workspace— green.pytestinreflectapi-python-runtime— 355 pass, 0 fail (includes 4 new SSE eof-flush tests).python-codegen-smokeregenerates the demo Python client and imports it; the strict_rebuild_models()raises on any dangling reference, so the class of bug behind the four original reports fails CI at PR time. Job also checks the demo snapshot is committed (no drift). Ruff pinned to 0.15.8 for the formatter call so the drift check is deterministic.Coverage fixtures
The demo's
coveragemodule exercises everything the codegen would otherwise mishandle silently — Python keyword field names, Pydantic-reserved names, self/mutual/generic recursion, enum variants named after keywords, non-string-keyed HashMap,#[serde(transparent)], three-state Option wrapping Option, fields colliding with imports, empty structs, docstrings with quotes/backslashes/triple-quotes, type names shadowing imports, generics with multiple instantiations, fields with non-Noneserde defaults. All round-trip through the CI smoke test.Breaking change — migration
reflectapi-runtimeno longer exportsReflectapiOption,Undefined,Option,some,none,undefined,serialize_option_dict. Generated client code regenerates clean; hand-written code uses these patterns:ReflectapiOption(Undefined)ReflectapiOption(None)field=NoneReflectapiOption(value)field=valueopt.is_undefined"field" not in model.model_fields_setopt.is_nonemodel.field is None and "field" in model.model_fields_setopt.unwrap_or(default)model.field if "field" in model.model_fields_set else defaultFollow-ups (out of this PR)
_client.pyinto per-namespace client files.