feat(python): Python binding for iceberg-rust Schema #3
Draft
abnobdoss wants to merge 8 commits into
added 8 commits
May 24, 2026 16:45
Add serde_json dep (needed for from_json/to_json in schema.rs) and register the schema submodule in lib.rs alongside the existing modules.
Parse once via Schema.from_json(); Arc<Schema> shared across callers. Exposes schema_id, highest_field_id, column_names, identifier_field_ids, find_field_by_name, field_by_id, to_json, to_arrow_schema, __arrow_c_schema__ (Arrow PyCapsule Interface), and _capsule (Rust→Rust handoff via PyCapsule named "iceberg_core_schema").
Covers all public methods with full docstrings, including the __arrow_c_schema__ PyCapsule dunder and the _capsule() Rust handoff.
Covers construction (V1/V2 JSON, error cases), all getter methods, case-sensitive field lookup, to_json round-trips, PyCapsule lifecycle, __arrow_c_schema__ PyCapsule Interface, and to_arrow_schema with PARQUET:field_id preservation.
Status
Blocked for now while the Predicate binding stack settles. This draft is runtime-only; Python typing stubs are deferred to a separate package-wide follow-up PR.
Summary
Adds a Python binding for `iceberg::spec::Schema` as an opaque `pyiceberg_core.schema.Schema` handle constructed from Iceberg schema JSON.

- `Schema.from_json(s)` parses V1 or V2 schema JSON and lets iceberg-rust serde enforce schema validity
- `schema_id()`, `highest_field_id()`, `column_names()`, `identifier_field_ids()` expose cheap schema metadata; identifier field IDs are returned in ascending order
- `find_field_by_name(name)` does case-sensitive dotted-path lookup and returns `{id, name, type, required}` or `None`
- `field_by_id(id)` returns the same field dict shape and raises `KeyError` when absent
- `to_json()` emits parseable schema JSON for semantic round trips
- `to_arrow_schema()` exports a `pyarrow.Schema`; field IDs are preserved in `PARQUET:field_id` metadata
- `__arrow_c_schema__()` implements the Arrow PyCapsule Interface with capsule name `"arrow_schema"`
- `_capsule()` returns a PyCapsule named `"iceberg_core_schema"` wrapping `Arc<Schema>` for future sibling modules in this binding crate

The field dict keeps `type` as the Iceberg spec JSON representation rather than exposing a parallel Python type tree. That keeps this PR focused on an opaque schema handle while preserving enough information for callers that need to inspect a field type.

Files changed

- `bindings/python/src/schema.rs` - `Schema` binding implementation
- `bindings/python/src/lib.rs` - registers the schema submodule
- `bindings/python/Cargo.toml` - adds the explicit `serde_json` dependency used by `from_json()` and `to_json()`
- `bindings/python/tests/test_schema.py` - schema binding tests

Verification

- `maturin build --release --out dist` - clean, zero warnings
- `pytest bindings/python/tests/test_schema.py` - 32 passed
- `cargo test -p iceberg --lib` - 1294 passed

Design notes

- `Arc<Schema>` makes clones cheap and gives `_capsule()` a clean ownership story: each capsule owns its own `Arc` clone, so the capsule remains valid after the Python `Schema` object is dropped.
- `#[pyclass(..., from_py_object)]` is included so follow-up methods in this binding crate can accept `Schema` as a typed Python argument under PyO3 0.28 without relying on deprecated implicit extraction behavior.

Typing stubs and `py.typed` are intentionally deferred to a package-wide typing PR so this feature PR only changes runtime behavior.
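
For reviewers who want a concrete picture of the lookup contract described in the Summary, here is a minimal pure-Python sketch of the intended semantics. It is not the binding's implementation (that lives in Rust and delegates to iceberg-rust). It uses only the standard library, walks nested struct fields only (list and map element paths are ignored), and assumes the returned `name` is the field's own short name rather than the full dotted path. The sample schema and the helper `as_field_dict` are hypothetical.

```python
import json

# Hypothetical V2-style schema JSON, used only for illustration.
SCHEMA_JSON = """
{
  "type": "struct",
  "schema-id": 0,
  "identifier-field-ids": [1],
  "fields": [
    {"id": 1, "name": "id", "required": true, "type": "long"},
    {"id": 2, "name": "location", "required": false, "type": {
      "type": "struct",
      "fields": [
        {"id": 3, "name": "lat", "required": true, "type": "double"},
        {"id": 4, "name": "lon", "required": true, "type": "double"}
      ]
    }}
  ]
}
"""


def as_field_dict(field):
    # Field dict shape from the PR description: {id, name, type, required}.
    # `type` stays in its Iceberg spec JSON form (a string or a nested dict).
    return {
        "id": field["id"],
        "name": field["name"],
        "type": field["type"],
        "required": field["required"],
    }


def nested_fields(field_type):
    # Only struct types carry child fields in this simplified sketch.
    if isinstance(field_type, dict) and field_type.get("type") == "struct":
        return field_type["fields"]
    return []


def find_field_by_name(schema, name):
    # Case-sensitive dotted-path lookup; returns None when any segment is missing.
    fields = schema["fields"]
    found = None
    for part in name.split("."):
        found = next((f for f in fields if f["name"] == part), None)
        if found is None:
            return None
        fields = nested_fields(found["type"])
    return as_field_dict(found)


def iter_fields(fields):
    # Depth-first walk over top-level and nested struct fields.
    for field in fields:
        yield field
        yield from iter_fields(nested_fields(field["type"]))


def field_by_id(schema, field_id):
    # Same dict shape as find_field_by_name, but raises KeyError when absent.
    for field in iter_fields(schema["fields"]):
        if field["id"] == field_id:
            return as_field_dict(field)
    raise KeyError(field_id)


schema = json.loads(SCHEMA_JSON)
```

With this sketch, `find_field_by_name(schema, "location.lat")` resolves the nested field, `find_field_by_name(schema, "Location.lat")` returns `None` because matching is case-sensitive, and `field_by_id(schema, 99)` raises `KeyError`. The asymmetry (`None` for a missing name, an exception for a missing ID) mirrors the contract listed in the Summary.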