
document dtype extension #3157


Open: wants to merge 11 commits into base: main


@d-v-b (Contributor) commented Jun 19, 2025

This PR adds a working example of custom dtype creation and registration. Because it's a lot of code, I put this in a new top-level directory called examples, which contains the executable Python file dtype_example.py. This file uses PEP 723 metadata to declare an ml_dtypes dependency, and it uses a local zarr install, which means it can be tested properly against local changes.

I also expanded the current dtype docs in the user guide to include content about the data type resolution process.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Jun 19, 2025
@d-v-b (Contributor, Author) commented Jun 19, 2025

cc @nenb @ianhi, since y'all were the most involved in this example over in the main dtypes PR.

@dstansby added this to the 3.1.0 milestone Jun 20, 2025

@ianhi (Contributor) left a comment

Nice, this is a great improvement. I left some comments and suggested improvements. The other thing I'd wish for is to format the lines to 80 characters. In general I like a 100-character line length, but as it currently stands, the example code rendered on the docs page requires horizontal scrolling to read.


class Int2(ZDType[int2_dtype_cls, int2_scalar_cls]):
    """
    This class provides a Zarr compatibility layer around the int2 data type and the int2
Contributor

Is there a nice link explaining the difference between these? I think I've inferred it but would be nice to make it explicit.

Contributor Author

No, I don't actually think there is a nice link that explains the difference between a data type and a scalar type. The NumPy docs should explain this, but they don't. I can add something to our docs.
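Since the NumPy docs don't spell out the data type / scalar type distinction, here is a small NumPy-only illustration. It uses the built-in int8 rather than the ml_dtypes int2 so it runs without extra dependencies. Reading the two type parameters of `Int2` as "dtype class" and "scalar class" in this sense is an assumption based on their names.

```python
import numpy as np

# A *data type* is an np.dtype instance: it describes how array elements are
# laid out in memory (kind, item size, byte order).
int8_dtype = np.dtype("int8")

# A *scalar type* is the Python class whose instances are single elements.
int8_scalar_type = np.int8

assert isinstance(int8_dtype, np.dtype)
assert isinstance(int8_scalar_type, type)

# The dtype knows its scalar type through `.type` ...
assert int8_dtype.type is int8_scalar_type

# ... and indexing an array of that dtype yields an instance of the scalar type.
arr = np.array([1, -2, 1], dtype=int8_dtype)
element = arr[0]
assert isinstance(element, int8_scalar_type)

print(int8_dtype.itemsize, type(element).__name__)  # 1 int8
```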


def to_json_scalar(self, data: object, *, zarr_format: ZarrFormat) -> int:
    """Convert a python object to a scalar."""
    return int(data)
Contributor

Can this be more specific to the example? E.g., explain something to the effect of "this needs to be an int to be compatible with JSON," and mention int2 somewhere.
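To illustrate why a JSON scalar conversion like `to_json_scalar` returns a plain `int`: NumPy integer scalars are not Python `int` instances, so the standard `json` module refuses to serialize them. The sketch below uses `np.int8` as a stand-in for int2 values, and the helpers `int2_to_json` and `int2_from_json` are hypothetical, not the PR's methods. The range check assumes int2 is a two's-complement 2-bit integer, i.e. values -2 through 1.

```python
import json

import numpy as np

INT2_MIN, INT2_MAX = -2, 1  # two's-complement range of a 2-bit signed integer


def int2_to_json(data: object) -> int:
    """Hypothetical helper: convert an int2-like scalar to a JSON-safe int."""
    # json only serializes built-in Python numbers, so cast to int first.
    value = int(data)
    if not INT2_MIN <= value <= INT2_MAX:
        raise ValueError(f"{value} is out of range for int2")
    return value


def int2_from_json(data: int) -> np.int8:
    """Hypothetical helper: parse a JSON int back into a scalar (int8 stand-in)."""
    if not INT2_MIN <= data <= INT2_MAX:
        raise ValueError(f"{data} is out of range for int2")
    return np.int8(data)


scalar = np.int8(-2)
try:
    json.dumps(scalar)  # NumPy integer scalars are not JSON serializable
except TypeError as err:
    print("json.dumps failed:", err)

encoded = json.dumps(int2_to_json(scalar))
print(encoded)  # -2
decoded = int2_from_json(json.loads(encoded))
```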

Zarr Python defines a collection of Zarr data types. This collection, called a "data type registry",
is essentially a dict where the keys are strings (a canonical name for each data type), and the values are
the data type classes themselves. Dynamic data type resolution entails iterating over these data
type classes, invoking a special class constructor defined on each one, and returning a concrete
Contributor

Maybe mention the name of the method, or link to it on ZDType.
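The registry-and-resolution process described in the docs excerpt above can be sketched in plain Python. Everything here is hypothetical and simplified: `ZDTypeSketch`, the `from_native` classmethod, and `DataTypeValidationError` are illustrative stand-ins, not Zarr Python's actual API (the real constructor's name is exactly what the review asks to document). Matching on dtype kind and item size is also an assumption, and this sketch simply returns the first class that accepts the input; the excerpt does not say how Zarr Python handles multiple matches.

```python
from typing import ClassVar

import numpy as np


class DataTypeValidationError(ValueError):
    """Hypothetical error raised when a class cannot wrap a given dtype."""


class ZDTypeSketch:
    """Hypothetical base class: one subclass per supported data type."""

    kind: ClassVar[str]      # NumPy dtype kind this class accepts
    itemsize: ClassVar[int]  # item size in bytes this class accepts

    def __init__(self, native: np.dtype) -> None:
        self.native = native

    @classmethod
    def from_native(cls, native: np.dtype) -> "ZDTypeSketch":
        # The special class constructor: succeed only for a matching dtype.
        if native.kind == cls.kind and native.itemsize == cls.itemsize:
            return cls(native)
        raise DataTypeValidationError(f"{cls.__name__} cannot wrap {native!r}")


class Int8(ZDTypeSketch):
    kind, itemsize = "i", 1


class Float32(ZDTypeSketch):
    kind, itemsize = "f", 4


# The registry: canonical name -> data type class.
REGISTRY: dict[str, type[ZDTypeSketch]] = {"int8": Int8, "float32": Float32}


def resolve(native) -> ZDTypeSketch:
    """Try each registered class in turn and return the first successful match."""
    native = np.dtype(native)
    for cls in REGISTRY.values():
        try:
            return cls.from_native(native)
        except DataTypeValidationError:
            continue
    raise ValueError(f"no registered data type matches {native!r}")


print(type(resolve("float32")).__name__)  # Float32
print(type(resolve(np.int8)).__name__)    # Int8
```

Adding support for a new data type in this sketch means defining one more subclass and adding it to `REGISTRY`; `resolve` itself never changes.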

@github-actions bot removed the "needs release notes" label (automatically applied to PRs which haven't added release notes) Jun 23, 2025

Data types in Zarr version 2
-----------------------------

Version 2 of the Zarr format defined its data types relative to
`NumPy's data types <https://numpy.org/doc/2.1/reference/arrays.dtypes.html#data-type-objects-dtype>`_,
and added a few non-NumPy data types as well. Thus the JSON identifier for a NumPy-compatible data
and added a few non-NumPy data types as well. With one exception, the Zarr V2 JSON identifier for a data
Contributor

Suggested change
and added a few non-NumPy data types as well. With one exception, the Zarr V2 JSON identifier for a data
and added a few non-NumPy data types as well. With one exception (`<the exception>`), the Zarr V2 JSON identifier for a data

Otherwise this is confusing, given the mention of two special cases below.
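For the common NumPy-compatible case, the Zarr V2 dtype field uses NumPy's array-interface type string (byte-order character, kind code, item size), which NumPy exposes as `dtype.str`. The sketch below shows that correspondence for a few simple dtypes. The helper name `v2_dtype_identifier` is hypothetical, and the sketch deliberately says nothing about the exception or the special cases mentioned in the docs, which are not quoted here. It only refuses structured dtypes, which V2 encodes as a list of fields rather than a single string.

```python
import numpy as np


def v2_dtype_identifier(native: np.dtype) -> str:
    """Hypothetical helper: V2-style typestr for a simple (non-structured) dtype."""
    if native.fields is not None:
        # Structured dtypes use a list-of-fields encoding in V2; out of scope here.
        raise NotImplementedError("structured dtypes are not handled in this sketch")
    # dtype.str is NumPy's typestr, e.g. "<i4" = little-endian 4-byte signed int.
    return native.str


# Explicit byte orders keep the output independent of the host machine;
# single-byte types use "|" because byte order does not apply to them.
identifiers = {
    spec: v2_dtype_identifier(np.dtype(spec))
    for spec in ["<i4", ">f8", "bool", "uint8"]
}
for spec, ident in identifiers.items():
    print(f"{spec:>6} -> {ident}")
```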

3 participants