Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Public API for buffer objects #2876

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Feb 28, 2025

This moves the public imports from buffer things out of zarr.core.buffer.

  • Abstract stuff is availble under zarr.abc.buffer.
  • Concrete implementations are available under zarr.buffer.{cpu,gpu}.

I haven't added any new tests, but I updated the tests in tests/test_buffer.py to use the public API.

Closes #2871

This moves the public imports from buffer things out of `zarr.core`.

Abstract stuff is availble under `zarr.abc.buffer`.

Concrete implementations are available under `zarr.buffer.{cpu,gpu}`.
@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Feb 28, 2025
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Mar 1, 2025
@dstansby
Copy link
Contributor

dstansby commented Mar 4, 2025

Do you think it's worth just moving the actual code to the new public location, so it doesn't live in zarr.core.buffer any more? That would make for a simpler code base and be my preference, but I might be missing some reason not to move the code.

@TomAugspurger
Copy link
Contributor Author

Mmm, I think I'd still want the implementation to be in a private module so that implementation details like numpy imports don't leak into the public API's namespace. So it'd end up looking pretty similar. And if we're moving stuff out of zarr.core I'd want to do it properly, since I wouldn't be surprised if people were already using it. IMO not worth it right now.

@TomAugspurger
Copy link
Contributor Author

I wouldn't be surprised if people were already using it.

Yep: https://github.com/earth-mover/icechunk/blob/584d4f2b0aad1160f8e4ba9a48ebbf770c058c00/icechunk-python/python/icechunk/store.py#L13

I think this PR takes care of those zarr.core.buffer imports. IMO, we should ensure that all the types in our public API (like BytesLike) are exported somewhere, but that should be done separately.

Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments/suggestions inline.

I also think we should move the code to where we expect users to import it from. This is because:

  • having code in zarr.core.buffer, which we expcititly mark as private API, could lead developers to think they can make API changes without deprecations. But in reality it's public API.
  • It forces us to import code in our tests from where users are also importing it

I appreciate it's a bit more work to move the code around, but that's the cost of being a stable and widely used library 😄

And if we're moving stuff out of zarr.core I'd want to do it properly, since I wouldn't be surprised if people were already using it.

We decided that the API in zarr.core is private, so I think we should feel free to do what we want there - if downstream users are using it then I don't think that's our problem, since we don't document it's existence. On a practical level I can appreciate just moving the code might cause issues, in which case we could still have the code importable from zarr.core, but issue deprecation warnings before removing it completely.

"""

from ..core.buffer import default_buffer_prototype
from . import cpu, gpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we import submodules like this anywhere else? If we don't, to be consistent I would remove these imports, and we can always discuss doing this throughout the package in an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like cpu and gpu have been removed as of the latest commit? Because we don't do this kind of import elsewhere (e.g., see zarr/api/__init__.py, that doesn't import synchronous or asyncrhonous), I think we should be consistent and not import cpu or gpu here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I misunderstood your earlier comment.

This is to ease migration for users: Previously import zarr would make, e.g. zarr.core.buffer.cpu.Buffer available, as a side-effect of us importing modules internally.

The closest analog here is zarr.storage, which imports all store subclasses, rather than making users use the full import path. We can't do that exact same thing here hough, since both .gpu and .cpu use the same names (Buffer, NDBuffer, etc), so the imports need to be namespaced.

We could just not import cpu and gpu and make the user do that import, but in this case I think consistency with zarr.store is most important.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Mar 28, 2025

I don't plan to move the implementation at this time. We already know that some libraries, like icechunk, are depending on it. I don't want to break those implementations because Zarr lacked a public API for this previously. We need to offer a public API first and then give them some time to migrate over.

I'll post a proposed plan over in #2621.

@TomAugspurger
Copy link
Contributor Author

Should be good to go.

Copy link
Contributor

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as my inline comments/suggestions, I noticed that there are parts of the API that are typed with zarr.core.buffer... - it would be good to change those to use the new public interface. As an example, zarr.abc.codec.ArrayArrayCodec

@@ -83,7 +83,10 @@ Coming soon.
Custom array buffers
--------------------

Coming soon.
zarr-python provides control where and how arrays stored in memory through
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
zarr-python provides control where and how arrays stored in memory through
Zarr-Python provides control for where and how arrays stored in memory through

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're inconsistent about zarr-python vs. Zarr-python in the docs.

Coming soon.
zarr-python provides control where and how arrays stored in memory through
:mod:`zarr.buffer`. Currently both CPU (the default) and GPU implementations are
provided (see :ref:`user-guide-gpu` for more). You can implement your own buffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
provided (see :ref:`user-guide-gpu` for more). You can implement your own buffer
provided (see :ref:`user-guide-gpu` for more info). You can implement your own buffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current version sounds OK.

"""

from ..core.buffer import default_buffer_prototype
from . import cpu, gpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like cpu and gpu have been removed as of the latest commit? Because we don't do this kind of import elsewhere (e.g., see zarr/api/__init__.py, that doesn't import synchronous or asyncrhonous), I think we should be consistent and not import cpu or gpu here.

@dstansby
Copy link
Contributor

I don't plan to move the implementation at this time. We already know that some libraries, like icechunk, are depending on it. I don't want to break those implementations because Zarr lacked a public API for this previously. We need to offer a public API first and then give them some time to migrate over.

👍 - my suggestion would be to introduce those deprecations in this PR for the buffer stuff that's been made public, but happy to punt that to a later PR if that's easier.

@TomAugspurger
Copy link
Contributor Author

The latest pair of commits updates our zarr.config, which references zarr.core.buffer. We needed to slightly update our registration code to let the class being registered have a different qualname / import path than where they're defined.

@TomAugspurger
Copy link
Contributor Author

Thanks for the review.

I noticed that there are parts of the API that are typed

This is proving to be a bit more work than I can do right now. Feel free to push a commit if you're able to, or we can merge this as is since and handle that later.

@TomAugspurger
Copy link
Contributor Author

👍 - my suggestion would be to introduce those deprecations in this PR for the buffer stuff that's been made public, but happy to punt that to a later PR if that's easier.

I'd recommend waiting. A release with this public API will give us a chance to migrate packages like icechunk without their users ever seeing a warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Public API for zarr.core.buffer
2 participants