-
-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Public API for buffer objects #2876
base: main
Are you sure you want to change the base?
Public API for buffer objects #2876
Conversation
This moves the public imports from buffer things out of `zarr.core`. Abstract stuff is availble under `zarr.abc.buffer`. Concrete implementations are available under `zarr.buffer.{cpu,gpu}`.
Do you think it's worth just moving the actual code to the new public location, so it doesn't live in |
Mmm, I think I'd still want the implementation to be in a private module so that implementation details like numpy imports don't leak into the public API's namespace. So it'd end up looking pretty similar. And if we're moving stuff out of |
I think this PR takes care of those |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments/suggestions inline.
I also think we should move the code to where we expect users to import it from. This is because:
- having code in
zarr.core.buffer
, which we expcititly mark as private API, could lead developers to think they can make API changes without deprecations. But in reality it's public API. - It forces us to import code in our tests from where users are also importing it
I appreciate it's a bit more work to move the code around, but that's the cost of being a stable and widely used library 😄
And if we're moving stuff out of zarr.core I'd want to do it properly, since I wouldn't be surprised if people were already using it.
We decided that the API in zarr.core
is private, so I think we should feel free to do what we want there - if downstream users are using it then I don't think that's our problem, since we don't document it's existence. On a practical level I can appreciate just moving the code might cause issues, in which case we could still have the code importable from zarr.core
, but issue deprecation warnings before removing it completely.
src/zarr/buffer/__init__.py
Outdated
""" | ||
|
||
from ..core.buffer import default_buffer_prototype | ||
from . import cpu, gpu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we import submodules like this anywhere else? If we don't, to be consistent I would remove these imports, and we can always discuss doing this throughout the package in an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like cpu
and gpu
have been removed as of the latest commit? Because we don't do this kind of import elsewhere (e.g., see zarr/api/__init__.py
, that doesn't import synchronous
or asyncrhonous
), I think we should be consistent and not import cpu
or gpu
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I misunderstood your earlier comment.
This is to ease migration for users: Previously import zarr
would make, e.g. zarr.core.buffer.cpu.Buffer
available, as a side-effect of us importing modules internally.
The closest analog here is zarr.storage
, which imports all store subclasses, rather than making users use the full import path. We can't do that exact same thing here hough, since both .gpu
and .cpu
use the same names (Buffer, NDBuffer, etc), so the imports need to be namespaced.
We could just not import cpu
and gpu
and make the user do that import, but in this case I think consistency with zarr.store
is most important.
I don't plan to move the implementation at this time. We already know that some libraries, like icechunk, are depending on it. I don't want to break those implementations because Zarr lacked a public API for this previously. We need to offer a public API first and then give them some time to migrate over. I'll post a proposed plan over in #2621. |
Should be good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As well as my inline comments/suggestions, I noticed that there are parts of the API that are typed with zarr.core.buffer...
- it would be good to change those to use the new public interface. As an example, zarr.abc.codec.ArrayArrayCodec
docs/user-guide/extending.rst
Outdated
@@ -83,7 +83,10 @@ Coming soon. | |||
Custom array buffers | |||
-------------------- | |||
|
|||
Coming soon. | |||
zarr-python provides control where and how arrays stored in memory through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zarr-python provides control where and how arrays stored in memory through | |
Zarr-Python provides control for where and how arrays stored in memory through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're inconsistent about zarr-python
vs. Zarr-python
in the docs.
Coming soon. | ||
zarr-python provides control where and how arrays stored in memory through | ||
:mod:`zarr.buffer`. Currently both CPU (the default) and GPU implementations are | ||
provided (see :ref:`user-guide-gpu` for more). You can implement your own buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provided (see :ref:`user-guide-gpu` for more). You can implement your own buffer | |
provided (see :ref:`user-guide-gpu` for more info). You can implement your own buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current version sounds OK.
src/zarr/buffer/__init__.py
Outdated
""" | ||
|
||
from ..core.buffer import default_buffer_prototype | ||
from . import cpu, gpu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like cpu
and gpu
have been removed as of the latest commit? Because we don't do this kind of import elsewhere (e.g., see zarr/api/__init__.py
, that doesn't import synchronous
or asyncrhonous
), I think we should be consistent and not import cpu
or gpu
here.
👍 - my suggestion would be to introduce those deprecations in this PR for the buffer stuff that's been made public, but happy to punt that to a later PR if that's easier. |
The latest pair of commits updates our |
Thanks for the review.
This is proving to be a bit more work than I can do right now. Feel free to push a commit if you're able to, or we can merge this as is since and handle that later. |
I'd recommend waiting. A release with this public API will give us a chance to migrate packages like icechunk without their users ever seeing a warning. |
This moves the public imports from buffer things out of
zarr.core.buffer
.zarr.abc.buffer
.zarr.buffer.{cpu,gpu}
.I haven't added any new tests, but I updated the tests in
tests/test_buffer.py
to use the public API.Closes #2871