Add Index.load() and Index.chunk() methods #8128
How would this work for For load, I could see
but the pattern for compute is usually:
how would that translate? |
In general the Index method names were chosen after the Dataset methods in which they are called, but maybe |
So if I was a user using CoordinateTransformIndex and I wanted to "load" the transformed values into memory, how would I do that? |
As an end-user you would only need to do It is up to the index to define how to "load" the coordinate values and maybe convert itself. For CoordinateTransformIndex I see three options:
Option 3 probably makes the most sense if we still need to keep track of the underlying transform. |
I'm not sure we should conflate the two. For example, I could have a dataset with a bunch of chunked arrays and a CoordinateTransformIndex. I might want to load the data into memory, but not realize the lazy coordinates. And conversely, I might want to realize the CoordinateTransform values (say I've subset to a small region), but not load any chunked arrays. I guess (3) is an option, but it's a bit of "action-at-a-distance". What is the most explicit API we can come up with?
# assuming RasterIndex over 'x', 'y' dimensions
ds.xindexes.update({"x": ds.xindexes["x"].load()}) # in-place (seems like it has to be) |
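The explicit pattern above (call load() on one index and swap the result back in, without touching the data arrays) can be sketched in plain Python. This is only an illustration under stated assumptions: LazyTransformIndex, LoadedIndex, and the plain dict standing in for ds.xindexes are hypothetical names, not Xarray classes, and keeping a reference to the original transform on the loaded index is just one possible design.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyTransformIndex:
    """Hypothetical lazy index: coordinate i is scale * i + offset."""

    size: int
    scale: float
    offset: float

    def value_at(self, i: int) -> float:
        # Coordinates are computed on demand, never stored.
        return self.scale * i + self.offset

    def load(self) -> "LoadedIndex":
        # Realize every coordinate value and remember where it came from.
        values = tuple(self.value_at(i) for i in range(self.size))
        return LoadedIndex(values, transform=self)


@dataclass(frozen=True)
class LoadedIndex:
    """Hypothetical in-memory index that keeps track of its source transform."""

    values: Tuple[float, ...]
    transform: Optional[LazyTransformIndex] = None

    def load(self) -> "LoadedIndex":
        # Already in memory: nothing to do.
        return self


# A plain dict stands in for ds.xindexes in this sketch.
xindexes = {"x": LazyTransformIndex(size=4, scale=0.5, offset=10.0)}

# Only the "x" index is realized; any chunked data would be left untouched.
xindexes.update({"x": xindexes["x"].load()})
```

In this sketch the caller decides exactly which index to realize, which avoids the "action-at-a-distance" concern, at the cost of a more verbose call.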
I see. Would it be reasonable to add a
I'd avoid Alternatively:
loaded_coords = xr.Coordinates.from_xindex(ds.xindexes["x"].load())
ds.coords.update(loaded_coords)
# or
ds = ds.assign_coords(loaded_coords) |
I like |
Assuming a multi-coordinate index like RasterIndex over x/y dimensions, Some possible ways to make it less confusing:
|
As mentioned in #8124, it gives more control to custom Xarray indexes on what best to do when the Dataset / DataArray load() and chunk() counterpart methods are called.
PandasIndex.load() and PandasIndex.chunk() always return self (no action required).
For a DaskIndex, we might want to return a PandasIndex (or another non-lazy index) from load() and rebuild a DaskIndex object from chunk() (rechunk).
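The behaviour described above can be illustrated with a minimal, library-free sketch. Everything in it is an assumption made for illustration: Index, InMemoryIndex, and LazyIndex are hypothetical stand-ins for the Xarray base class, PandasIndex, and a DaskIndex, and the chunk(chunk_size) signature is invented here rather than taken from this PR.

```python
from dataclasses import dataclass
from typing import Callable, List


class Index:
    """Hypothetical base class: both hooks default to a no-op."""

    def load(self) -> "Index":
        return self

    def chunk(self, chunk_size: int) -> "Index":
        return self


@dataclass
class InMemoryIndex(Index):
    """Stand-in for PandasIndex: values already live in memory.

    It inherits load() and chunk(), which simply return self.
    """

    values: List[int]


@dataclass
class LazyIndex(Index):
    """Stand-in for a DaskIndex: values are produced chunk by chunk."""

    size: int
    chunk_size: int
    compute: Callable[[int, int], List[int]]  # (start, stop) -> values

    def load(self) -> InMemoryIndex:
        # Evaluate every chunk and return a non-lazy index.
        values: List[int] = []
        for start in range(0, self.size, self.chunk_size):
            stop = min(start + self.chunk_size, self.size)
            values.extend(self.compute(start, stop))
        return InMemoryIndex(values)

    def chunk(self, chunk_size: int) -> "LazyIndex":
        # Rebuild a lazy index with the new chunking; nothing is computed.
        return LazyIndex(self.size, chunk_size, self.compute)
```

With these stand-ins, a container's load() or chunk() would only need to call the matching hook on each index and keep whatever object comes back.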