Add Index.load() and Index.chunk() methods #8128

benbovy · 2023-08-31T14:16:27Z

Closes #xxxx
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

As mentioned in #8124, this gives custom Xarray indexes more control over what to do when the corresponding Dataset / DataArray load() and chunk() methods are called.

PandasIndex.load() and PandasIndex.chunk() always return self (no action required).

For a DaskIndex, we might want to return a PandasIndex (or another non-lazy index) from load() and rebuild a DaskIndex object from chunk() (rechunk).
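A minimal sketch of the behaviour described above, using toy stand-in classes rather than the real Xarray ones. `ToyIndex`, `ToyEagerIndex` and `ToyLazyIndex` are hypothetical names, and the method signatures are assumptions for illustration, not necessarily the API added by this PR:

```python
from typing import Optional


class ToyIndex:
    """Hypothetical base class: by default load() and chunk() propagate the index."""

    def load(self) -> Optional["ToyIndex"]:
        return self

    def chunk(self, chunks: dict) -> Optional["ToyIndex"]:
        return self


class ToyEagerIndex(ToyIndex):
    """Stand-in for an in-memory index such as PandasIndex: no action required."""

    def __init__(self, values: list):
        self.values = values


class ToyLazyIndex(ToyIndex):
    """Stand-in for a lazy index such as a DaskIndex."""

    def __init__(self, values: list, chunk_size: int):
        self.values = values
        self.chunk_size = chunk_size

    def load(self) -> ToyIndex:
        # Loading converts the lazy index into a non-lazy one.
        return ToyEagerIndex(list(self.values))

    def chunk(self, chunks: dict) -> ToyIndex:
        # Rechunking rebuilds a lazy index with the requested chunk size
        # (the "x" dimension name is an assumption of this sketch).
        return ToyLazyIndex(self.values, chunks.get("x", self.chunk_size))


eager = ToyEagerIndex([0.0, 1.0, 2.0])
lazy = ToyLazyIndex([0.0, 1.0, 2.0], chunk_size=2)
```

Here `eager.load()` and `eager.chunk(...)` hand back the same object, while `lazy.load()` returns a `ToyEagerIndex` and `lazy.chunk({"x": 1})` returns a new `ToyLazyIndex`.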

benbovy · 2025-04-16T06:48:47Z

Index.compute() might be a possible alternative to Index.load() #6837.

dcherian · 2025-04-16T18:06:34Z

How would this work for compute?

For load, I could see

ds.xindexes["foo"].load()

but the pattern for compute is usually:

ds2 = ds.compute()

how would that translate?

benbovy · 2025-04-16T18:51:41Z

Index.load() has different semantics than Dataset.load(): it returns the index object that replaces the existing index when Dataset.load() is called. The returned index may be self (the index is propagated unchanged), a new instance, possibly of another type (e.g., a PandasIndex), or None (the index is dropped).

Like the rest of the core Index API, Index.load() is not intended to be end-user facing: it is used internally by Dataset.load(), or by Dataset.compute() via Dataset.load().
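To illustrate the three possible return values, here is a rough sketch of how a caller such as Dataset.load() could apply them. Everything in it is hypothetical (`load_indexes` and the toy classes are not Xarray functions), and the real refactored Dataset.load() may well differ:

```python
class KeepIndex:
    def load(self):
        return self  # propagate the index unchanged


class ConvertedIndex:
    """What ConvertIndex turns into once loaded."""

    def load(self):
        return self


class ConvertIndex:
    def load(self):
        return ConvertedIndex()  # replace with an index of another type


class DropIndex:
    def load(self):
        return None  # drop the index


def load_indexes(indexes: dict) -> dict:
    """Build a new name -> index mapping from each index's load() result."""
    results = {}  # id(index) -> load() result, so a shared index is loaded once
    loaded = {}
    for name, index in indexes.items():
        if id(index) not in results:
            results[id(index)] = index.load()
        new_index = results[id(index)]
        if new_index is not None:
            loaded[name] = new_index
        # None: the coordinate keeps no index and would be loaded as a plain variable
    return loaded


keep = KeepIndex()
shared = ConvertIndex()  # one index object shared by the "x" and "y" coordinates
out = load_indexes({"a": keep, "x": shared, "y": shared, "t": DropIndex()})
```

In this sketch `out` keeps `"a"` as is, maps `"x"` and `"y"` to one shared `ConvertedIndex`, and has no entry for `"t"`.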

In general the Index methods are named after the Dataset methods in which they are called, but maybe Index.compute() or another name would be less confusing here?

dcherian · 2025-04-16T21:32:32Z

So if I was a user using CoordinateTransformIndex and I wanted to "load" the transformed values into memory, how would I do that?

benbovy · 2025-04-17T07:08:57Z

As an end-user you would only need to do ds.load() or ds.compute() and not care much about anything else.

It is up to the index to define how to "load" the coordinate values and maybe convert itself. For CoordinateTransformIndex I see three options:

1. A 1D index may be converted into a PandasIndex.
2. An nD index may be dropped, so Dataset.load() will fall back to Variable.load() for loading the index coordinate data.
3. Add a CoordinateTransformIndex.__init__(lazy=True) option that will be used in CoordinateTransformIndex.create_variables() and that will determine the kind of variable to return.

Option 3 probably makes the most sense if we still need to keep track of the underlying transform.
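A rough sketch of what option 3 could look like, with a toy class standing in for CoordinateTransformIndex. The `lazy` flag comes from the option above; the class name, the affine transform and `create_variable()` are assumptions made up for this illustration:

```python
class ToyTransformIndex:
    """Hypothetical stand-in for a transform-based index with a lazy flag."""

    def __init__(self, scale: float, offset: float, size: int, lazy: bool = True):
        self.scale = scale
        self.offset = offset
        self.size = size
        self.lazy = lazy

    def _compute(self) -> list:
        # Evaluate the (toy, affine) transform for every position.
        return [self.offset + self.scale * i for i in range(self.size)]

    def create_variable(self):
        # The flag determines the kind of variable returned: a callable that
        # is evaluated on demand, or the values realized in memory.
        if self.lazy:
            return self._compute
        return self._compute()

    def load(self) -> "ToyTransformIndex":
        # Same transform, but the coordinate values are now realized.
        return ToyTransformIndex(self.scale, self.offset, self.size, lazy=False)


idx = ToyTransformIndex(scale=0.5, offset=10.0, size=4)
loaded = idx.load()
```

The loaded index still carries `scale` and `offset`, so the underlying transform is not lost.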

dcherian · 2025-04-17T13:13:24Z

I'm not sure we should conflate the two.

For example, I could have a dataset with a bunch of chunked arrays and a CoordinateTransformIndex. I might want to load the data into memory, but not realize the lazy coordinates.

And conversely, I might want to realize the CoordinateTransform values (say I've subset to a small region), but not load any chunked arrays.

I guess (3) is an option, but it's a bit of "action-at-a-distance". What is the most explicit API we can come up with?

# assuming RasterIndex over 'x', 'y' dimensions
ds.xindexes.update({"x": ds.xindexes["x"].load()})  # in-place (seems like it has to be)

benbovy · 2025-04-17T13:38:30Z

I see. Would it be reasonable to add a Dataset.load(load_coords=False) option? And add a Dataset.coords.load() method for the case of loading the coordinates but not the data? This is not the most fine-grained approach but maybe that's enough for most cases?

> What is the most explicit API we can come up with?

I'd avoid ds.xindexes.update() since, in the long term, .xindexes might be reduced to a basic mapping of index objects (#9203 (comment)), whereas "loading" the index should also update the index coordinates.

Alternatively:

loaded_coords = xr.Coordinates.from_xindex(ds.xindexes["x"].load())

ds.coords.update(loaded_coords)
# or
ds = ds.assign_coords(loaded_coords)

dcherian · 2025-04-17T14:43:16Z

I like loaded_coords = xr.Coordinates.from_xindex(ds.xindexes["x"].load()) as the explicit API.

benbovy · 2025-04-25T09:32:38Z

Assuming a multi-coordinate index like RasterIndex over x/y dimensions, ds.xindexes["x"].load() may look confusing: what about "y"?

Some possible ways to make it less confusing:

In Xarray, update Indexes.__getitem__(self, key) so that key accepts a tuple. This would allow typing ds.xindexes[("x", "y")], which would return the same index as ds.xindexes["x"] or ds.xindexes["y"].
A 3rd-party API such as ds.rasterix.raster_index.load() or ds.rasterix.load_raster_coords().
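For the first idea, a toy mapping shows the intended lookup semantics. `ToyIndexes` is a hypothetical stand-in, not the Xarray `Indexes` class, and raising `KeyError` when the names do not share one index is an assumption of this sketch:

```python
class ToyIndexes:
    """Hypothetical mapping of coordinate names to (possibly shared) indexes."""

    def __init__(self, indexes: dict):
        self._indexes = dict(indexes)

    def __getitem__(self, key):
        # Accept either a single name or a tuple of names.
        names = key if isinstance(key, tuple) else (key,)
        if not names:
            raise KeyError("at least one coordinate name is required")
        found = [self._indexes[name] for name in names]
        first = found[0]
        # A tuple key is only valid if all names point to the same index object.
        if any(index is not first for index in found):
            raise KeyError(f"coordinates {names!r} do not share a single index")
        return first


raster = object()  # stand-in for one RasterIndex over the x/y dimensions
time = object()    # stand-in for an unrelated index
xindexes = ToyIndexes({"x": raster, "y": raster, "time": time})
```

With this, `xindexes[("x", "y")]`, `xindexes["x"]` and `xindexes["y"]` all return the same object, while `xindexes[("x", "time")]` raises.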

keewis · 2025-06-13T13:02:41Z

I'm a bit late to this discussion, but some of this reminds me of #8607

benbovy added 2 commits August 31, 2023 15:54

add Index.load and Index.chunk methods

782c14a

refactor Dataset.load

a859ea1

github-actions bot added the topic-indexing label Aug 31, 2023

benbovy added 2 commits August 31, 2023 17:48

tweaks and fixes

d77311d

refactor Dataset.chunk

4506cb6

benbovy mentioned this pull request Aug 31, 2023

More flexible index variables #8124

Draft

4 tasks

benbovy mentioned this pull request Apr 16, 2025

Load raster lazy coordinates dcherian/rasterix#12

Open

benbovy mentioned this pull request Jun 13, 2025

healpix moc index xarray-contrib/xdggs#151

Open

4 tasks

dcherian marked this pull request as ready for review June 13, 2025 13:40

dcherian marked this pull request as draft June 13, 2025 20:32
