feat: reindex multiple DataArrays #4756

Open
davidbrochart opened this issue Jan 3, 2021 · 1 comment
Comments

Contributor

davidbrochart commented Jan 3, 2021

When creating a Dataset from multiple DataArrays that are supposed to share the same grid but are not exactly aligned (as is often the case with floating-point coordinates), we usually end up with unwanted NaNs in the Dataset, because the slightly different coordinate values are all kept as separate labels.
For instance, consider the following data arrays that are not exactly aligned:

import xarray as xr

da1 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y']).rename('da1')
da2 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y']).rename('da2')
da1.plot.imshow()
da2.plot.imshow()

[Figures: imshow plots of da1 and da2]
They show gaps when combined in a data set:

ds = xr.Dataset({'da1': da1, 'da2': da2})
ds['da1'].plot.imshow()
ds['da2'].plot.imshow()

[Figures: imshow plots of ds['da1'] and ds['da2'], showing the gaps]
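To make the gaps concrete without relying on the plots, here is a minimal sketch that counts the labels and valid cells after combining. It reuses the two arrays from above (passing `name=` instead of calling `.rename`, which is only a convenience); the variable names `x_labels` and `n_valid_da1` are mine.

```python
import numpy as np
import xarray as xr

data = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
da1 = xr.DataArray(data, coords=[[0, 1, 2], [0, 1, 2]], dims=["x", "y"], name="da1")
da2 = xr.DataArray(data, coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=["x", "y"], name="da2")

# Building the Dataset aligns both arrays on the union of their labels.
ds = xr.Dataset({"da1": da1, "da2": da2})

x_labels = np.sort(ds["x"].values)            # every distinct x label from both arrays
n_valid_da1 = int(ds["da1"].notnull().sum())  # cells of da1 that are not NaN
n_cells = ds["da1"].size                      # total cells on the combined grid

print(x_labels)
print(n_valid_da1, "valid cells out of", n_cells)
```

Each 3x3 array ends up scattered over a 6x6 grid, so three quarters of each variable is NaN.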
I think this situation is frequent enough to warrant a function that re-aligns all the data arrays together. There is a reindex_like method, which accepts a tolerance, but calling it successively on every data array, like so:

da1r = da1.reindex_like(da2, method='nearest', tolerance=0.2)
da2r = da2.reindex_like(da1r, method='nearest', tolerance=0.2)

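would put both arrays on da2's coordinates and keep only the part of da1 that lies within the tolerance of them, which is effectively the intersection of the grids rather than their union. What I would like is a function like the following: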

import numpy as np
from functools import reduce

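def reindex_all(arrays, dims, tolerance):
    """Reindex all arrays onto one merged grid per dimension."""
    coords = {}
    for dim in dims:
        # Sorted union of this dimension's labels across all arrays.
        coord = reduce(np.union1d, [array[dim] for array in arrays[1:]], arrays[0][dim])
        # Gap between each label and the next one.
        diff = coord[:-1] - coord[1:]
        # Drop a label when the next one is within tolerance (near-duplicate).
        keep = np.abs(diff) > tolerance
        # The last label has no successor, so it is always kept.
        coords[dim] = np.append(coord[:-1][keep], coord[-1])
    # Snap every array onto the merged grid; labels with no match become NaN.
    reindexed = [array.reindex(coords, method='nearest', tolerance=tolerance) for array in arrays]
    return reindexed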

da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)
dsr = xr.Dataset({'da1': da1r, 'da2': da2r})
dsr['da1'].plot.imshow()
dsr['da2'].plot.imshow()

[Figures: imshow plots of dsr['da1'] and dsr['da2']]
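As a rough check of the difference between the two approaches, the sketch below compares the resulting grids and the number of valid cells. It repeats the proposed function with one small adaptation of mine (explicit `.values`, so that `np.union1d` only ever sees plain NumPy arrays); the names `da1_like` and `da2_like` are also mine.

```python
from functools import reduce

import numpy as np
import xarray as xr


def reindex_all(arrays, dims, tolerance):
    # Same logic as the proposal above, with .values added (my adaptation).
    coords = {}
    for dim in dims:
        coord = reduce(np.union1d, [a[dim].values for a in arrays[1:]], arrays[0][dim].values)
        keep = np.abs(coord[:-1] - coord[1:]) > tolerance
        coords[dim] = np.append(coord[:-1][keep], coord[-1])
    return [a.reindex(coords, method="nearest", tolerance=tolerance) for a in arrays]


data = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
da1 = xr.DataArray(data, coords=[[0, 1, 2], [0, 1, 2]], dims=["x", "y"], name="da1")
da2 = xr.DataArray(data, coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=["x", "y"], name="da2")

# Successive reindex_like: both arrays end up on da2's grid.
da1_like = da1.reindex_like(da2, method="nearest", tolerance=0.2)
da2_like = da2.reindex_like(da1_like, method="nearest", tolerance=0.2)

# Proposed helper: both arrays end up on the merged grid.
da1r, da2r = reindex_all([da1, da2], ["x", "y"], 0.2)

print("reindex_like x:", da1_like["x"].values, "valid da1 cells:", int(da1_like.notnull().sum()))
print("reindex_all  x:", da1r["x"].values, "valid da1 cells:", int(da1r.notnull().sum()))
```

With successive reindex_like, da1 loses the row and column at 0 because da2 has no label near them, whereas reindex_all keeps all nine values of both arrays on a 4x4 grid.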
I have not found anything equivalent. If you think this is worthwhile, I could try to send a PR implementing such a feature.

Contributor

dcherian commented Jan 3, 2021

See #4489 and #2217, though your request for reindex is slightly different.
