Skip to content

Shape preserving diff via new keywords #1332

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pwolfram opened this issue Mar 27, 2017 · 10 comments
Closed

Shape preserving diff via new keywords #1332

pwolfram opened this issue Mar 27, 2017 · 10 comments

Comments

@pwolfram
Copy link
Contributor

Currently, an operation such as ds.diff('x') will result in a smaller size dimension, e.g.,

In [1]: import xarray as xr

In [2]: ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})

In [3]: ds
Out[3]: 
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    foo      (x) int64 1 2 3

In [4]: ds.diff('x')
Out[4]: 
<xarray.Dataset>
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 2 3
Data variables:
    foo      (x) int64 1 1

However, there are cases where the same size would be beneficial to keep so that you would get

In [1]: import xarray as xr

In [2]: ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})

In [3]: ds.diff('x', preserve_shape=True, empty_value=0)
Out[3]: 
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    foo      (x) int64 0 1 1

Is there interest in addition of a preserve_shape=True keyword such that it results in this shape-preserving behavior? I'm proposing you could use this with label='upper' and label='lower'.

empty_value could be a value or empty_index could be an index for the fill value. If empty_value=None and empty_index=None, it would produce a nan.

The reason I'm asking the community is because this is at least the second time I've encountered an application where this behavior would be helpful, e.g., computing ocean layer thicknesses from bottom depths. A previous application was computation of a time step from time slice output and the desire to use this product in an approximated integral, e.g.,

y*diff(t, label='lower', preserve_shape=True)

where y and t are both of size n, which is effectively a left-sided Riemann sum.

@shoyer
Copy link
Member

shoyer commented Mar 28, 2017

Would it fill the same need to add a wrapped version of np.gradient, which already works similarly to diff but preserves input shape?

@rabernat
Copy link
Contributor

FYI @pwolfram, if you are interested in more "grid-aware" finite differencing, you might have a look at xgcm. We are working on it again and have implemented some basic difference and interpolation operators:
http://xgcm.readthedocs.io/en/latest/grids.html#grid-objects

@pwolfram
Copy link
Contributor Author

@shoyer, I'm not sure we want to wrap np.gradient. It seems like other approaches like @rabernat 's xgcm would be more appropriate as a superset of xarray.

Fundamentally, I want something that is like an inverse of cumsum and the proposed change could be used in that context. It is just super inconvenient to do array resizing following the diff of a time vector to get timesteps, but maybe this use case is too niche to be useful for the community.

@rabernat
Copy link
Contributor

It is just super inconvenient to do array resizing following the diff of a time vector to get timesteps, but maybe this use case is too niche to be useful for the community.

Maybe for the xarray community, but not the xgcm community ;) We definitely want round-trip-compatibly diff and cumsum operations (see xgcm/xgcm#49, which is very close to implemented)

@pwolfram
Copy link
Contributor Author

@rabernat, do you think that the proposed keyword additions should be included in xarray or not? I personally would like to see them in xarray but don't know if it is just me or not. If you think they should be in xarray, are you ok with the api above?

@spencerahill
Copy link
Contributor

I'm not sure we want to wrap np.gradient. It seems like other approaches like @rabernat 's xgcm would be more appropriate as a superset of xarray.

Certainly grid-aware differencing and integral operators are preferred when the grid information is known and available, but I'm not sure that therefore a more naive version akin to np.gradient would not be useful. It's quite likely that there are xarray users (e.g. in non climate/weather/ocean-related fields) wherein a 'c' grid is meaningless to them, yet they still would appreciate being able to easily compute derivatives via xarray operations.

But then we're back to the valid questions raised before re: what is the appropriate scope of xarray functionality, c.f. #1288 (comment) and subsequent in that thread

@shoyer
Copy link
Member

shoyer commented Mar 28, 2017

As I mentioned in #1288, I think basic functionality like integrate and gradient is totally within appropriate scope for xarray.

I recall now that people have requested similar functionality for numpy.diff: numpy/numpy#8132. It would be nice to resolve this upstream in NumPy first (e.g., with numpy/numpy#8206), and then simply copy the API design in xarray.

@rabernat
Copy link
Contributor

@rabernat, do you think that the proposed keyword additions should be included in xarray or not?

If you are going to have diff return a same-length array, then the keywords should allow the user to specify the boundary condition, e.g. extend, reflect, periodic, blank, etc. I agree this would be useful to have in xarray.

I also agree we should copy numpy wherever possible.

@tomchor
Copy link
Contributor

tomchor commented Sep 3, 2018

I'm not sure where we stand on this issue, but I think since numpy.gradient already exists, it makes more sense to wrap that function for the sake of simplicity instead of making xr.diff differ from the original premise of np.diff.

On the same topic, it bothers me that xr.diff only accepts "upper" and "lower" arguments for the label. The most obvious (and useful) value in my opinion would be "middle", which would correspond to a centered finite differences. Is there any special reason why that option isn't there?

@shoyer
Copy link
Member

shoyer commented Sep 3, 2018 via email

@fujiisoup fujiisoup mentioned this issue Sep 4, 2018
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants