-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add pointwise indexing via isel_points method #481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
b080142
ae1f3a8
c3cb3a5
bf0915b
e4851ba
10b67e3
5ab9d4b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -57,6 +57,7 @@ DataArray: | |
|
||
Positional indexing deviates from the NumPy when indexing with multiple | ||
arrays like ``arr[[0, 1], [0, 1]]``, as described in :ref:`indexing details`. | ||
Use :py:meth:`~xray.Dataset.isel_points` to achieve this functionality. | ||
|
||
xray also supports label-based indexing, just like pandas. Because | ||
we use a :py:class:`pandas.Index` under the hood, label based indexing is very | ||
|
@@ -122,7 +123,7 @@ __ http://legacy.python.org/dev/peps/pep-0472/ | |
|
||
.. warning:: | ||
|
||
Do not try to assign values when using ``isel`` or ``sel``:: | ||
Do not try to assign values when using ``isel``, ``isel_points`` or ``sel``:: | ||
|
||
# DO NOT do this | ||
arr.isel(space=0) = 0 | ||
|
@@ -134,6 +135,21 @@ __ http://legacy.python.org/dev/peps/pep-0472/ | |
# this is safe | ||
arr[dict(space=0)] = 0 | ||
|
||
Pointwise indexing | ||
-------------------------------- | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you need to have the same number of dashes as the length of the line above. |
||
|
||
xray pointwise indexing supports the indexing along multiple labeled dimensions | ||
using list-like objects. While :py:meth:`~xray.DataArray.isel` performs | ||
orthogonal indexing, the :py:meth:`~xray.DataArray.isel_points` method | ||
provides similar numpy indexing behavior as if you were using multiple lists to index an array (e.g. `arr[[0, 1], [0, 1]]` ): | ||
|
||
.. ipython:: python | ||
|
||
# index by integer array indices | ||
da = xray.DataArray(np.arange(56).reshape((7, 8)), dims=['x', 'y']) | ||
da | ||
da.isel_points(x=[0, 1, 6], y=[0, 1, 0]) | ||
|
||
Dataset indexing | ||
---------------- | ||
|
||
|
@@ -145,13 +161,16 @@ simultaneously, returning a new dataset: | |
ds = arr.to_dataset() | ||
ds.isel(space=[0], time=[0]) | ||
ds.sel(time='2000-01-01') | ||
ds2 = da.to_dataset() | ||
ds2.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points') | ||
|
||
Positional indexing on a dataset is not supported because the ordering of | ||
dimensions in a dataset is somewhat ambiguous (it can vary between different | ||
arrays). However, you can do normal indexing with labeled dimensions: | ||
|
||
.. ipython:: python | ||
|
||
|
||
ds[dict(space=[0], time=[0])] | ||
ds.loc[dict(time='2000-01-01')] | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,36 @@ v0.5.3 (unreleased) | |
|
||
- Dataset variables are now written to netCDF files in order of appearance | ||
when using the netcdf4 backend (:issue:`479`). | ||
- Added :py:meth:`~xray.Dataset.isel_points` and :py:meth:`~xray.DataArray.isel_points` to support pointwise indexing of Datasets and DataArrays (:issue:`475`). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A brief example would be nice to add here. You could also link to the new documentation section. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
|
||
.. ipython:: | ||
:verbatim: | ||
|
||
In [1]: da = xray.DataArray(np.arange(56).reshape((7, 8)), | ||
dims=['x', 'y']) | ||
|
||
In [2]: da | ||
Out[2]: | ||
<xray.DataArray (x: 7, y: 8)> | ||
array([[ 0, 1, 2, 3, 4, 5, 6, 7], | ||
[ 8, 9, 10, 11, 12, 13, 14, 15], | ||
[16, 17, 18, 19, 20, 21, 22, 23], | ||
[24, 25, 26, 27, 28, 29, 30, 31], | ||
[32, 33, 34, 35, 36, 37, 38, 39], | ||
[40, 41, 42, 43, 44, 45, 46, 47], | ||
[48, 49, 50, 51, 52, 53, 54, 55]]) | ||
Coordinates: | ||
* x (x) int64 0 1 2 3 4 5 6 | ||
* y (y) int64 0 1 2 3 4 5 6 7 | ||
|
||
In [3]: da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points') | ||
Out[3]: | ||
<xray.DataArray (points: 3)> | ||
array([ 0, 9, 48]) | ||
Coordinates: | ||
x (points) int64 0 1 6 | ||
y (points) int64 0 1 0 | ||
* points (points) int64 0 1 2 | ||
|
||
v0.5.2 (16 July 2015) | ||
--------------------- | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
import functools | ||
import warnings | ||
from collections import Mapping | ||
from collections import Mapping, Sequence | ||
from numbers import Number | ||
|
||
import numpy as np | ||
|
@@ -21,6 +21,7 @@ | |
from .variable import as_variable, Variable, Coordinate, broadcast_variables | ||
from .pycompat import (iteritems, itervalues, basestring, OrderedDict, | ||
dask_array_type) | ||
from .combine import concat | ||
|
||
|
||
# list of attributes of pd.DatetimeIndex that are ndarrays of time info | ||
|
@@ -1028,6 +1029,76 @@ def sel(self, method=None, **indexers): | |
return self.isel(**indexing.remap_label_indexers(self, indexers, | ||
method=method)) | ||
|
||
def isel_points(self, dim='points', **indexers): | ||
"""Returns a new dataset with each array indexed pointwise along the | ||
specified dimension(s). | ||
|
||
This method selects pointwise values from each array and is akin to | ||
the NumPy indexing behavior of `arr[[0, 1], [0, 1]]`, except this | ||
method does not require knowing the order of each array's dimensions. | ||
|
||
Parameters | ||
---------- | ||
dim : str or DataArray or pandas.Index, optinal | ||
Name of the dimension to concatenate along. This can either be a | ||
new dimension name, in which case it is added along axis=0, or an | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Existing dimension names are not valid choices for sel_points, I think. Actually, that's probably an edge case worth writing a test for. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You still need to update the docstring here. |
||
existing dimension name, in which case the location of the | ||
dimension is unchanged. If dimension is provided as a DataArray or | ||
Index, its name is used as the dimension to concatenate along and | ||
the values are added as a coordinate. | ||
**indexers : {dim: indexer, ...} | ||
Keyword arguments with names matching dimensions and values given | ||
by array-like objects. All indexers must be the same length and | ||
1 dimensional. | ||
|
||
Returns | ||
------- | ||
obj : Dataset | ||
A new Dataset with the same contents as this dataset, except each | ||
array and dimension is indexed by the appropriate indexers. With | ||
pointwise indexing, the new Dataset will always be a copy of the | ||
original. | ||
|
||
See Also | ||
-------- | ||
Dataset.sel | ||
DataArray.isel | ||
DataArray.sel | ||
DataArray.isel_points | ||
""" | ||
indexer_dims = set(indexers) | ||
|
||
def relevant_keys(mapping): | ||
return [k for k, v in mapping.items() | ||
if any(d in indexer_dims for d in v.dims)] | ||
|
||
data_vars = relevant_keys(self.data_vars) | ||
coords = relevant_keys(self.coords) | ||
|
||
# all the indexers should be iterables | ||
keys = indexers.keys() | ||
indexers = [(k, np.asarray(v)) for k, v in iteritems(indexers)] | ||
# Check that indexers are valid dims, integers, and 1D | ||
for k, v in indexers: | ||
if k not in self.dims: | ||
raise ValueError("dimension %s does not exist" % k) | ||
if v.dtype.kind != 'i': | ||
raise TypeError('Indexers must be integers') | ||
if v.ndim != 1: | ||
raise ValueError('Indexers must be 1 dimensional') | ||
|
||
# all the indexers should have the same length | ||
lengths = set(len(v) for k, v in indexers) | ||
if len(lengths) > 1: | ||
raise ValueError('All indexers must be the same length') | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add a TODO note about speeding this up using vectorized indexing? |
||
# TODO: This would be sped up with vectorized indexing. This will | ||
# require dask to support pointwise indexing as well. | ||
return concat([self.isel(**d) for d in | ||
[dict(zip(keys, inds)) for inds in | ||
zip(*[v for k, v in indexers])]], | ||
dim=dim, coords=coords, data_vars=data_vars) | ||
|
||
def reindex_like(self, other, method=None, copy=True): | ||
"""Conform this object onto the indexes of another object, filling | ||
in missing values with NaN. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -382,6 +382,53 @@ def test_sel_method(self): | |
actual = data.sel(x=[0.9, 1.9], method='backfill') | ||
self.assertDataArrayIdentical(expected, actual) | ||
|
||
def test_isel_points_method(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add a test case for negative indexers? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
shape = (10, 5, 6) | ||
np_array = np.random.random(shape) | ||
da = DataArray(np_array, dims=['time', 'y', 'x']) | ||
y = [1, 3] | ||
x = [3, 0] | ||
|
||
expected = da.values[:, y, x] | ||
|
||
actual = da.isel_points(y=y, x=x, dim='test_coord') | ||
assert 'test_coord' in actual.coords | ||
assert actual.coords['test_coord'].shape == (len(y), ) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would also verify that Probably easier just to construct the expected data-array and then compare them with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
|
||
actual = da.isel_points(y=y, x=x) | ||
assert 'points' in actual.coords | ||
# Note that because xray always concatenates along the first dimension, | ||
# We must transpose the result to match the numpy style of | ||
# concatentation. | ||
np.testing.assert_equal(actual.T, expected) | ||
|
||
# a few corner cases | ||
da.isel_points(time=[1, 2], x=[2, 2], y=[3, 4]) | ||
np.testing.assert_allclose( | ||
da.isel_points(time=[1], x=[2], y=[4]).values.squeeze(), | ||
np_array[1, 4, 2].squeeze()) | ||
da.isel_points(time=[1, 2]) | ||
|
||
# test that the order of the indexers doesn't matter | ||
self.assertDataArrayIdentical( | ||
da.isel_points(y=y, x=x), | ||
da.isel_points(x=x, y=y)) | ||
|
||
# make sure we're raising errors in the right places | ||
with self.assertRaisesRegexp(ValueError, | ||
'All indexers must be the same length'): | ||
da.isel_points(y=[1, 2], x=[1, 2, 3]) | ||
with self.assertRaisesRegexp(ValueError, | ||
'dimension bad_key does not exist'): | ||
da.isel_points(bad_key=[1, 2]) | ||
with self.assertRaisesRegexp(TypeError, 'Indexers must be integers'): | ||
da.isel_points(y=[1.5, 2.2]) | ||
with self.assertRaisesRegexp(TypeError, 'Indexers must be integers'): | ||
da.isel_points(x=[1, 2, 3], y=slice(3)) | ||
with self.assertRaisesRegexp(ValueError, | ||
'Indexers must be 1 dimensional'): | ||
da.isel_points(y=1, x=2) | ||
|
||
def test_loc(self): | ||
self.ds['x'] = ('x', np.array(list('abcdefghij'))) | ||
da = self.ds['foo'] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would reference the section on pointwise indexing in this same document, e.g., by using
as the link and just before the "Pointwise indexing" section below add: