Skip to content

Add pointwise indexing via isel_points method #481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 27, 2015
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,7 @@ Indexing
Dataset.loc
Dataset.isel
Dataset.sel
Dataset.isel_points
Dataset.squeeze
Dataset.reindex
Dataset.reindex_like
Expand Down Expand Up @@ -202,6 +203,7 @@ Indexing
DataArray.loc
DataArray.isel
DataArray.sel
DataArray.isel_points
DataArray.squeeze
DataArray.reindex
DataArray.reindex_like
Expand Down
21 changes: 20 additions & 1 deletion doc/indexing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ DataArray:

Positional indexing deviates from the NumPy when indexing with multiple
arrays like ``arr[[0, 1], [0, 1]]``, as described in :ref:`indexing details`.
Use :py:meth:`~xray.Dataset.isel_points` to achieve this functionality.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reference the section on pointwise indexing in this same document, e.g., by using

:ref:`pointwise indexing`

as the link and just before the "Pointwise indexing" section below add:

.. _pointwise indexing:


xray also supports label-based indexing, just like pandas. Because
we use a :py:class:`pandas.Index` under the hood, label based indexing is very
Expand Down Expand Up @@ -122,7 +123,7 @@ __ http://legacy.python.org/dev/peps/pep-0472/

.. warning::

Do not try to assign values when using ``isel`` or ``sel``::
Do not try to assign values when using ``isel``, ``isel_points`` or ``sel``::

# DO NOT do this
arr.isel(space=0) = 0
Expand All @@ -134,6 +135,21 @@ __ http://legacy.python.org/dev/peps/pep-0472/
# this is safe
arr[dict(space=0)] = 0

Pointwise indexing
--------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to have the same number of dashes as the length of the line above.


xray pointwise indexing supports the indexing along multiple labeled dimensions
using list-like objects. While :py:meth:`~xray.DataArray.isel` performs
orthogonal indexing, the :py:meth:`~xray.DataArray.isel_points` method
provides similar numpy indexing behavior as if you were using multiple lists to index an array (e.g. `arr[[0, 1], [0, 1]]` ):

.. ipython:: python

# index by integer array indices
da = xray.DataArray(np.arange(56).reshape((7, 8)), dims=['x', 'y'])
da
da.isel_points(x=[0, 1, 6], y=[0, 1, 0])

Dataset indexing
----------------

Expand All @@ -145,13 +161,16 @@ simultaneously, returning a new dataset:
ds = arr.to_dataset()
ds.isel(space=[0], time=[0])
ds.sel(time='2000-01-01')
ds2 = da.to_dataset()
ds2.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points')

Positional indexing on a dataset is not supported because the ordering of
dimensions in a dataset is somewhat ambiguous (it can vary between different
arrays). However, you can do normal indexing with labeled dimensions:

.. ipython:: python


ds[dict(space=[0], time=[0])]
ds.loc[dict(time='2000-01-01')]

Expand Down
30 changes: 30 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,36 @@ v0.5.3 (unreleased)

- Dataset variables are now written to netCDF files in order of appearance
when using the netcdf4 backend (:issue:`479`).
- Added :py:meth:`~xray.Dataset.isel_points` and :py:meth:`~xray.DataArray.isel_points` to support pointwise indexing of Datasets and DataArrays (:issue:`475`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A brief example would be nice to add here. You could also link to the new documentation section.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


.. ipython::
:verbatim:

In [1]: da = xray.DataArray(np.arange(56).reshape((7, 8)),
dims=['x', 'y'])

In [2]: da
Out[2]:
<xray.DataArray (x: 7, y: 8)>
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55]])
Coordinates:
* x (x) int64 0 1 2 3 4 5 6
* y (y) int64 0 1 2 3 4 5 6 7

In [3]: da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points')
Out[3]:
<xray.DataArray (points: 3)>
array([ 0, 9, 48])
Coordinates:
x (points) int64 0 1 6
y (points) int64 0 1 0
* points (points) int64 0 1 2

v0.5.2 (16 July 2015)
---------------------
Expand Down
12 changes: 12 additions & 0 deletions xray/core/dataarray.py
Original file line number Diff line number Diff line change
Expand Up @@ -551,6 +551,18 @@ def sel(self, method=None, **indexers):
return self.isel(**indexing.remap_label_indexers(self, indexers,
method=method))

def isel_points(self, dim='points', **indexers):
"""Return a new DataArray whose dataset is given by pointwise integer
indexing along the specified dimension(s).

See Also
--------
Dataset.isel_points
DataArray.sel_points
"""
ds = self._dataset.isel_points(dim=dim, **indexers)
return self._with_replaced_dataset(ds)

def reindex_like(self, other, method=None, copy=True):
"""Conform this object onto the indexes of another object, filling
in missing values with NaN.
Expand Down
73 changes: 72 additions & 1 deletion xray/core/dataset.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
import functools
import warnings
from collections import Mapping
from collections import Mapping, Sequence
from numbers import Number

import numpy as np
Expand All @@ -21,6 +21,7 @@
from .variable import as_variable, Variable, Coordinate, broadcast_variables
from .pycompat import (iteritems, itervalues, basestring, OrderedDict,
dask_array_type)
from .combine import concat


# list of attributes of pd.DatetimeIndex that are ndarrays of time info
Expand Down Expand Up @@ -1028,6 +1029,76 @@ def sel(self, method=None, **indexers):
return self.isel(**indexing.remap_label_indexers(self, indexers,
method=method))

def isel_points(self, dim='points', **indexers):
"""Returns a new dataset with each array indexed pointwise along the
specified dimension(s).

This method selects pointwise values from each array and is akin to
the NumPy indexing behavior of `arr[[0, 1], [0, 1]]`, except this
method does not require knowing the order of each array's dimensions.

Parameters
----------
dim : str or DataArray or pandas.Index, optinal
Name of the dimension to concatenate along. This can either be a
new dimension name, in which case it is added along axis=0, or an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing dimension names are not valid choices for sel_points, I think. Actually, that's probably an edge case worth writing a test for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still need to update the docstring here.

existing dimension name, in which case the location of the
dimension is unchanged. If dimension is provided as a DataArray or
Index, its name is used as the dimension to concatenate along and
the values are added as a coordinate.
**indexers : {dim: indexer, ...}
Keyword arguments with names matching dimensions and values given
by array-like objects. All indexers must be the same length and
1 dimensional.

Returns
-------
obj : Dataset
A new Dataset with the same contents as this dataset, except each
array and dimension is indexed by the appropriate indexers. With
pointwise indexing, the new Dataset will always be a copy of the
original.

See Also
--------
Dataset.sel
DataArray.isel
DataArray.sel
DataArray.isel_points
"""
indexer_dims = set(indexers)

def relevant_keys(mapping):
return [k for k, v in mapping.items()
if any(d in indexer_dims for d in v.dims)]

data_vars = relevant_keys(self.data_vars)
coords = relevant_keys(self.coords)

# all the indexers should be iterables
keys = indexers.keys()
indexers = [(k, np.asarray(v)) for k, v in iteritems(indexers)]
# Check that indexers are valid dims, integers, and 1D
for k, v in indexers:
if k not in self.dims:
raise ValueError("dimension %s does not exist" % k)
if v.dtype.kind != 'i':
raise TypeError('Indexers must be integers')
if v.ndim != 1:
raise ValueError('Indexers must be 1 dimensional')

# all the indexers should have the same length
lengths = set(len(v) for k, v in indexers)
if len(lengths) > 1:
raise ValueError('All indexers must be the same length')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a TODO note about speeding this up using vectorized indexing?

# TODO: This would be sped up with vectorized indexing. This will
# require dask to support pointwise indexing as well.
return concat([self.isel(**d) for d in
[dict(zip(keys, inds)) for inds in
zip(*[v for k, v in indexers])]],
dim=dim, coords=coords, data_vars=data_vars)

def reindex_like(self, other, method=None, copy=True):
"""Conform this object onto the indexes of another object, filling
in missing values with NaN.
Expand Down
47 changes: 47 additions & 0 deletions xray/test/test_dataarray.py
Original file line number Diff line number Diff line change
Expand Up @@ -382,6 +382,53 @@ def test_sel_method(self):
actual = data.sel(x=[0.9, 1.9], method='backfill')
self.assertDataArrayIdentical(expected, actual)

def test_isel_points_method(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test case for negative indexers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

shape = (10, 5, 6)
np_array = np.random.random(shape)
da = DataArray(np_array, dims=['time', 'y', 'x'])
y = [1, 3]
x = [3, 0]

expected = da.values[:, y, x]

actual = da.isel_points(y=y, x=x, dim='test_coord')
assert 'test_coord' in actual.coords
assert actual.coords['test_coord'].shape == (len(y), )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also verify that x and y are still coordinates, along the test_coord dimension.

Probably easier just to construct the expected data-array and then compare them with self.assertDataArrayIdentical.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


actual = da.isel_points(y=y, x=x)
assert 'points' in actual.coords
# Note that because xray always concatenates along the first dimension,
# We must transpose the result to match the numpy style of
# concatentation.
np.testing.assert_equal(actual.T, expected)

# a few corner cases
da.isel_points(time=[1, 2], x=[2, 2], y=[3, 4])
np.testing.assert_allclose(
da.isel_points(time=[1], x=[2], y=[4]).values.squeeze(),
np_array[1, 4, 2].squeeze())
da.isel_points(time=[1, 2])

# test that the order of the indexers doesn't matter
self.assertDataArrayIdentical(
da.isel_points(y=y, x=x),
da.isel_points(x=x, y=y))

# make sure we're raising errors in the right places
with self.assertRaisesRegexp(ValueError,
'All indexers must be the same length'):
da.isel_points(y=[1, 2], x=[1, 2, 3])
with self.assertRaisesRegexp(ValueError,
'dimension bad_key does not exist'):
da.isel_points(bad_key=[1, 2])
with self.assertRaisesRegexp(TypeError, 'Indexers must be integers'):
da.isel_points(y=[1.5, 2.2])
with self.assertRaisesRegexp(TypeError, 'Indexers must be integers'):
da.isel_points(x=[1, 2, 3], y=slice(3))
with self.assertRaisesRegexp(ValueError,
'Indexers must be 1 dimensional'):
da.isel_points(y=1, x=2)

def test_loc(self):
self.ds['x'] = ('x', np.array(list('abcdefghij')))
da = self.ds['foo']
Expand Down
39 changes: 39 additions & 0 deletions xray/test/test_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -661,6 +661,45 @@ def test_sel(self):
self.assertDatasetEqual(data.isel(td=slice(1, 3)),
data.sel(td=slice('1 days', '2 days')))

def test_isel_points(self):
data = create_test_data()

pdim1 = [1, 2, 3]
pdim2 = [4, 5, 1]
pdim3 = [1, 2, 3]

actual = data.isel_points(dim1=pdim1, dim2=pdim2, dim3=pdim3,
dim='test_coord')
assert 'test_coord' in actual.coords
assert actual.coords['test_coord'].shape == (len(pdim1), )

actual = data.isel_points(dim1=pdim1, dim2=pdim2)
assert 'points' in actual.coords
np.testing.assert_array_equal(pdim1, actual['dim1'])

# test that the order of the indexers doesn't matter
self.assertDatasetIdentical(data.isel_points(dim1=pdim1, dim2=pdim2),
data.isel_points(dim2=pdim2, dim1=pdim1))

# make sure we're raising errors in the right places
with self.assertRaisesRegexp(ValueError,
'All indexers must be the same length'):
data.isel_points(dim1=[1, 2], dim2=[1, 2, 3])
with self.assertRaisesRegexp(ValueError,
'dimension bad_key does not exist'):
data.isel_points(bad_key=[1, 2])
with self.assertRaisesRegexp(TypeError, 'Indexers must be integers'):
data.isel_points(dim1=[1.5, 2.2])
with self.assertRaisesRegexp(TypeError, 'Indexers must be integers'):
data.isel_points(dim1=[1, 2, 3], dim2=slice(3))
with self.assertRaisesRegexp(ValueError,
'Indexers must be 1 dimensional'):
data.isel_points(dim1=1, dim2=2)
# test to be sure we keep around variables that were not indexed
ds = Dataset({'x': [1, 2, 3, 4], 'y': 0})
actual = ds.isel_points(x=[0, 1, 2])
self.assertDataArrayIdentical(ds['y'], actual['y'])

def test_sel_method(self):
data = create_test_data()

Expand Down