Commit a834dde

Merge remote-tracking branch 'upstream/master' into groupby-plot
* upstream/master:
  allow passing any iterable to drop when dropping variables (pydata#3693)
  Typo on DataSet/DataArray.to_dict documentation (pydata#3692)
  Fix mypy type checking tests failure in ds.merge (pydata#3690)
  Explicitly convert result of pd.to_datetime to a timezone-naive type (pydata#3688)
  ds.merge(da) bugfix (pydata#3677)
  fix docstring for combine_first: returns a Dataset (pydata#3683)
  Add option to choose mfdataset attributes source. (pydata#3498)
  How do I add a new variable to dataset. (pydata#3679)
  Add map_blocks example to whats-new (pydata#3682)
  Make dask names change when chunking Variables by different amounts. (pydata#3584)
  raise an error when renaming dimensions to existing names (pydata#3645)
  Support swap_dims to dimension names that are not existing variables (pydata#3636)
  Add map_blocks example to docs. (pydata#3667)
  add multiindex level name checking to .rename() (pydata#3658)
2 parents: cb78770 + e0fd480

14 files changed: +240 / -30 lines

doc/data-structures.rst

Lines changed: 2 additions & 0 deletions
@@ -353,6 +353,8 @@ setting) variables and attributes:
 This is particularly useful in an exploratory context, because you can
 tab-complete these variable names with tools like IPython.
 
+.. _dictionary_like_methods:
+
 Dictionary like methods
 ~~~~~~~~~~~~~~~~~~~~~~~

doc/howdoi.rst

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,8 @@ How do I ...
 
    * - How do I...
      - Solution
+   * - add a DataArray to my dataset as a new variable
+     - ``my_dataset[varname] = my_dataArray`` or :py:meth:`Dataset.assign` (see also :ref:`dictionary_like_methods`)
    * - add variables from other datasets to my dataset
      - :py:meth:`Dataset.merge`
    * - add a new dimension and/or coordinate
@@ -57,3 +59,4 @@ How do I ...
      - ``obj.dt.ceil``, ``obj.dt.floor``, ``obj.dt.round``. See :ref:`dt_accessor` for more.
    * - make a mask that is ``True`` where an object contains any of the values in a array
      - :py:meth:`Dataset.isin`, :py:meth:`DataArray.isin`
+
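
For reference, the two spellings named in the new howdoi entry behave like this; a minimal sketch (variable names are illustrative):

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"a": ("x", [1, 2, 3])})
    da = xr.DataArray(np.zeros(3), dims="x")

    # dictionary-style assignment, as the entry suggests
    ds["b"] = da

    # or the non-mutating equivalent
    ds2 = ds.assign(c=da)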

doc/whats-new.rst

Lines changed: 22 additions & 0 deletions
@@ -37,13 +37,20 @@ New Features
 - Added the ``count`` reduction method to both :py:class:`~core.rolling.DatasetCoarsen`
   and :py:class:`~core.rolling.DataArrayCoarsen` objects. (:pull:`3500`)
   By `Deepak Cherian <https://github.com/dcherian>`_
+- Add `attrs_file` option in :py:func:`~xarray.open_mfdataset` to choose the
+  source file for global attributes in a multi-file dataset (:issue:`2382`,
+  :pull:`3498`) by `Julien Seguinot <https://github.com/juseg>`_.
+- :py:meth:`Dataset.swap_dims` and :py:meth:`DataArray.swap_dims`
+  now allow swapping to dimension names that don't exist yet. (:pull:`3636`)
+  By `Justus Magin <https://github.com/keewis>`_.
 - Extend :py:class:`core.accessor_dt.DatetimeAccessor` properties
   and support `.dt` accessor for timedelta
   via :py:class:`core.accessor_dt.TimedeltaAccessor` (:pull:`3612`)
   By `Anderson Banihirwe <https://github.com/andersy005>`_.
 
 Bug fixes
 ~~~~~~~~~
+
 - Fix :py:meth:`xarray.combine_by_coords` to allow for combining incomplete
   hypercubes of Datasets (:issue:`3648`). By `Ian Bolliger
   <https://github.com/bolliger32>`_.
@@ -62,6 +69,16 @@ Bug fixes
   By `Tom Augspurger <https://github.com/TomAugspurger>`_.
 - Ensure :py:meth:`Dataset.quantile`, :py:meth:`DataArray.quantile` issue the correct error
   when ``q`` is out of bounds (:issue:`3634`) by `Mathias Hauser <https://github.com/mathause>`_.
+- Raise an error when trying to use :py:meth:`Dataset.rename_dims` to
+  rename to an existing name (:issue:`3438`, :pull:`3645`)
+  By `Justus Magin <https://github.com/keewis>`_.
+- :py:meth:`Dataset.rename`, :py:meth:`DataArray.rename` now check for conflicts with
+  MultiIndex level names.
+- :py:meth:`Dataset.merge` no longer fails when passed a `DataArray` instead of a `Dataset` object.
+  By `Tom Nicholas <https://github.com/TomNicholas>`_.
+- Fix a regression in :py:meth:`Dataset.drop`: allow passing any
+  iterable when dropping variables (:issue:`3552`, :pull:`3693`)
+  By `Justus Magin <https://github.com/keewis>`_.
 
 Documentation
 ~~~~~~~~~~~~~
@@ -80,9 +97,14 @@ Documentation
 - Added examples for :py:meth:`DataArray.quantile`, :py:meth:`Dataset.quantile` and
   ``GroupBy.quantile``. (:pull:`3576`)
   By `Justus Magin <https://github.com/keewis>`_.
+- Added example for :py:func:`~xarray.map_blocks`. (:pull:`3667`)
+  By `Riley X. Brady <https://github.com/bradyrx>`_.
 
 Internal Changes
 ~~~~~~~~~~~~~~~~
+- Make sure dask names change when rechunking by different chunk sizes. Conversely, make sure they
+  stay the same when rechunking by the same chunk size. (:issue:`3350`)
+  By `Deepak Cherian <https://github.com/dcherian>`_.
 - 2x to 5x speed boost (on small arrays) for :py:meth:`Dataset.isel`,
   :py:meth:`DataArray.isel`, and :py:meth:`DataArray.__getitem__` when indexing by int,
   slice, list of int, scalar ndarray, or 1-dimensional ndarray.
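
Among the entries above, the timedelta ``.dt`` accessor is the least self-explanatory. A small sketch of the usage it enables, assuming the accessor exposes the same properties as pandas' timedelta attributes:

    import pandas as pd
    import xarray as xr

    # timedelta64 data dispatches .dt to the new TimedeltaAccessor
    td = xr.DataArray(pd.timedelta_range(start="1 day", freq="6H", periods=4), dims="t")
    print(td.dt.days)     # whole-day component of each value
    print(td.dt.seconds)  # remaining seconds component, as in pandas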

xarray/backends/api.py

Lines changed: 16 additions & 3 deletions
@@ -718,6 +718,7 @@ def open_mfdataset(
     autoclose=None,
     parallel=False,
     join="outer",
+    attrs_file=None,
     **kwargs,
 ):
     """Open multiple files as a single dataset.
@@ -729,8 +730,8 @@ def open_mfdataset(
     ``combine_by_coords`` and ``combine_nested``. By default the old (now deprecated)
     ``auto_combine`` will be used, please specify either ``combine='by_coords'`` or
     ``combine='nested'`` in future. Requires dask to be installed. See documentation for
-    details on dask [1]_. Attributes from the first dataset file are used for the
-    combined dataset.
+    details on dask [1]_. Global attributes from the ``attrs_file`` are used
+    for the combined dataset.
 
     Parameters
     ----------
@@ -827,6 +828,10 @@ def open_mfdataset(
       - 'override': if indexes are of same size, rewrite indexes to be
         those of the first object with that dimension. Indexes for the same
         dimension must have the same size in all objects.
+    attrs_file : str or pathlib.Path, optional
+        Path of the file used to read global attributes from.
+        By default global attributes are read from the first file provided,
+        with wildcard matches sorted by filename.
     **kwargs : optional
         Additional arguments passed on to :py:func:`xarray.open_dataset`.
@@ -961,7 +966,15 @@ def open_mfdataset(
             raise
 
     combined._file_obj = _MultiFileCloser(file_objs)
-    combined.attrs = datasets[0].attrs
+
+    # read global attributes from the attrs_file or from the first dataset
+    if attrs_file is not None:
+        if isinstance(attrs_file, Path):
+            attrs_file = str(attrs_file)
+        combined.attrs = datasets[paths.index(attrs_file)].attrs
+    else:
+        combined.attrs = datasets[0].attrs
+
     return combined
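
A usage sketch for the new keyword (file names are hypothetical):

    import xarray as xr

    files = ["obs_2000.nc", "obs_2001.nc", "obs_2002.nc"]
    # global attributes now come from the named file instead of files[0]
    ds = xr.open_mfdataset(files, combine="by_coords", attrs_file="obs_2002.nc")

Note that ``attrs_file`` must match one of the supplied paths exactly, since the implementation looks it up with ``paths.index(attrs_file)``.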

xarray/core/dataarray.py

Lines changed: 9 additions & 3 deletions
@@ -1480,8 +1480,7 @@ def swap_dims(self, dims_dict: Mapping[Hashable, Hashable]) -> "DataArray":
         ----------
         dims_dict : dict-like
             Dictionary whose keys are current dimension names and whose values
-            are new names. Each value must already be a coordinate on this
-            array.
+            are new names.
 
         Returns
         -------
@@ -1504,6 +1503,13 @@ def swap_dims(self, dims_dict: Mapping[Hashable, Hashable]) -> "DataArray":
         Coordinates:
             x        (y) <U1 'a' 'b'
           * y        (y) int64 0 1
+        >>> arr.swap_dims({"x": "z"})
+        <xarray.DataArray (z: 2)>
+        array([0, 1])
+        Coordinates:
+            x        (z) <U1 'a' 'b'
+            y        (z) int64 0 1
+        Dimensions without coordinates: z
 
         See Also
         --------
@@ -2362,7 +2368,7 @@ def to_dict(self, data: bool = True) -> dict:
         naming conventions.
 
         Converts all variables and attributes to native Python objects.
-        Useful for coverting to json. To avoid datetime incompatibility
+        Useful for converting to json. To avoid datetime incompatibility
         use decode_times=False kwarg in xarrray.open_dataset.
 
         Parameters
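
Since the corrected docstring is about JSON serialization, a minimal round-trip sketch:

    import json
    import xarray as xr

    da = xr.DataArray([1, 2, 3], dims="x", name="a", attrs={"units": "m"})
    d = da.to_dict()                 # plain dicts and lists
    json.dumps(d)                    # safe as long as times were not decoded
    da2 = xr.DataArray.from_dict(d)  # reconstructs an equivalent DataArray
    assert da.identical(da2)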

xarray/core/dataset.py

Lines changed: 38 additions & 13 deletions
@@ -85,11 +85,16 @@
     either_dict_or_kwargs,
     hashable,
     is_dict_like,
-    is_list_like,
     is_scalar,
     maybe_wrap_array,
 )
-from .variable import IndexVariable, Variable, as_variable, broadcast_variables
+from .variable import (
+    IndexVariable,
+    Variable,
+    as_variable,
+    broadcast_variables,
+    assert_unique_multiindex_level_names,
+)
 
 if TYPE_CHECKING:
     from ..backends import AbstractDataStore, ZarrStore
@@ -1748,7 +1753,10 @@ def maybe_chunk(name, var, chunks):
             if not chunks:
                 chunks = None
             if var.ndim > 0:
-                token2 = tokenize(name, token if token else var._data)
+                # when rechunking by different amounts, make sure dask names change
+                # by providing chunks as an input to tokenize.
+                # subtle bugs result otherwise. see GH3350
+                token2 = tokenize(name, token if token else var._data, chunks)
                 name2 = f"{name_prefix}{name}-{token2}"
                 return var.chunk(chunks, name=name2, lock=lock)
             else:
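
A sketch of the invariant this change restores (GH3350); the dataset here is illustrative:

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"v": ("x", np.arange(10))})
    a = ds.chunk({"x": 5})["v"].data  # dask array chunked by 5
    b = ds.chunk({"x": 2})["v"].data  # same data, chunked by 2
    # with `chunks` folded into the token, the two graphs no longer collide
    assert a.name != b.name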
@@ -2780,6 +2788,7 @@ def rename(
         variables, coord_names, dims, indexes = self._rename_all(
             name_dict=name_dict, dims_dict=name_dict
         )
+        assert_unique_multiindex_level_names(variables)
         return self._replace(variables, coord_names, dims=dims, indexes=indexes)
 
     def rename_dims(
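
The ``assert_unique_multiindex_level_names`` check closes a hole where a rename could silently duplicate a MultiIndex level name. A sketch of the call that is now expected to raise (names are illustrative):

    import pandas as pd
    import xarray as xr

    midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two"))
    ds = xr.Dataset({"v": ("x", range(4))}, coords={"x": midx})

    # "one" is already a MultiIndex level name on x, so this should
    # now raise ValueError instead of creating a conflicting variable
    ds.rename({"v": "one"})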
@@ -2791,7 +2800,8 @@ def rename_dims(
         ----------
         dims_dict : dict-like, optional
             Dictionary whose keys are current dimension names and
-            whose values are the desired names.
+            whose values are the desired names. The desired names must
+            not be the name of an existing dimension or Variable in the Dataset.
         **dims, optional
             Keyword form of ``dims_dict``.
             One of dims_dict or dims must be provided.
@@ -2809,12 +2819,17 @@ def rename_dims(
         DataArray.rename
         """
         dims_dict = either_dict_or_kwargs(dims_dict, dims, "rename_dims")
-        for k in dims_dict:
+        for k, v in dims_dict.items():
             if k not in self.dims:
                 raise ValueError(
                     "cannot rename %r because it is not a "
                     "dimension in this dataset" % k
                 )
+            if v in self.dims or v in self:
+                raise ValueError(
+                    f"Cannot rename {k} to {v} because {v} already exists. "
+                    "Try using swap_dims instead."
+                )
 
         variables, coord_names, sizes, indexes = self._rename_all(
             name_dict={}, dims_dict=dims_dict
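
A sketch of the new error path (names are illustrative):

    import xarray as xr

    ds = xr.Dataset({"v": ("x", [1, 2])}, coords={"y": ("x", [0, 1])})
    ds.rename_dims({"x": "z"})  # fine: "z" does not exist yet
    ds.rename_dims({"x": "y"})  # now raises ValueError, suggesting swap_dims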
@@ -2868,8 +2883,7 @@ def swap_dims(
         ----------
         dims_dict : dict-like
             Dictionary whose keys are current dimension names and whose values
-            are new names. Each value must already be a variable in the
-            dataset.
+            are new names.
 
         Returns
         -------
@@ -2898,6 +2912,16 @@ def swap_dims(
         Data variables:
             a        (y) int64 5 7
             b        (y) float64 0.1 2.4
+        >>> ds.swap_dims({"x": "z"})
+        <xarray.Dataset>
+        Dimensions:  (z: 2)
+        Coordinates:
+            x        (z) <U1 'a' 'b'
+            y        (z) int64 0 1
+        Dimensions without coordinates: z
+        Data variables:
+            a        (z) int64 5 7
+            b        (z) float64 0.1 2.4
 
         See Also
         --------
@@ -2914,7 +2938,7 @@ def swap_dims(
                     "cannot swap from dimension %r because it is "
                     "not an existing dimension" % k
                 )
-            if self.variables[v].dims != (k,):
+            if v in self.variables and self.variables[v].dims != (k,):
                 raise ValueError(
                     "replacement dimension %r is not a 1D "
                     "variable along the old dimension %r" % (v, k)
@@ -2923,7 +2947,7 @@ def swap_dims(
         result_dims = {dims_dict.get(dim, dim) for dim in self.dims}
 
         coord_names = self._coord_names.copy()
-        coord_names.update(dims_dict.values())
+        coord_names.update({dim for dim in dims_dict.values() if dim in self.variables})
 
         variables: Dict[Hashable, Variable] = {}
         indexes: Dict[Hashable, pd.Index] = {}
@@ -3525,7 +3549,7 @@ def update(self, other: "CoercibleMapping", inplace: bool = None) -> "Dataset":
 
     def merge(
         self,
-        other: "CoercibleMapping",
+        other: Union["CoercibleMapping", "DataArray"],
         inplace: bool = None,
         overwrite_vars: Union[Hashable, Iterable[Hashable]] = frozenset(),
         compat: str = "no_conflicts",
@@ -3582,6 +3606,7 @@ def merge(
             If any variables conflict (see ``compat``).
         """
         _check_inplace(inplace)
+        other = other.to_dataset() if isinstance(other, xr.DataArray) else other
         merge_result = dataset_merge_method(
             self,
             other,
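
A sketch of the newly supported call; note the DataArray must be named so it can be promoted to a Dataset variable:

    import xarray as xr

    ds = xr.Dataset({"a": ("x", [1, 2])})
    da = xr.DataArray([3.0, 4.0], dims="x", name="b")
    merged = ds.merge(da)  # previously failed; now like ds.merge(da.to_dataset())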
@@ -3664,7 +3689,7 @@ def drop(self, labels=None, dim=None, *, errors="raise", **labels_kwargs):
             raise ValueError("cannot specify dim and dict-like arguments.")
         labels = either_dict_or_kwargs(labels, labels_kwargs, "drop")
 
-        if dim is None and (is_list_like(labels) or is_scalar(labels)):
+        if dim is None and (is_scalar(labels) or isinstance(labels, Iterable)):
             warnings.warn(
                 "dropping variables using `drop` will be deprecated; using drop_vars is encouraged.",
                 PendingDeprecationWarning,
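
A sketch of the restored behavior; any iterable of names is accepted again, though ``drop_vars`` remains the encouraged spelling:

    import xarray as xr

    ds = xr.Dataset({"a": 0, "b": 1, "c": 2})
    ds.drop({"a", "b"})       # a set works again
    ds.drop({"a": 0}.keys())  # so does a dict view (GH3552)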
@@ -4127,7 +4152,7 @@ def combine_first(self, other: "Dataset") -> "Dataset":
 
         Returns
         -------
-        DataArray
+        Dataset
         """
         out = ops.fillna(self, other, join="outer", dataset_join="outer")
         return out
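
A quick check of the corrected return type (values are illustrative):

    import numpy as np
    import xarray as xr

    ds1 = xr.Dataset({"a": ("x", [1.0, np.nan])}, coords={"x": [0, 1]})
    ds2 = xr.Dataset({"a": ("x", [9.0, 9.0])}, coords={"x": [1, 2]})
    out = ds1.combine_first(ds2)  # a Dataset, as the docstring now says
    # values along x=[0, 1, 2]: [1.0, 9.0, 9.0] (NaNs filled from ds2, outer join)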
@@ -4641,7 +4666,7 @@ def to_dict(self, data=True):
         conventions.
 
         Converts all variables and attributes to native Python objects
-        Useful for coverting to json. To avoid datetime incompatibility
+        Useful for converting to json. To avoid datetime incompatibility
         use decode_times=False kwarg in xarrray.open_dataset.
 
         Parameters

xarray/core/parallel.py

Lines changed: 42 additions & 0 deletions
@@ -154,6 +154,48 @@ def map_blocks(
     --------
     dask.array.map_blocks, xarray.apply_ufunc, xarray.Dataset.map_blocks,
     xarray.DataArray.map_blocks
+
+    Examples
+    --------
+
+    Calculate an anomaly from climatology using ``.groupby()``. Using
+    ``xr.map_blocks()`` allows for parallel operations with knowledge of ``xarray``,
+    its indices, and its methods like ``.groupby()``.
+
+    >>> def calculate_anomaly(da, groupby_type='time.month'):
+    ...     # Necessary workaround to xarray's check with zero dimensions
+    ...     # https://github.com/pydata/xarray/issues/3575
+    ...     if sum(da.shape) == 0:
+    ...         return da
+    ...     gb = da.groupby(groupby_type)
+    ...     clim = gb.mean(dim='time')
+    ...     return gb - clim
+    >>> time = xr.cftime_range('1990-01', '1992-01', freq='M')
+    >>> np.random.seed(123)
+    >>> array = xr.DataArray(np.random.rand(len(time)),
+    ...                      dims="time", coords=[time]).chunk()
+    >>> xr.map_blocks(calculate_anomaly, array).compute()
+    <xarray.DataArray (time: 24)>
+    array([ 0.12894847,  0.11323072, -0.0855964 , -0.09334032,  0.26848862,
+            0.12382735,  0.22460641,  0.07650108, -0.07673453, -0.22865714,
+           -0.19063865,  0.0590131 , -0.12894847, -0.11323072,  0.0855964 ,
+            0.09334032, -0.26848862, -0.12382735, -0.22460641, -0.07650108,
+            0.07673453,  0.22865714,  0.19063865, -0.0590131 ])
+    Coordinates:
+      * time     (time) object 1990-01-31 00:00:00 ... 1991-12-31 00:00:00
+
+    Note that one must explicitly use ``args=[]`` and ``kwargs={}`` to pass arguments
+    to the function being applied in ``xr.map_blocks()``:
+
+    >>> xr.map_blocks(calculate_anomaly, array, kwargs={'groupby_type': 'time.year'})
+    <xarray.DataArray (time: 24)>
+    array([ 0.15361741, -0.25671244, -0.31600032,  0.008463  ,  0.1766172 ,
+           -0.11974531,  0.43791243,  0.14197797, -0.06191987, -0.15073425,
+           -0.19967375,  0.18619794, -0.05100474, -0.42989909, -0.09153273,
+            0.24841842, -0.30708526, -0.31412523,  0.04197439,  0.0422506 ,
+            0.14482397,  0.35985481,  0.23487834,  0.12144652])
+    Coordinates:
+      * time     (time) object 1990-01-31 00:00:00 ... 1991-12-31 00:00:00
     """
 
     def _wrapper(func, obj, to_array, args, kwargs):

xarray/core/utils.py

Lines changed: 6 additions & 1 deletion
@@ -547,7 +547,12 @@ def __eq__(self, other) -> bool:
             return False
 
     def __hash__(self) -> int:
-        return hash((ReprObject, self._value))
+        return hash((type(self), self._value))
+
+    def __dask_tokenize__(self):
+        from dask.base import normalize_token
+
+        return normalize_token((type(self), self._value))
 
 
 @contextlib.contextmanager
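
For context: dask's ``tokenize`` consults ``__dask_tokenize__`` when an object defines it, so equal ``ReprObject`` instances now produce identical tokens within a session. A minimal sketch of the same pattern with a stand-in class (``Tagged`` is hypothetical, not xarray code):

    from dask.base import normalize_token, tokenize

    class Tagged:
        """Stand-in for xarray's ReprObject: identity is (type, value)."""

        def __init__(self, value):
            self._value = value

        def __dask_tokenize__(self):
            # same recipe as the hunk above: token depends only on type + value
            return normalize_token((type(self), self._value))

    assert tokenize(Tagged("mean")) == tokenize(Tagged("mean"))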
