Is your feature request related to a problem?
I am grouping data in a Dataset and computing statistics. I wanted to take the median while grouping by two variables, but I got the following error:
>>> ds.groupby(['x', 'y']).median()
# NotImplementedError: The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel
while ds.groupby(['x']).median() works without any problem.
The error occurs only when the DataArrays are backed by dask arrays: with numpy arrays there is no problem. In addition, if .median() is replaced by .quantile(0.5), there is no problem either. See below:
import dask.array as da
import numpy as np
import xarray as xr

# Dask-backed 10x10 array; each coordinate value appears twice along its dimension
rng = da.random.default_rng(0)
ds = xr.Dataset(
    {'a': (('x', 'y'), rng.random((10, 10)))},
    coords={'x': np.arange(5).repeat(2), 'y': np.arange(5).repeat(2)},
)

# Raises:
# NotImplementedError: The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel
try:
    ds.groupby(['x', 'y']).median()
except NotImplementedError as e:
    print(e)

# No problems with the following:
ds.groupby(['x']).median()
ds.groupby(['x', 'y']).quantile(0.5)
ds.compute().groupby(['x', 'y']).median()  # compute() converts the dask arrays to numpy arrays first
Describe the solution you'd like
A straightforward solution seems to be to have DatasetGroupBy.median() fall back to DatasetGroupBy.quantile(0.5) when the median is computed over multiple grouping variables.
Describe alternatives you've considered
No response
Additional context
My xr.show_versions():
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:06:46) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-49-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.10.0
pandas: 2.2.3
numpy: 1.26.4
scipy: 1.14.1
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.4.1
h5py: 3.12.1
zarr: 2.18.3
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.11.2
distributed: None
matplotlib: 3.9.2
cartopy: 0.24.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.5.0
pip: 24.3.1
conda: None
pytest: None
mypy: None
IPython: 8.29.0
sphinx: 7.4.7