Conversation

@hoxbro
Member

@hoxbro hoxbro commented Sep 4, 2025

Previously, the following code would fill up all my memory and then freeze, ending with the system either killing the process or me having to do a manual shutdown.

I haven't done any profiling to see whether this affects performance, but at least it no longer crashes my computer. If it does hurt performance, we could move the functionality into resample_2d_distributed.

import numpy as np
import dask.array as da
import datashader as ds
import xarray as xr
import dask

print(dask.__version__)

# create large dask array
N = 100_000
dask_array = da.random.random((N, N), chunks=(1000, 1000))  # .compute()
# convert to dasked xarray
dask_xarray = xr.DataArray(
    dask_array,
    dims=["x", "y"],
    coords={"x": np.arange(N), "y": np.arange(N)},
    name="example_data",  # Name of the data variable
)
# create plot using Datashader
arr = ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray)
arr.compute()

First reported here: https://stackoverflow.com/questions/79753007/high-ram-usage-when-using-datashader-with-dasked-xarray
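A quick back-of-envelope calculation (my own, not from the PR) shows why materializing the full array overwhelms RAM, while each individual chunk is harmless:

```python
# Rough arithmetic behind the reported memory blow-up: the full float64
# array is far larger than typical RAM, while a single chunk is tiny.
N = 100_000
full_bytes = N * N * 8          # float64 is 8 bytes per element
chunk_bytes = 1000 * 1000 * 8   # one (1000, 1000) chunk

print(full_bytes / 1e9)    # 80.0 -> ~80 GB for the full array
print(chunk_bytes / 1e6)   # 8.0  -> 8 MB per chunk
```

So any code path that implicitly gathers the whole array (rather than working chunk by chunk) will exhaust memory on most machines.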

@hoxbro hoxbro changed the title fix: Improve compute_chunksize for fix: Improve compute_chunksize for downsample Sep 4, 2025
@hoxbro hoxbro changed the title fix: Improve compute_chunksize for downsample fix: Improve compute_chunksize for downsampled data Sep 4, 2025
@codecov

codecov bot commented Sep 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.34%. Comparing base (f44670c) to head (3863e28).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1448   +/-   ##
=======================================
  Coverage   88.33%   88.34%           
=======================================
  Files          96       96           
  Lines       18901    18905    +4     
=======================================
+ Hits        16696    16701    +5     
+ Misses       2205     2204    -1     


@Nanoputian628

Nanoputian628 commented Sep 5, 2025

Hi, I am the original poster in the Stack Overflow question you have linked. Thank you for looking into this. I installed Datashader from the branch you had created. I ran the same code but I am now getting a different error. See below.

PS: Sorry, not sure how to properly format the error message here.

Traceback (most recent call last)
Cell In[3], line 2
      1 # create plot using Datashader
----> 2 tf.shade(ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray))

File D:\ProgramData\environments\test_ds\lib\site-packages\datashader\core.py:1155, in Canvas.raster(self, source, layer, upsample_method, downsample_method, nan_value, agg, interpolate, chunksize, max_mem)
   1151         data = resample_2d_distributed(
   1152             source_window, chunksize=chunksize, max_mem=max_mem,
   1153             **kwargs)
   1154     else:
-> 1155         data = resample_2d(source_window, **kwargs)
   1156     layers = 1
   1157 else:

File D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py:347, in resample_2d(src, w, h, ds_method, us_method, fill_value, mode_rank, x_offset, y_offset, out)
    344 if isinstance(src, np.ma.MaskedArray):
    345     src = src.data
--> 347 resampled = _resample_2d(src, mask, use_mask, ds_method, us_method,
    348                          fill_value, mode_rank, x_offset, y_offset, out)
    349 return _mask_or_not(resampled, src, fill_value)

File D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py:499, in _resample_2d(src, mask, use_mask, ds_method, us_method, fill_value, mode_rank, x_offset, y_offset, out)
    497 def _resample_2d(src, mask, use_mask, ds_method, us_method, fill_value,
    498                  mode_rank, x_offset, y_offset, out):
--> 499     src_w, src_h, out_w, out_h = _get_dimensions(src, out)
    500     x0_off, x1_off = x_offset
    501     y0_off, y1_off = y_offset

File D:\ProgramData\environments\test_ds\lib\site-packages\numba\core\dispatcher.py:424, in _DispatcherBase._compile_for_args(self, *args, **kws)
    420         msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
    421                f"by the following argument(s):\n{args_str}\n")
    422         e.patch_message(msg)
--> 424     error_rewrite(e, 'typing')
    425 except errors.UnsupportedError as e:
    426     # Something unsupported is present in the user code, add help info
    427     error_rewrite(e, 'unsupported_error')

File D:\ProgramData\environments\test_ds\lib\site-packages\numba\core\dispatcher.py:365, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    363     raise e
    364 else:
--> 365     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py (488)

File "D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py", line 488:
def _get_fill_value(fill_value, src, out):
    <source elided>

@ngjit
^

During: Pass nopython_type_inference 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'dask.array.core.Array'>

@hoxbro
Member Author

hoxbro commented Sep 5, 2025

I cannot recreate this.

[screenshot]

Looking at the line numbers, it could be because you haven't installed it correctly, haven't installed it into the right environment, or haven't restarted the notebook. For example, the source at resampling.py:499 in your traceback does not match this branch.

@hoxbro hoxbro mentioned this pull request Sep 5, 2025
@Nanoputian628

Sorry, I copied the wrong error message. I had created a new environment where I installed the latest version of all the required packages. I then manually copied your changes into the resampling.py script (for some reason I couldn't install Datashader directly from your GitHub branch). When I ran the code I got the same @ngjit decorator error as above. The error message from that run is at the end of this comment.

I also created another new environment with the latest packages and without your change. I ran the same code and got the same @ngjit decorator error (that error message is the one I mistakenly copied in my earlier message). So this seems to be a separate issue? I will open a separate ticket for it.

Anyway, I created another new environment and this time installed Datashader 0.16.1, the version I was using in my original environment. I then manually applied your changes to the resampling.py script. I can now run the code without any errors. RAM usage increases by about 8 GB and it takes about 1-2 minutes to run. So I'm happy to confirm that the issue is solved. Thanks for your help!

I just also have some related general questions for my own understanding. Would be much appreciated if you had the time to answer them.

  1. Does the chunk size that I set on my Dask array determine in any way the chunk size used in Datashader? Based on the error I had and the changes you have made, it seems like Datashader determines its own suitable chunk size?
  2. Depending on the above answer, is there a rough guideline on how much RAM I need available when creating a plot in Datashader using dask? For example, would I need to have 5 times the memory size of a single chunk?
  3. Are you aware of any tutorials that explain how to use Datashader and Dask together (even better if they also use HoloViews/GeoViews)? I have seen a couple of guides, but they are all quite minimal examples and don't go into any detail about how Dask is being used. I am still fuzzy on Dask, so it would be great to better understand things such as how many workers and threads to use, how to set various memory limits, etc.

Many thanks!

TypingError                               Traceback (most recent call last)
Cell In[3], line 2
      1 # create plot using Datashader
----> 2 tf.shade(ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray))

File D:\ProgramData\environments\ds_test\lib\site-packages\datashader\core.py:1155, in Canvas.raster(self, source, layer, upsample_method, downsample_method, nan_value, agg, interpolate, chunksize, max_mem)
   1151         data = resample_2d_distributed(
   1152             source_window, chunksize=chunksize, max_mem=max_mem,
   1153             **kwargs)
   1154     else:
-> 1155         data = resample_2d(source_window, **kwargs)
   1156     layers = 1
   1157 else:

File D:\ProgramData\environments\ds_test\lib\site-packages\datashader\resampling.py:353, in resample_2d(src, w, h, ds_method, us_method, fill_value, mode_rank, x_offset, y_offset, out)
    350 if isinstance(src, np.ma.MaskedArray):
    351     src = src.data
--> 353 resampled = _resample_2d(src, mask, use_mask, ds_method, us_method,
    354                          fill_value, mode_rank, x_offset, y_offset, out)
    355 return _mask_or_not(resampled, src, fill_value)

File D:\ProgramData\environments\ds_test\lib\site-packages\datashader\resampling.py:505, in _resample_2d(src, mask, use_mask, ds_method, us_method, fill_value, mode_rank, x_offset, y_offset, out)
    503 def _resample_2d(src, mask, use_mask, ds_method, us_method, fill_value,
    504                  mode_rank, x_offset, y_offset, out):
--> 505     src_w, src_h, out_w, out_h = _get_dimensions(src, out)
    506     x0_off, x1_off = x_offset
    507     y0_off, y1_off = y_offset

File D:\ProgramData\environments\ds_test\lib\site-packages\numba\core\dispatcher.py:424, in _DispatcherBase._compile_for_args(self, *args, **kws)
    420         msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
    421                f"by the following argument(s):\n{args_str}\n")
    422         e.patch_message(msg)
--> 424     error_rewrite(e, 'typing')
    425 except errors.UnsupportedError as e:
    426     # Something unsupported is present in the user code, add help info
    427     error_rewrite(e, 'unsupported_error')

File D:\ProgramData\environments\ds_test\lib\site-packages\numba\core\dispatcher.py:365, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    363     raise e
    364 else:
--> 365     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at D:\ProgramData\environments\ds_test\lib\site-packages\datashader\resampling.py (494)

File "D:\ProgramData\environments\ds_test\lib\site-packages\datashader\resampling.py", line 494:
def _get_fill_value(fill_value, src, out):
    <source elided>

@ngjit
^

During: Pass nopython_type_inference 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'dask.array.core.Array'>

@holovizbot

This pull request has been mentioned on HoloViz Discourse. There might be relevant details there:

https://discourse.holoviz.org/t/error-when-creating-plot-from-dask-xarray-using-latest-package-versions/8926/1

@hoxbro
Member Author

hoxbro commented Sep 8, 2025

I also created another new environment with the latest packages and without your change. I ran the same code and got the same @ngjit decorator error (that error message is the one I mistakenly copied in my earlier message). So this seems to be a separate issue? I will open a separate ticket for it.

I can recreate the issue with uv and will investigate; I'm not entirely sure what is causing this problem. Can you open an issue so we don't forget about it? It is out of scope for this PR.

uv venv --python 3.13
uv pip install dask datashader

For your questions: Dask should generally be smart enough not to max out your memory. I'm not sure whether the problem lies in Datashader or Dask, but what I'm doing so far in this PR is changing the chunk size to avoid exhausting memory.
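To illustrate the idea, here is a minimal sketch (my own illustration, not the PR's actual compute_chunksize code, and `downsample_chunksize` is a hypothetical helper): shrink the chunk size in proportion to how aggressively the data is being downsampled, so each intermediate block stays small.

```python
def downsample_chunksize(src_shape, out_shape, base=1000):
    """Hypothetical heuristic: reduce chunk size by the downsampling factor."""
    sh, sw = src_shape
    h, w = out_shape
    # largest per-axis downsampling factor (at least 1, i.e. no upsampling)
    factor = max(sh // max(h, 1), sw // max(w, 1), 1)
    return max(base // factor, 1)

# 100_000 -> 300 pixels per axis is heavy downsampling, so chunks shrink a lot
print(downsample_chunksize((100_000, 100_000), (300, 300)))  # -> 3
```

The exact heuristic in the PR differs; the point is only that the chunk size Datashader picks internally, not the one set on the source Dask array, bounds the working-set size.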

@codspeed-hq

codspeed-hq bot commented Sep 8, 2025

CodSpeed Instrumentation Performance Report

Merging #1448 will improve performance by 11.6%

Comparing fix_dont_crash (3863e28) with main (f44670c)

Summary

⚡ 2 improvements
✅ 41 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
test_quadmesh_raster[256] 15.9 ms 14.3 ms +11.57%
test_dask_raster[8192] 3.9 s 3.5 s +11.6%

@hoxbro
Member Author

hoxbro commented Sep 8, 2025

I can recreate the issue with uv and will investigate; I'm not entirely sure what is causing this problem. Can you open an issue, so we don't forget about it, as it is out of scope for this PR.

It is because you also need to install pyarrow. I will consider how to improve this information, as it is not obvious.
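A small sanity check that may help others hitting this (my own suggestion, not part of Datashader's error handling): verify pyarrow is importable before running the Dask code path, since the TypingError above surfaced in an environment without it.

```python
import importlib.util

# Report whether pyarrow is present; if this prints False, install it,
# e.g. `uv pip install pyarrow`.
has_pyarrow = importlib.util.find_spec("pyarrow") is not None
print("pyarrow available:", has_pyarrow)
```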

sh, sw = src.shape
height_fraction, width_fraction = sh / h, sw / w
# For downsampling, use smaller chunks to reduce memory usage
if chunksize is None and (w < sw or h < sh):
Member Author


Note: We need to update the docstring to reflect this.
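To make the snippet concrete, here is the arithmetic for the reproduction at the top of this thread (my own worked example, using the same variable names as the diff): with a 100_000-pixel axis reduced to 300 pixels, the fractions are large and the downsampling condition holds, so the smaller-chunk branch is taken.

```python
# Values from the reproduction: source is (100_000, 100_000),
# canvas is 300x300, so this is a downsample on both axes.
sh = sw = 100_000
h = w = 300
height_fraction, width_fraction = sh / h, sw / w  # ~333.3 each

# condition from the diff: downsampling on either axis
is_downsample = w < sw or h < sh
print(height_fraction, is_downsample)
```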

@hoxbro hoxbro marked this pull request as ready for review September 8, 2025 07:43
@hoxbro hoxbro requested a review from philippjfr September 8, 2025 09:23
@hoxbro hoxbro added this to the v0.19.0 milestone Nov 13, 2025