Conversation

@hoxbro
Member

@hoxbro hoxbro commented Sep 4, 2025

Previously, the following code would fill up all my memory and then freeze, ending with the system either killing the process or me having to do a manual shutdown.

I haven't done any profiling to see whether this affects performance, but at least it no longer crashes my computer. If it does hurt performance, we could move the functionality into resample_2d_distributed.

import numpy as np
import dask.array as da
import datashader as ds
import xarray as xr
import dask

print(dask.__version__)

# create large dask array
N = 100_000
dask_array = da.random.random((N, N), chunks=(1000, 1000))  # .compute()
# convert to dasked xarray
dask_xarray = xr.DataArray(
    dask_array,
    dims=["x", "y"],
    coords={"x": np.arange(N), "y": np.arange(N)},
    name="example_data",  # Name of the data variable
)
# create plot using Datashader
arr = ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray)
arr.compute()

First reported here: https://stackoverflow.com/questions/79753007/high-ram-usage-when-using-datashader-with-dasked-xarray
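A quick back-of-envelope calculation (my own, not from the PR) shows why materializing the full array overwhelms RAM, while each individual chunk is harmless:

```python
# Rough arithmetic behind the reported memory blow-up: the full float64
# array is far larger than typical RAM, while a single chunk is tiny.
N = 100_000
full_bytes = N * N * 8          # float64 is 8 bytes per element
chunk_bytes = 1000 * 1000 * 8   # one (1000, 1000) chunk

print(full_bytes / 1e9)    # 80.0 -> ~80 GB for the full array
print(chunk_bytes / 1e6)   # 8.0  -> 8 MB per chunk
```

So any code path that implicitly gathers the whole array (rather than working chunk by chunk) will exhaust memory on most machines.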

@hoxbro hoxbro changed the title fix: Improve compute_chunksize for fix: Improve compute_chunksize for downsample Sep 4, 2025
@hoxbro hoxbro changed the title fix: Improve compute_chunksize for downsample fix: Improve compute_chunksize for downsampled data Sep 4, 2025
@codecov

codecov bot commented Sep 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.34%. Comparing base (f44670c) to head (3863e28).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1448   +/-   ##
=======================================
  Coverage   88.33%   88.34%           
=======================================
  Files          96       96           
  Lines       18901    18905    +4     
=======================================
+ Hits        16696    16701    +5     
+ Misses       2205     2204    -1     


@Nanoputian628

Nanoputian628 commented Sep 5, 2025

Hi, I am the original poster in the Stack Overflow question you have linked. Thank you for looking into this. I installed Datashader from the branch you had created. I ran the same code but I am now getting a different error. See below.

PS: Sorry, not sure how to properly format the error message here.

Traceback (most recent call last)
Cell In[3], line 2
      1 # create plot using Datashader
----> 2 tf.shade(ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray))

File D:\ProgramData\environments\test_ds\lib\site-packages\datashader\core.py:1155, in Canvas.raster(self, source, layer, upsample_method, downsample_method, nan_value, agg, interpolate, chunksize, max_mem)
   1151         data = resample_2d_distributed(
   1152             source_window, chunksize=chunksize, max_mem=max_mem,
   1153             **kwargs)
   1154     else:
-> 1155         data = resample_2d(source_window, **kwargs)
   1156     layers = 1
   1157 else:

File D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py:347, in resample_2d(src, w, h, ds_method, us_method, fill_value, mode_rank, x_offset, y_offset, out)
    344 if isinstance(src, np.ma.MaskedArray):
    345     src = src.data
--> 347 resampled = _resample_2d(src, mask, use_mask, ds_method, us_method,
    348                          fill_value, mode_rank, x_offset, y_offset, out)
    349 return _mask_or_not(resampled, src, fill_value)

File D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py:499, in _resample_2d(src, mask, use_mask, ds_method, us_method, fill_value, mode_rank, x_offset, y_offset, out)
    497 def _resample_2d(src, mask, use_mask, ds_method, us_method, fill_value,
    498                  mode_rank, x_offset, y_offset, out):
--> 499     src_w, src_h, out_w, out_h = _get_dimensions(src, out)
    500     x0_off, x1_off = x_offset
    501     y0_off, y1_off = y_offset

File D:\ProgramData\environments\test_ds\lib\site-packages\numba\core\dispatcher.py:424, in _DispatcherBase._compile_for_args(self, *args, **kws)
    420         msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
    421                f"by the following argument(s):\n{args_str}\n")
    422         e.patch_message(msg)
--> 424     error_rewrite(e, 'typing')
    425 except errors.UnsupportedError as e:
    426     # Something unsupported is present in the user code, add help info
    427     error_rewrite(e, 'unsupported_error')

File D:\ProgramData\environments\test_ds\lib\site-packages\numba\core\dispatcher.py:365, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    363     raise e
    364 else:
--> 365     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py (488)

File "D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py", line 488:
def _get_fill_value(fill_value, src, out):
    <source elided>

@ngjit
^

During: Pass nopython_type_inference 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'dask.array.core.Array'>

@hoxbro
Member Author

hoxbro commented Sep 5, 2025

I cannot recreate this.

[screenshot]

Looking at the line numbers, it could be because you haven't installed it correctly, haven't installed it into the right environment, or haven't restarted the notebook. For example, the source at resampling.py:499 in your traceback does not match this branch.

@hoxbro hoxbro mentioned this pull request Sep 5, 2025
@Nanoputian628

Sorry, I copied the wrong error message. I had created a new environment where I installed the latest version of all the required packages. I then manually copied your changes into the resampling.py script (for some reason I couldn't install Datashader directly from your GitHub branch). When I ran the code I got the same @ngjit decorator error as above. The error message from that run is at the end of this comment.

I also created another new environment with the latest packages and without your change. I ran the same code and got the same @ngjit decorator error (that error message is the one I mistakenly copied in my earlier message). So this seems to be a separate issue? I will open a separate ticket for it.

Anyway, I created another new environment and this time installed Datashader 0.16.1, the version I was using in my original environment. I then manually applied your changes to the resampling.py script. I can now run the code without any errors. RAM usage increases by about 8 GB and it takes about 1-2 minutes to run. So I'm happy to confirm that the issue is solved. Thanks for your help!

I just also have some related general questions for my own understanding. Would be much appreciated if you had the time to answer them.

  1. Does the chunk size that I set on my Dask array determine in any way the chunk size used in Datashader? Based on the error I had and the changes you have made, it seems like Datashader determines its own suitable chunk size?
  2. Depending on the above answer, is there a rough guideline on how much RAM I need available when creating a plot in Datashader using dask? For example, would I need to have 5 times the memory size of a single chunk?
  3. Are you aware of any tutorials that explain how to use Datashader and Dask together (even better if they also use HoloViews/GeoViews)? I have seen a couple of guides, but they are all quite minimal examples and don't go into any detail about how Dask is being used. I am still fuzzy on Dask, so it would be great to better understand things such as how many workers and threads to use, how to set various memory limits, etc.

Many thanks!

TypingError                               Traceback (most recent call last)
Cell In[3], line 2
      1 # create plot using Datashader
----> 2 tf.shade(ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray))

File D:\ProgramData\environments\ds_test\lib\site-packages\datashader\core.py:1155, in Canvas.raster(self, source, layer, upsample_method, downsample_method, nan_value, agg, interpolate, chunksize, max_mem)
   1151         data = resample_2d_distributed(
   1152             source_window, chunksize=chunksize, max_mem=max_mem,
   1153             **kwargs)
   1154     else:
-> 1155         data = resample_2d(source_window, **kwargs)
   1156     layers = 1
   1157 else:

File D:\ProgramData\environments\ds_test\lib\site-packages\datashader\resampling.py:353, in resample_2d(src, w, h, ds_method, us_method, fill_value, mode_rank, x_offset, y_offset, out)
    350 if isinstance(src, np.ma.MaskedArray):
    351     src = src.data
--> 353 resampled = _resample_2d(src, mask, use_mask, ds_method, us_method,
    354                          fill_value, mode_rank, x_offset, y_offset, out)
    355 return _mask_or_not(resampled, src, fill_value)

File D:\ProgramData\environments\ds_test\lib\site-packages\datashader\resampling.py:505, in _resample_2d(src, mask, use_mask, ds_method, us_method, fill_value, mode_rank, x_offset, y_offset, out)
    503 def _resample_2d(src, mask, use_mask, ds_method, us_method, fill_value,
    504                  mode_rank, x_offset, y_offset, out):
--> 505     src_w, src_h, out_w, out_h = _get_dimensions(src, out)
    506     x0_off, x1_off = x_offset
    507     y0_off, y1_off = y_offset

File D:\ProgramData\environments\ds_test\lib\site-packages\numba\core\dispatcher.py:424, in _DispatcherBase._compile_for_args(self, *args, **kws)
    420         msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
    421                f"by the following argument(s):\n{args_str}\n")
    422         e.patch_message(msg)
--> 424     error_rewrite(e, 'typing')
    425 except errors.UnsupportedError as e:
    426     # Something unsupported is present in the user code, add help info
    427     error_rewrite(e, 'unsupported_error')

File D:\ProgramData\environments\ds_test\lib\site-packages\numba\core\dispatcher.py:365, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    363     raise e
    364 else:
--> 365     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at D:\ProgramData\environments\ds_test\lib\site-packages\datashader\resampling.py (494)

File "D:\ProgramData\environments\ds_test\lib\site-packages\datashader\resampling.py", line 494:
def _get_fill_value(fill_value, src, out):
    <source elided>

@ngjit
^

During: Pass nopython_type_inference 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'dask.array.core.Array'>

@holovizbot

This pull request has been mentioned on HoloViz Discourse. There might be relevant details there:

https://discourse.holoviz.org/t/error-when-creating-plot-from-dask-xarray-using-latest-package-versions/8926/1

@hoxbro
Member Author

hoxbro commented Sep 8, 2025

I also created another new environment with the latest packages and without your change. I ran the same code and got the same @ngjit decorator error (that error message is the one I mistakenly copied in my earlier message). So this seems to be a separate issue? I will open a separate ticket for it.

I can recreate the issue with uv and will investigate; I'm not entirely sure what is causing this problem. Can you open an issue so we don't forget about it? It is out of scope for this PR.

uv venv --python 3.13
uv pip install dask datashader

For your questions: Dask should generally be smart enough not to max out your memory. I'm not sure whether the problem lies in Datashader or Dask, but what I'm doing so far in this PR is changing the chunk size to avoid exhausting memory.
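To illustrate the idea, here is a minimal sketch (my own illustration, not the PR's actual compute_chunksize code, and `downsample_chunksize` is a hypothetical helper): shrink the chunk size in proportion to how aggressively the data is being downsampled, so each intermediate block stays small.

```python
def downsample_chunksize(src_shape, out_shape, base=1000):
    """Hypothetical heuristic: reduce chunk size by the downsampling factor."""
    sh, sw = src_shape
    h, w = out_shape
    # largest per-axis downsampling factor (at least 1, i.e. no upsampling)
    factor = max(sh // max(h, 1), sw // max(w, 1), 1)
    return max(base // factor, 1)

# 100_000 -> 300 pixels per axis is heavy downsampling, so chunks shrink a lot
print(downsample_chunksize((100_000, 100_000), (300, 300)))  # -> 3
```

The exact heuristic in the PR differs; the point is only that the chunk size Datashader picks internally, not the one set on the source Dask array, bounds the working-set size.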

@codspeed-hq

codspeed-hq bot commented Sep 8, 2025

CodSpeed Instrumentation Performance Report

Merging #1448 will improve performance by 11.6%

Comparing fix_dont_crash (3863e28) with main (f44670c)

Summary

⚡ 2 improvements
✅ 41 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
test_quadmesh_raster[256] 15.9 ms 14.3 ms +11.57%
test_dask_raster[8192] 3.9 s 3.5 s +11.6%

@hoxbro
Member Author

hoxbro commented Sep 8, 2025

I can recreate the issue with uv and will investigate; I'm not entirely sure what is causing this problem. Can you open an issue, so we don't forget about it, as it is out of scope for this PR.

It is because you also need to install pyarrow. I will consider how to improve this information, as it is not obvious.
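A small sanity check that may help others hitting this (my own suggestion, not part of Datashader's error handling): verify pyarrow is importable before running the Dask code path, since the TypingError above surfaced in an environment without it.

```python
import importlib.util

# Report whether pyarrow is present; if this prints False, install it,
# e.g. `uv pip install pyarrow`.
has_pyarrow = importlib.util.find_spec("pyarrow") is not None
print("pyarrow available:", has_pyarrow)
```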

sh, sw = src.shape
height_fraction, width_fraction = sh / h, sw / w
# For downsampling, use smaller chunks to reduce memory usage
if chunksize is None and (w < sw or h < sh):
Member Author


Note: We need to update the docstring to reflect this.
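To make the snippet concrete, here is the arithmetic for the reproduction at the top of this thread (my own worked example, using the same variable names as the diff): with a 100_000-pixel axis reduced to 300 pixels, the fractions are large and the downsampling condition holds, so the smaller-chunk branch is taken.

```python
# Values from the reproduction: source is (100_000, 100_000),
# canvas is 300x300, so this is a downsample on both axes.
sh = sw = 100_000
h = w = 300
height_fraction, width_fraction = sh / h, sw / w  # ~333.3 each

# condition from the diff: downsampling on either axis
is_downsample = w < sw or h < sh
print(height_fraction, is_downsample)
```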

@hoxbro hoxbro marked this pull request as ready for review September 8, 2025 07:43
@hoxbro hoxbro requested a review from philippjfr September 8, 2025 09:23
@hoxbro hoxbro added this to the v0.19.0 milestone Nov 13, 2025