fix: Improve compute_chunksize for downsampled data #1448
base: main
Conversation
compute_chunksize for downsampled data
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main    #1448   +/- ##
=======================================
  Coverage   88.33%   88.34%
=======================================
  Files          96       96
  Lines       18901    18905      +4
=======================================
+ Hits        16696    16701      +5
+ Misses       2205     2204      -1

☔ View full report in Codecov by Sentry.
Hi, I am the original poster of the Stack Overflow question you linked. Thank you for looking into this. I installed Datashader from the branch you created and ran the same code, but I am now getting a different error; see below. (PS: sorry, I'm not sure how to properly format the error message here.)

Traceback (most recent call last)
Cell In[3], line 2
1 # create plot using Datashader
----> 2 tf.shade(ds.Canvas(plot_height=300, plot_width=300).raster(dask_xarray))
File D:\ProgramData\environments\test_ds\lib\site-packages\datashader\core.py:1155, in Canvas.raster(self, source, layer, upsample_method, downsample_method, nan_value, agg, interpolate, chunksize, max_mem)
1151 data = resample_2d_distributed(
1152 source_window, chunksize=chunksize, max_mem=max_mem,
1153 **kwargs)
1154 else:
-> 1155 data = resample_2d(source_window, **kwargs)
1156 layers = 1
1157 else:
File D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py:347, in resample_2d(src, w, h, ds_method, us_method, fill_value, mode_rank, x_offset, y_offset, out)
344 if isinstance(src, np.ma.MaskedArray):
345 src = src.data
--> 347 resampled = _resample_2d(src, mask, use_mask, ds_method, us_method,
348 fill_value, mode_rank, x_offset, y_offset, out)
349 return _mask_or_not(resampled, src, fill_value)
File D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py:499, in _resample_2d(src, mask, use_mask, ds_method, us_method, fill_value, mode_rank, x_offset, y_offset, out)
497 def _resample_2d(src, mask, use_mask, ds_method, us_method, fill_value,
498 mode_rank, x_offset, y_offset, out):
--> 499 src_w, src_h, out_w, out_h = _get_dimensions(src, out)
500 x0_off, x1_off = x_offset
501 y0_off, y1_off = y_offset
File D:\ProgramData\environments\test_ds\lib\site-packages\numba\core\dispatcher.py:424, in _DispatcherBase._compile_for_args(self, *args, **kws)
420 msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
421 f"by the following argument(s):\n{args_str}\n")
422 e.patch_message(msg)
--> 424 error_rewrite(e, 'typing')
425 except errors.UnsupportedError as e:
426 # Something unsupported is present in the user code, add help info
427 error_rewrite(e, 'unsupported_error')
File D:\ProgramData\environments\test_ds\lib\site-packages\numba\core\dispatcher.py:365, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
363 raise e
364 else:
--> 365 raise e.with_traceback(None)
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py (488)
File "D:\ProgramData\environments\test_ds\lib\site-packages\datashader\resampling.py", line 488:
def _get_fill_value(fill_value, src, out):
<source elided>
@ngjit
^
During: Pass nopython_type_inference
This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'dask.array.core.Array'>
Sorry, I copied the wrong error message. I had created a new environment where I installed the latest version of all the required packages. I then manually copied your changes into the resampling.py script (for some reason I couldn't install Datashader directly from your GitHub branch). When I ran the code I got the same @ngjit decorator error as above; the error message from that run is at the end of this comment.

Anyway, I created another new environment and this time installed Datashader 0.16.1, the version I was using in my original environment. I then manually applied your changes to resampling.py. I can now run the code without any errors: RAM usage increases by about 8 GB and it takes about 1-2 minutes to run. So I'm happy to confirm that the issue is solved. Thanks for your help!

I also have some related general questions for my own understanding. It would be much appreciated if you had the time to answer them.
Many thanks!
This pull request has been mentioned on HoloViz Discourse. There might be relevant details there:
I can recreate the issue with uv and will investigate; I'm not entirely sure what is causing this problem. Can you open an issue so we don't forget about it, as it is out of scope for this PR? As for your questions: dask should generally be smart enough not to exhaust your memory. I'm not sure whether the problem lies in datashader or dask, but what I'm doing so far in this PR is changing the chunk size to avoid the memory blow-up.
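To make the memory point concrete, here is a small stdlib-only sketch of why chunk size dominates per-task memory. The helper name and the MiB arithmetic are my own illustration, not part of datashader or dask:

```python
def chunk_memory_mib(chunks, itemsize=8):
    """Estimate the in-memory size of a single chunk in MiB.

    itemsize=8 assumes float64 elements. Illustrative only; this is
    not a datashader or dask API.
    """
    n = 1
    for extent in chunks:
        n *= extent
    return n * itemsize / 2**20

# A 100_000 x 100_000 float64 array held as a single chunk needs
# ~74.5 GiB, while a 2_000 x 2_000 chunk needs only ~30.5 MiB per task.
print(chunk_memory_mib((100_000, 100_000)))  # ~76293.9 MiB
print(chunk_memory_mib((2_000, 2_000)))      # ~30.5 MiB
```

This is why shrinking the chunks for downsampled input avoids filling RAM: each worker only ever holds a bounded window of the source.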
CodSpeed Instrumentation Performance Report
Merging #1448 will improve performance by 11.6%.
Benchmarks breakdown
It is because you also need to install pyarrow. I will consider how to surface this requirement better, as it is not obvious.
    sh, sw = src.shape
    height_fraction, width_fraction = sh / h, sw / w
    # For downsampling, use smaller chunks to reduce memory usage
    if chunksize is None and (w < sw or h < sh):
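The condition reads: only when no explicit chunksize was passed and the output is smaller than the source (downsampling) do we pick smaller chunks. A hypothetical stand-alone sketch of such a heuristic follows; the function name, the element budget, and the square-chunk policy are all assumptions for illustration, not datashader's actual compute_chunksize:

```python
def downsample_chunksize(src_shape, out_shape, max_elems=4_000_000):
    """Pick square-ish chunks whose element count stays under a budget,
    so no single task has to materialise the full source array.

    Illustrative sketch only; datashader's real heuristic differs.
    """
    sh, sw = src_shape
    h, w = out_shape
    if w >= sw and h >= sh:
        # Upsampling: the source is no larger than the output,
        # so a single chunk is already cheap.
        return src_shape
    side = int(max_elems ** 0.5)  # side length of a square chunk under budget
    return (min(sh, side), min(sw, side))

print(downsample_chunksize((100_000, 100_000), (300, 300)))  # (2000, 2000)
```

The key property is that chunk memory is now bounded by the budget rather than by the source size, which is what prevents the freeze described below.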
Note: We need to update the docstring to reflect this.

Previously, the following code would fill up all my memory and then freeze, ending with the system either killing the process or me needing to do a manual shutdown.

I haven't done any profiling to see if this affects performance, but at least it doesn't crash my computer anymore. If it does, we can maybe move the functionality into resample_2d_distributed.

First reported here: https://stackoverflow.com/questions/79753007/high-ram-usage-when-using-datashader-with-dasked-xarray
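For intuition about what the downsampling step computes, here is a simplified numpy-only stand-in for a "mean" downsample. It assumes the source dimensions divide evenly into blocks; real resampling in datashader also handles masks, fill values, and non-integer block sizes, so this is a sketch, not the library's code path:

```python
import numpy as np

def mean_downsample(src, out_h, out_w):
    """Block-mean downsampling: average non-overlapping source blocks.

    Simplified illustration of a 'mean' downsample; assumes the source
    shape is an integer multiple of the output shape.
    """
    sh, sw = src.shape
    bh, bw = sh // out_h, sw // out_w  # source block size per output pixel
    trimmed = src[:out_h * bh, :out_w * bw]
    return trimmed.reshape(out_h, bh, out_w, bw).mean(axis=(1, 3))

src = np.arange(36, dtype=float).reshape(6, 6)
small = mean_downsample(src, 3, 3)
print(small.shape)   # (3, 3)
print(small[0, 0])   # 3.5 (mean of 0, 1, 6, 7)
```

Because each output pixel only depends on its own source block, the reduction parallelises cleanly over chunks, which is what makes the smaller chunk size in this PR effective.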