Skip to content

FIX: handle dask ValueErrors in apply_ufunc (set allow_rechunk=True) #4392

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 9, 2020

Conversation

kmuehlbauer
Copy link
Contributor

@kmuehlbauer kmuehlbauer commented Aug 31, 2020

follow-up on #4060

@kmuehlbauer
Copy link
Contributor Author

kmuehlbauer commented Aug 31, 2020

As laid out in #4372 the behaviour before #4060 was to run blockwise with automatically aligning non-core dimensions (permit rechunking). For multiple chunks in core dimensions a ValueError would have been thrown.

If I understand the comments of @dcherian and @shoyer in #4372 correctly, allow_rechunk should be set True in the first case but not in the second.

This PR just checks for the dask ValueError which is raised for non-core dimension chunk mismatch, issues a warning, that the behaviour will change in the future (not sure if this is intended, please recommend otherwise) and reruns apply_gufunc with allow_rechunk=True. For the second case the dask ValueError is raised.

@kmuehlbauer kmuehlbauer changed the title FIX: catch dask ValueErrors, warn and set allow_rechunk=True FIX: handle dask ValueErrors in apply_ufunc (set allow_rechunk=True) Aug 31, 2020
@kmuehlbauer
Copy link
Contributor Author

@dcherian I think, that this is almost good to go. Any rework on the warning message needed?

try:
res = gufunc(**dask_gufunc_kwargs)
except ValueError as exc:
if "with different chunksize present" in str(exc):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checking for exact wording of an error raised by dask for chunked non-core dimensions. That seems not-very-dependable. Can we instead check for chunked core dimensions and dask_gufunc_kwargs["allow_rechunk"] is True and raise an error on our own? I think we could copy the code and error over from before your previous PR, and modify it slightly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you see me totally confused 😀
If we set allow_rechunk=True it will be rechunked no matter if core or non-core dimension mismatch. If we then check for core dimension chunksize > 1 (as in the previous version) and raise, how should the users pass this without rechunking themselves?

I'll try to come up with a proposal which doesn't rely on raised dask errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could do check for core dimension chunksize > 1 (as in the previous version) AND not dask_gufunc_kwargs.get("allow_rechunk")

@kmuehlbauer
Copy link
Contributor Author

I think my latest changes according to @dcherian's comments are now backwards compatible. Short explanation:

  • allow_rechunk is None (default)
    • if core dimension chunks > 1 raise ValueError (to keep backward compatibility concerning core dimension chunking)
    • else sets allow_rechunk=True (to keep backward compatibility concerning chunking mismatch in non-core dimensions
  • allow_rechunk == True (user need to explicitoly set this in the dask_gufunc_kwargs)
    • just call apply_gufunc
  • allow_rechunk == False (user need to explicitoly set this in the dask_gufunc_kwargs)
    • just call apply_gufunc

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a small change to the error message.

Thanks @kmuehlbauer !

@dcherian dcherian merged commit bb4c7b4 into pydata:master Sep 9, 2020
@kmuehlbauer kmuehlbauer deleted the fix-4372 branch May 25, 2023 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set allow_rechunk=True in apply_ufunc
2 participants