-
-
Notifications
You must be signed in to change notification settings - Fork 740
remove *NUM_THREADS env variables from default config #9081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 27 files ±0 27 suites ±0 11h 32m 11s ⏱️ + 15m 21s For more details on these failures, see this check. Results for commit 492099e. ± Comparison against base commit 801d0ed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As indicated in dask/dask#11966 (review), I agree that these default configuration can be remove, but Dask and Distributed documentation should clearly reflect this change (it could have impact for some users, especially in HPC centers).
So pointing to how to configure this in distributed doc is mandatory to me.
as you’ve figured out, that’s why the companion PR dask/dask#11966 is linked above. so in other words you’re saying “LGTM”? |
I'm not trying to block (not something I can do anyway), but I do want to mention: I'm a bit concerned about the impact here for anyone using Dask and Numpy together for non-trivial operations (if they don't notice this change and set the variable themselves, they could have either terrible performance, or process failures that are very difficult to understand). Preferring all or none does make make logical sense to me, but the tradeoff with this change isn't totally clear to me. (Context/disclosure on me: I work in engineering at Coiled, which supports a number of Dask customers. I expect this change would create some difficulty for us & our customers, but it's not just us that I'm worried about.) |
@flying-sheep sorry I wasn't clear enough, I mean that distributed documentation, so in this repository, should mention this change, and how a user can configure it back, I guess by adding by himself the correct
@dchudz I agree with you, hence my comment, I think a change like that could have some hard to debug impact on existing workloads. So I guess we should choose between all or none:
|
Thanks @guillaumeeb. I'll say I'm still a bit worried even if we properly document and warn about this change, explain how to configure it back if needed. But I know it's not my decision. |
documentation here doesn’t mention this behavior at all, so I changed it at the only location where it is documented instead. where would I add it? |
I feel like this is a really tough situation because it's a footgun if we do and a footgun if we dont. I wonder if there is somewhere more visible that we can highlight this? Could we raise a warning somwhere that makes sense? Could we show these values discretely in the How can we make this easier to debug? |
I think that’s for dask maintainers to decide. I just walked into the footgun and wanted to be helpful by doing a quick PR with the changes that a maintainer agreed to. Since this seems to be a bigger discussion now, I feel like the dask devs should come to a conclusion. I’ll happily do small, clearly stated change requests to my PRs, but I won’t change anything until there has been a decision. |
Given this change was originally introduced by Coiled I'm keen to hear @dchudz's thoughts. Also if other maintainers are around and have opinions I'd love to hear them @guillaumeeb @quasiben @jrbourbeau |
Looks like that's this PR (#5098), in case anyone else is looking for the context.
My gut is to leave it alone. It seems to me that the reasoning from that PR still applies. To me it seems that removing these would cause more user pain than it solves (although as @jacobtomlinson notes, I recognize there's user pain in both directions). I understand that would leave the question of whether to add more, and a kind of inconsistency if we don't. My view on whether to add more is less strong, but I don't really think it's necessary to be consistent by adding every similar env var. I'd be inclined to suggest only thinking about it in response to common user pain. |
To be fair, setting these variables for users (without novice users really knowing it) is also common practice on HPC systems. I don't think there really is a solution better than the other. However, I would at least make sure that this setting is also visible in distributed docs somewhere, currently it looks hard to find or understand and might cause problem to a (really) few proportions of expert users. I would also tend a bit to put all the other variables identified there in default configuration, especially if they have an effect when using Numpy. |
See #9076 (comment)
Closes #9075
pre-commit run --all-files
Companion PR to dask/dask#11966