gh-130285: Fix handling of zero or empty counts in random.sample() #130291
Looks good. When updating the docs, one could also mention that negative counts may lead to invalid results. When considering the equivalence for repeated elements:
it may be tempting to extrapolate this to negative counts based on the observation that repeating (a sequence) a negative number of times always gives an empty sequence:
>>> import itertools as it
>>> [1]*-1
[]
>>> list(it.repeat(1, -1))
[]
>>> list(range(-1))
[]
Though that's probably a very rare edge case when considering |
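To make the extrapolation concrete, here is a minimal sketch that spells out the repeated-elements equivalence by hand. The helper name `expand_counts` is hypothetical and is not part of the `random` module; it only illustrates why negative counts are questionable: they vanish from the expanded list but still reduce the sum of the counts.

```python
from itertools import chain, repeat


def expand_counts(population, counts):
    """Hypothetical helper: list each element as many times as its count says."""
    if len(population) != len(counts):
        raise ValueError("population and counts must have the same length")
    # repeat(x, c) yields nothing when c is zero or negative.
    return list(chain.from_iterable(repeat(x, c) for x, c in zip(population, counts)))


positive = expand_counts(["red", "blue"], [2, 1])    # both elements present
with_zero = expand_counts(["red", "blue"], [2, 0])   # "blue" is simply absent
negative = expand_counts(["red", "blue"], [2, -1])   # "blue" is absent as well

# With a negative count, the sum of the counts no longer matches the length
# of the expanded list, so code that works from the total would disagree
# with code that works from the expanded list.
mismatch = (sum([2, -1]), len(negative))
```

A zero count keeps the two views consistent (the total equals the expanded length), whereas a negative count does not.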
I'm having some misgivings about this. The docs only say, "Repeated elements can be specified one at a time or with the optional keyword-only counts parameter". That speaks to the case of one-or-more and makes no promises about a count total of zero.
If I understand your original application, a sample was chosen from a pool, the selections were removed from the pool, and the process was repeated. Presumably along the way k was being reduced as well to avoid a ValueError. At first that seemed reasonable to me, but the loop would need a stopping condition. An empty pool or
Also if that was the application, even better approaches are possible with the current API. Shuffle the dataset and extract subgroups as needed. That samples without replacement until the pool is drained. Likewise, sample could be called just once and the subgroups extracted from the supersample. The docs speak directly to this use case, "The resulting list is in selection order so that all sub-slices will also be valid random samples."
So, I'm a little dubious that PR is needed at all, that |
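The "sample once, then slice" suggestion could be sketched as follows. The helper `draw_groups` and the example pool are hypothetical, chosen only for illustration; the sketch relies on the documented property that sub-slices of a sample are themselves valid random samples.

```python
import random


def draw_groups(pool, sizes, rng=random):
    """Hypothetical helper: draw one supersample and cut it into subgroups."""
    total = sum(sizes)
    if total > len(pool):
        raise ValueError("requested more items than the pool holds")
    # One call samples without replacement for all groups at once.
    supersample = rng.sample(pool, k=total)
    groups, start = [], 0
    for size in sizes:
        groups.append(supersample[start:start + size])
        start += size
    return groups


pool = ["a", "a", "b", "c", "c", "c"]
groups = draw_groups(pool, [2, 3, 1])  # drains the pool completely
```

Because the pool is never sampled when it is empty, this pattern does not depend on how `random.sample()` treats a zero total.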
I'm not sure if this PR is the right place to discuss the details of my application, but basically it uses multiple pools and a stopping condition that excludes
while more_items_need_to_be_selected:
    pool = ...  # some logic to select the pool
    k = ...  # some logic to choose k; guarantees k > 0
    try:
        items = random.sample(pool, k, counts=[weights[x] for x in pool])
    except ValueError:  # the pool doesn't have enough items
        ...
    else:
        ...
The purpose of the
comes after the explanation of the parameter
I didn't encounter the other case (yet), where all
I don't think that the docs for
>>> [...]*0, [...]*-1
([], [])
>>> list(it.repeat(..., 0)), list(it.repeat(..., -1))
([], [])
>>> list(range(0)), list(range(-1))
([], [])
So, repeating an element zero times seems reasonable to me, as it implies the absence of that element (not only in Python). Negative counts are questionable, though.
Whether it's really needed is a different question. From a practical point of view? Probably not. Someone encountering either of the two errors will not have too hard a time figuring out what went wrong and adjusting their code accordingly. I changed my code to
which is even more explicit, so it can go without a comment. But I can't use
Is it needed for the sake of correctness? I would say, yes. As I explained above, both scenarios appear reasonable to me and the behavior in the first one even feels like a bug. |
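For readers who need a loop like the one above to behave the same across Python versions, one possible guard is sketched below. This is not the commenter's revised code (which is not shown here); the helper `sample_weighted` and the example weights are hypothetical. It assumes Python 3.9 or later, where `random.sample()` accepts the `counts` keyword.

```python
import random


def sample_weighted(pool, weights, k, rng=random):
    """Hypothetical helper: sample k items, dropping zero-weight entries first."""
    candidates = [x for x in pool if weights[x] > 0]
    counts = [weights[x] for x in candidates]
    if k > sum(counts):
        raise ValueError("the pool doesn't have enough items")
    if k == 0:
        # Nothing to select; avoid calling sample() with an empty pool.
        return []
    # All counts are positive here, so the call is well defined.
    return rng.sample(candidates, k, counts=counts)


weights = {"a": 0, "b": 2, "c": 1}
picked = sample_weighted(["a", "b", "c"], weights, 3)
nothing = sample_weighted(["a"], weights, 0)
```

Filtering before the call keeps the zero-count and all-zero cases out of `random.sample()` entirely, so the result does not depend on which release includes the fix.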
ISTM that an explicit
That said, I don't see any downside for supporting the more expansive reading as zero-or-more even though that can only succeed when |
Thanks @rhettinger for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. |
…e() (pythongh-130291) (cherry picked from commit 286c517) Co-authored-by: Raymond Hettinger
GH-130416 is a backport of this pull request to the 3.13 branch. |
GH-130417 is a backport of this pull request to the 3.12 branch. |
First draft for discussion. Will add doc updates and misc/news entry in a bit.