gh-130285: Fix handling of zero or empty counts in random.sample() #130291

rhettinger · 2025-02-19T02:32:45Z

First draft for discussion. Will add doc updates and misc/news entry in a bit.

Issue: random.sample raises "IndexError: pop from empty list" when both "population" and "counts" are empty #130285

Dominik1123 · 2025-02-19T19:19:49Z

Looks good. When updating the docs, one could also mention that negative counts may lead to invalid results. When considering the equivalence for repeated elements:

sample(['red', 'blue'], counts=[4, 2], k=5) is equivalent to sample(['red', 'red', 'red', 'red', 'blue', 'blue'], k=5)

it may be tempting to extrapolate this to negative counts based on the observation that repeating (a sequence) a negative number of times always gives an empty sequence:

>>> import itertools as it
>>> [1]*-1
[]
>>> list(it.repeat(1, -1))
[]
>>> list(range(-1))
[]

Though that's probably a very rare edge case when considering random.sample.
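To make the equivalence quoted above concrete, here is a minimal sketch that expands a population by its counts and draws from both forms. The helper name `expand` is hypothetical (it is not part of the `random` module), and it assumes integer counts.

```python
import random
from collections import Counter

def expand(population, counts):
    """Hypothetical helper: repeat each element according to its count."""
    # range(c) yields nothing for c <= 0, which mirrors the
    # "negative repeat gives an empty sequence" observation above.
    return [x for x, c in zip(population, counts) for _ in range(c)]

population, counts = ['red', 'blue'], [4, 2]
expanded = expand(population, counts)

with_counts = random.sample(population, counts=counts, k=5)
one_at_a_time = random.sample(expanded, k=5)

# Both draws come from the same six-element pool, so neither result
# can hold more than 4 'red' or more than 2 'blue'.
for result in (with_counts, one_at_a_time):
    tally = Counter(result)
    assert tally['red'] <= 4 and tally['blue'] <= 2
```

Note that `expand` silently drops an element with a negative count, whereas, as mentioned above, passing a negative count to `random.sample` may give invalid results, so the extrapolation should not be relied on.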

rhettinger · 2025-02-20T00:24:45Z

I'm having some misgivings about this. The docs only say, "Repeated elements can be specified one at a time or with the optional keyword-only counts parameter". That speaks to the case of one-or-more and makes no promises about a count total of zero.

If I understand your original application, a sample was chosen from a pool, the selections were removed from the pool, and the process was repeated. Presumably along the way k was being reduced as well to avoid a ValueError.

At first that seemed reasonable to me, but the loop would need a stopping condition. An empty pool or k==0 seems like a reasonable way to do that.

Also if that was the application, even better approaches are possible with the current API. Shuffle the dataset and extract subgroups as needed. That samples without replacement until the pool is drained. Likewise, sample could be called just once and the subgroups extracted from the supersample. The docs speak directly to this use case, "The resulting list is in selection order so that all sub-slices will also be valid random samples."
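A minimal sketch of the shuffle-then-slice idea, assuming a small made-up dataset and made-up subgroup sizes (`pool` and `group_sizes` are illustrative names, not taken from the original application):

```python
import random

pool = list(range(10))      # hypothetical dataset
group_sizes = [3, 4, 3]     # hypothetical subgroup sizes; their sum must not exceed len(pool)

# sample() with k == len(pool) returns a shuffled copy and leaves pool untouched.
shuffled = random.sample(pool, k=len(pool))

# Because the result is in selection order, consecutive slices are
# themselves valid random samples that do not overlap.
groups, start = [], 0
for size in group_sizes:
    groups.append(shuffled[start:start + size])
    start += size
```

Each element of `groups` is a sample without replacement, and no item appears in more than one group, so no loop that shrinks the pool and re-calls `sample` is needed.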

So, I'm a little dubious that this PR is needed at all, that sample('ab', counts=[0,0], k=0) or sample([], 0, counts=[]) is something we want to encourage or enable, or that the docs for counts implied anything beyond simplifying "repeated elements".

Dominik1123 · 2025-02-20T19:59:56Z

I'm not sure if this PR is the right place to discuss the details of my application, but basically it uses multiple pools and a stopping condition that excludes k==0:

while more_items_need_to_be_selected:
    pool = ...  # some logic to select the pool
    k = ...  # some logic to choose k; guarantees k > 0
    try:
        items = random.sample(pool, k, counts=[weights[x] for x in pool])
    except ValueError:  # the pool doesn't have enough items
        ...
    else:
        ...

The purpose of the try/except is to handle the situation len(pool) < k and I expected 0 == len(pool) < k simply to be a special case of this. Especially since this part of the documentation

If the sample size is larger than the population size, a ValueError is raised.

comes after the explanation of the parameter counts. So, I didn't expect that specifying counts would make a difference. Also, the IndexError is not documented and the fact that it is raised stems from an implementation detail. For those reasons it feels like a bug to me.

I didn't encounter the other case (yet), where all counts are zero, but it seems reasonable to me. If the counts are used to control the specific population from which samples are chosen, some of the members may reach a count of zero, i.e., effectively being removed from the population. This is also a test case. So, if the counts are used to control the population, the only way to get to an empty population is when all counts are set to zero. In my opinion, this should be equivalent to sample([], k).

I don't think that the docs for counts imply anything beyond simplifying "repeated elements", as you wrote. But what exactly is "repeating elements" in Python? Above I gave three examples and they work with zero and even with negative counts:

>>> [...]*0, [...]*-1
([], [])
>>> list(it.repeat(..., 0)), list(it.repeat(..., -1))
([], [])
>>> list(range(0)), list(range(-1))
([], [])

So, repeating an element zero times seems reasonable to me, as it implies the absence of that element (not only in Python). Negative counts are questionable, though.

Whether it's really needed is a different question, though. From a practical point of view? Probably not. Someone encountering either of the two errors will not have too hard a time figuring out what went wrong and adjusting their code accordingly. I changed my code to

if len(pool) < k:
    ...

which is even more explicit, so it can go without a comment. But I can't use try/except (I wouldn't want to rely on an undocumented IndexError here).
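When counts are given, the effective population size is the total of the counts, so an explicit guard can compare k against that total. The sketch below is only an illustration under that assumption: `sample_weighted` and the `weights` mapping are hypothetical names modeled on the loop above, and non-negative integer weights are assumed.

```python
import random

def sample_weighted(pool, weights, k):
    """Hypothetical helper: return k items, or None if the pool is too small."""
    counts = [weights[x] for x in pool]
    total = sum(counts)
    if total < k:
        # Covers the empty pool and the all-zero-counts pool without
        # depending on which exception random.sample() would raise.
        return None
    if k == 0:
        return []
    return random.sample(pool, k, counts=counts)
```

For example, `sample_weighted(['a', 'b'], {'a': 0, 'b': 0}, 1)` returns `None` instead of raising, because `random.sample` is only called when the total is positive.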

Is it needed for the sake of correctness? I would say, yes. As I explained above, both scenarios appear reasonable to me and the behavior in the first one even feels like a bug.

rhettinger · 2025-02-21T16:52:43Z

ISTM that an explicit k==0 stop condition is warranted when using sample in a loop that progressively reduces the population counts. Further, it seems that a much better design would be to shuffle the whole population and extract the subsamples as needed. Also, more is being read into the counts documentation than was intended. The phrase "repeated elements can be specified one at a time" meant one-or-more.

That said, I don't see any downside for supporting the more expansive reading as zero-or-more even though that can only succeed when k==0. So, I'll move this forward.
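As a rough illustration of the zero-or-more reading, the sketch below probes the calls mentioned earlier in the thread. The outcome depends on the Python version: the assumption is that interpreters with this change return an empty list for a zero total when k==0, while older ones raise IndexError or ValueError, so the hypothetical helper `probe` catches both.

```python
import random

def probe(population, k, counts):
    """Hypothetical helper: return the sample, or the exception name on failure."""
    try:
        return random.sample(population, k, counts=counts)
    except (IndexError, ValueError) as exc:
        return type(exc).__name__

# All counts are zero and k == 0.
print(probe('ab', 0, [0, 0]))

# Both population and counts are empty and k == 0.
print(probe([], 0, []))

# A zero total can never satisfy k > 0.
print(probe('ab', 1, [0, 0]))
```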

miss-islington-app · 2025-02-21T17:33:14Z

Thanks @rhettinger for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

bedevere-app · 2025-02-21T17:33:27Z

GH-130416 is a backport of this pull request to the 3.13 branch.

bedevere-app · 2025-02-21T17:33:33Z

GH-130417 is a backport of this pull request to the 3.12 branch.

bedevere-bot · 2025-02-21T18:00:40Z

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 FreeBSD14 3.x has failed when building commit 286c517.

What do you need to do:

1. Don't panic.
2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1232/builds/4734) and take a look at the build logs.
4. Check if the failure is related to this commit (286c517) or if it is a false positive.
5. If the failure is related to this commit, please reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1232/builds/4734

Failed tests:

test_interpreters

Summary of the results of the build (if available):

==

Traceback logs:

Traceback (most recent call last):
  File "<frozen getpath>", line 483, in <module>
ValueError: embedded null byte
Warning -- Uncaught thread exception: InterpreterError
Exception in thread Thread-282 (run):
RuntimeError: error evaluating path


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.opsec-fbsd14/build/Lib/threading.py", line 1054, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/home/buildbot/buildarea/3.x.opsec-fbsd14/build/Lib/threading.py", line 996, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.opsec-fbsd14/build/Lib/test/test_interpreters/test_stress.py", line 30, in task
    interp = interpreters.create()
  File "/home/buildbot/buildarea/3.x.opsec-fbsd14/build/Lib/test/support/interpreters/__init__.py", line 76, in create
    id = _interpreters.create(reqrefs=True)
interpreters.InterpreterError: interpreter creation failed


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.opsec-fbsd14/build/Lib/threading.py", line 1054, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/home/buildbot/buildarea/3.x.opsec-fbsd14/build/Lib/threading.py", line 996, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.opsec-fbsd14/build/Lib/test/test_interpreters/test_stress.py", line 47, in run
    interp = interpreters.create()
  File "/home/buildbot/buildarea/3.x.opsec-fbsd14/build/Lib/test/support/interpreters/__init__.py", line 76, in create
    id = _interpreters.create(reqrefs=True)
interpreters.InterpreterError: interpreter creation failed


Traceback (most recent call last):
  File "<frozen getpath>", line 483, in <module>
ValueError: embedded null byte
Warning -- Uncaught thread exception: InterpreterError
Exception in thread Thread-216 (task):
RuntimeError: error evaluating path

rhettinger added 3 commits February 18, 2025 19:32

gh-130285: Support zero counts in random.sample()

ff2db57

Add test cases for zero counts or empty counts

85ade39

Fold try/except into a conditional expression

dfb8144

rhettinger self-assigned this Feb 19, 2025

bedevere-app bot mentioned this pull request Feb 19, 2025

random.sample raises "IndexError: pop from empty list" when both "population" and "counts" are empty #130285

Closed

Let the user focus on the source of the negative total

76749c5

Add blurb

27f6cdb

rhettinger marked this pull request as ready for review February 21, 2025 17:02

bedevere-app bot added the awaiting core review label Feb 21, 2025

Merge branch 'main' into sample_zero_counts

a2c4837

rhettinger added the "needs backport to 3.12" (only security fixes) and "needs backport to 3.13" (bugs and security fixes) labels and removed the "awaiting core review" label Feb 21, 2025

rhettinger merged commit 286c517 into python:main Feb 21, 2025
45 checks passed

rhettinger deleted the sample_zero_counts branch February 21, 2025 17:33

bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Feb 21, 2025

bedevere-app bot removed the needs backport to 3.12 only security fixes label Feb 21, 2025

rhettinger pushed a commit that referenced this pull request Feb 21, 2025

[3.12] gh-130285: Fix handling of zero or empty counts in random.sample() (gh-130291) (gh-130417)

8db2fa2

rhettinger pushed a commit that referenced this pull request Feb 21, 2025

[3.13] gh-130285: Fix handling of zero or empty counts in random.sample() (gh-130291) (gh-130416)

8ef8947
