executor.map hangs at seemingly random iteration when iterable is large #114948


Closed
tburke02 opened this issue Feb 3, 2024 · 7 comments
Labels
pending (The issue will be closed if no feedback is provided), performance (Performance or resource usage), stdlib (Python modules in the Lib dir), topic-multiprocessing

Comments

@tburke02

tburke02 commented Feb 3, 2024

Bug report

Bug description:

Minimum working example of issue:

import numpy as np
import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as executor:
    executor.map(print, np.arange(1000000))

Each time this is run, it seems to hang at a different iteration with no apparent CPU or memory usage. May be related to #74028.

CPython versions tested on:

3.9

Operating systems tested on:

Linux

@tburke02 added the type-bug label Feb 3, 2024
@gaogaotiantian
Member

Is this reproducible without numpy? np.arange is simple enough, but it would be nice to rule out all the possible external dependencies in the example.

@tburke02
Author

tburke02 commented Feb 3, 2024

> Is this reproducible without numpy? np.arange is simple enough, but it would be nice to rule out all the possible external dependencies in the example.

Yes, same behavior when using range.

@terryjreedy
Member

Win10, main, Command Prompt, with range: I get multiple duplicate tracebacks.

  File "C:\Programs\Python313\Lib\multiprocessing\spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "C:\Programs\Python313\Lib\multiprocessing\spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
    ...<16 lines>...
    ''')
  File "C:\Programs\Python313\Lib\multiprocessing\spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
    ...<16 lines>...
    ''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                ...

With the needed (on Windows) clause added, it prints 0 to 12 on separate lines and hangs. ^C and I get 12 interlaced tracebacks with some of the following lines

  File "C:\Programs\Python313\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\Programs\Python313\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Programs\Python313\Lib\concurrent\futures\process.py", line 251, in _process_worker
    call_item = call_queue.get(block=True)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "C:\Programs\Python313\Lib\multiprocessing\queues.py", line 102, in get
    with self._rlock:
        res = self._recv_bytes()
  File "C:\Programs\Python313\Lib\multiprocessing\queues.py", line 103, in get
    res = self._recv_bytes()
   File "C:\Programs\Python313\Lib\multiprocessing\synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt 
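For reference, a guarded version of the reproducer (the idiom the error message asks for) might look like this — a sketch using range in place of numpy:

```python
import concurrent.futures

def main():
    # On start methods that spawn a fresh interpreter (the default on
    # Windows and macOS), each worker re-imports the main module, so
    # the pool must only be created under the __main__ guard.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Small iterable for demonstration; the with-block waits for
        # all submitted tasks to finish on exit.
        executor.map(print, range(100))

if __name__ == "__main__":
    main()
```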

@Jason-Y-Z
Contributor

Thanks all for the discussion and feedback. I've started a PR which should hopefully address/help with this and a couple of related issues.

@gaogaotiantian
Member

This does not hang infinitely, it just takes a long time (~1 min on my computer).

[screenshot: profiler timeline of the main process]

This is the main process timeline for a "normal" map operation, where the input is range(100).

The MainThread is doing the map, which submits the work items one by one into a dict. In each submit, two things happen:

  1. a wakeup signal is sent to self.thread_wakeup queue
  2. a WorkItem's id is added to self._work_ids queue

During the first call of submit, an executor manager thread is brought up (Thread-1 in the graph), which is in charge of actually distributing the tasks.

It will get the WorkItems from self._work_ids and put them in a process queue that has a size of MAX_WORKERS + 1 for the process workers. Because the queue has a maximum size, the manager thread will not be able to put all the tasks in it.
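The bounded handoff can be illustrated with a plain queue.Queue — a toy analogy with hypothetical names, not the actual multiprocessing call queue:

```python
import queue

MAX_WORKERS = 4
# The real call queue in concurrent.futures is similarly bounded,
# so the manager thread cannot enqueue the whole iterable at once.
call_queue = queue.Queue(maxsize=MAX_WORKERS + 1)

for i in range(MAX_WORKERS + 1):
    call_queue.put(i, block=False)   # fills the queue to capacity

try:
    call_queue.put(99, block=False)  # one more than the queue can hold
except queue.Full:
    print("queue full; manager must wait before sending more work")
```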

When the task queue is full, the manager waits in wait_result_broken_or_wakeup for a broken-pool or wakeup signal, and this is where the issue happens.

In this function, it needs to read all the pending wakeup signals in the self.thread_wakeup queue mentioned above. Remember, submit does not block and map is not lazily evaluated, so every task sends a wakeup signal. This function needs to process ALL of them before it can send more WorkItems to the worker processes.

Each poll & recv takes about 50 us on my computer:

[screenshot: profiler measurement of poll & recv]

So if you have 1e6 tasks like in the example, you'll need 50s to process all the wake up signals before you can send out more WorkItems to the process workers.
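That estimate works out as a simple back-of-the-envelope calculation, using the ~50 us figure measured above:

```python
# Cost of draining the wakeup pipe: one signal per submitted task.
per_signal_us = 50      # measured poll+recv cost; machine-dependent
n_tasks = 1_000_000     # size of the iterable in the bug report

total_seconds = n_tasks * per_signal_us / 1_000_000
print(total_seconds)  # -> 50.0
```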

So it's not a real deadlock, it just takes a long time. However, I do agree that if we can do something about it, that would be nice. Lazy evaluation alone won't make this any faster — it would only make it smoother, since each wakeup signal still takes ~50 us and has to be processed anyway (though that work can overlap with the tasks themselves, so if a task takes longer than that, the overhead should be negligible as long as it doesn't block task distribution).

@gaogaotiantian added the performance, stdlib, and topic-multiprocessing labels and removed the type-bug label Feb 4, 2024
@Jason-Y-Z
Contributor

@gaogaotiantian Thanks for the profiling, please see #114975 for an attempt to improve the large iterator use case

@gaogaotiantian
Member

Ah, okay, this is a false alarm. What you should do is specify a chunksize to batch your data.

The docs say:

For very long iterables, using a large value for chunksize can significantly improve performance compared to the default size of 1

executor.map(print, np.arange(1000000), chunksize=10000)

should just solve the issue.
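Under the cost model above, chunksize helps because each chunk is pickled and dispatched as a single work item, so one million tasks collapse into a hundred wakeup signals — a rough sketch:

```python
import math

n_items = 1_000_000
chunksize = 10_000

# Each chunk becomes one work item (and one wakeup signal),
# so the per-signal overhead is paid 100 times instead of 1e6.
n_work_items = math.ceil(n_items / chunksize)
print(n_work_items)  # -> 100
```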

@gaogaotiantian added the pending label Feb 7, 2024
@tburke02 tburke02 closed this as completed Feb 8, 2024
@gaogaotiantian closed this as not planned Feb 8, 2024