Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid using multiple processes for downloads #375

Merged
merged 1 commit into from
Jun 3, 2022

Conversation

bemoody
Copy link
Collaborator

@bemoody bemoody commented May 17, 2022

multiprocessing is messy; it's a pretty nifty tool but requires you to write your entire application with multiprocessing in mind.

For simply downloading files in parallel, there's little reason to use processes rather than threads.

This should, though I haven't tested it, solve the issues mentioned in pull #330 and probably also issue #306. I can't guarantee that this will solve every problem, but it should be strictly better than what we have now.

@tompollard
Copy link
Member

@bemoody please could you fix the conflict?

The standard multiprocessing module is used to distribute a task to
multiple processes, which is useful when doing heavy computation due
to the limitations of CPython; however, making this work is dependent
on the ability to fork processes or else to kludgily emulate forking
on systems that don't support it.  In particular, it tends to cause
problems on Windows unless you are very scrupulous about how you write
your program.

Therefore, as a rule, the multiprocessing module shouldn't be used by
general-purpose libraries, and should only be invoked by application
programmers themselves (who are in a position to guarantee that
imports have no side effects, the main script uses 'if __name__ ==
"__main__"', etc.)

However, downloading a file isn't a CPU-bound task, it's an I/O-bound
task, and therefore for this purpose, parallel threads should work as
well or even better than parallel processes.  The
multiprocessing.dummy module provides the same API as the
multiprocessing module, but uses threads instead of processes, so it
should be safe to use in a general-purpose library.
@bemoody bemoody force-pushed the no-multiprocess-downloads branch from 7a3ffde to 5357f55 Compare June 3, 2022 18:20
Copy link
Member

@tompollard tompollard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@tompollard tompollard merged commit bb57311 into main Jun 3, 2022
@tompollard tompollard deleted the no-multiprocess-downloads branch June 3, 2022 18:26
@cx1111
Copy link
Member

cx1111 commented Jun 13, 2022

@lbugnon does this solve the windows downloading issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants