This repository was archived by the owner on May 3, 2024. It is now read-only.

Parallel processing in subsample.py #197

Open: SichongP wants to merge 17 commits into Magdoll:master from SichongP:master

@SichongP

This PR parallelizes subsampling, because the script currently takes too long to run.

I tested the new script on 10,000 total reads with a step size of 100 reads and 100 iterations:

With the original script:

35.3 s ± 70.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With the parallel script (5 threads):

12.8 s ± 171 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The improvement should be more pronounced on real samples, where the multiprocessing overhead becomes negligible relative to the work done per iteration.
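For readers who want a feel for the approach, here is a minimal sketch of how independent subsampling iterations could be spread over worker processes. It is not the code in this PR: `count_unique`, `run_subsampling`, the toy gene labels, and the per-task seeding scheme are all hypothetical names and choices made for illustration, and it assumes each read is represented by a single label whose distinct values are counted per subsample.

```python
import random
from multiprocessing import Pool
from statistics import mean


def count_unique(task):
    """Draw one subsample of `size` labels and count the distinct ones."""
    labels, size, seed = task
    rng = random.Random(seed)  # per-task seed keeps results reproducible
    return size, len(set(rng.sample(labels, size)))


def run_subsampling(labels, step, iterations, processes=1, seed=0):
    """Return {size: mean distinct-label count} for every subsample size."""
    sizes = range(step, len(labels) + 1, step)
    # One task per (size, iteration) pair; tasks are independent of each other.
    tasks = [
        (labels, size, seed + i * iterations + j)
        for i, size in enumerate(sizes)
        for j in range(iterations)
    ]
    if processes > 1:
        with Pool(processes) as pool:
            results = pool.map(count_unique, tasks)
    else:
        # Serial fallback, useful for comparison and debugging.
        results = list(map(count_unique, tasks))
    per_size = {}
    for size, n_unique in results:
        per_size.setdefault(size, []).append(n_unique)
    return {size: mean(counts) for size, counts in per_size.items()}


if __name__ == "__main__":
    # Hypothetical toy input: 1,000 reads drawn from 50 gene labels.
    toy_labels = [f"gene{i % 50}" for i in range(1000)]
    print(run_subsampling(toy_labels, step=100, iterations=10, processes=2))
```

Because every task carries its own seed, the serial and parallel paths should give the same result for the same arguments. Note that this sketch pickles the full label list with every task, which is exactly the kind of overhead that matters on small inputs and fades on larger ones.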
