Unbalanced Load distribution on first run #247

Open
mfakaehler opened this issue Jan 5, 2023 · 0 comments
Hello,

I'm experiencing an issue with the load distribution of parallel jobs executed with future_map. Specifically, the first time I call future_map, the work runs on only 2-3 workers, while on subsequent runs of the same call it is shared evenly across all workers.
I tried to narrow it down to a reprex:

library(future)
library(tictoc)

plan(multisession, workers = 10)

tic()
res <- purrr::map(
  .x = 1:1e6, .f = ~.x +1
)
toc()
# 1.92 sec, not in parallel

tic()
furrr::future_map(
  .x = 1:1e6, .f = ~.x +1
)
toc()
# 3.462 sec on first run

microbenchmark::microbenchmark(
  {
    furrr::future_map(
      .x = 1:1e6, .f = ~.x +1
    )
  },
  times = 20
)
# 1.2 secs on average on consecutive runs

In my "real-world" applications, where a considerable amount of data also has to be passed to the workers, the difference between the first and later runs tends to be more pronounced.
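
To put numbers on the imbalance instead of watching the CPU load, the activity of each worker could be logged from inside the mapped function. The following is only a rough sketch: worker_activity() is a hypothetical helper written for this illustration (it is not part of furrr or future), and it assumes all workers run on the same machine so that their clocks agree.

```r
library(future)
library(furrr)

plan(multisession, workers = 10)

# Hypothetical helper: run one future_map() call and report, per worker
# process, how many elements it handled and when its first and last element
# were processed (in seconds relative to the start of the call).
worker_activity <- function(n = 1e5) {
  t0 <- as.numeric(Sys.time())
  # Each element returns the PID of the worker and the current time there.
  log <- future_map(
    seq_len(n),
    ~ c(pid = Sys.getpid(), t = as.numeric(Sys.time()))
  )
  log <- do.call(rbind, log)  # matrix with columns "pid" and "t"
  data.frame(
    pid   = sort(unique(log[, "pid"])),
    n     = as.vector(table(log[, "pid"])),
    start = as.vector(tapply(log[, "t"], log[, "pid"], min)) - t0,
    end   = as.vector(tapply(log[, "t"], log[, "pid"], max)) - t0
  )
}

first  <- worker_activity()  # first call in a fresh session
second <- worker_activity()  # same call again, for comparison

print(first)
print(second)
```

Comparing the start and end columns of the two data frames should show whether the workers are active at the same time on the first call or only one after another. The element counts alone may look balanced in both runs, so the timing columns are the more informative part.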

The behaviour might be related to this previous issue:
#3

I'm working on the following system:

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Any thoughts appreciated.

Best,
Maximilian
