Understanding of nested mirai calls #220
Ah, I think I found it. I can call mirai::daemons(4) inside the outer mirai() call. Is this maybe something that should be mentioned in the documentation?

mirai::daemons(0)
tictoc::tic()
paste("Hi from PID", Sys.getpid())
#> [1] "Hi from PID 718660"
rr <- mirai::mirai({
  mirai::daemons(4) # <============= This is new here
  x <- paste("Hi from PID", Sys.getpid())
  res <- mirai::mirai_map(seq(4), function(i) {
    Sys.sleep(3)
    paste("Hi from PID", Sys.getpid(), "and i =", i)
  })
  c(x, res[])
})
rr[] |> unlist() |> paste(collapse = "\n") |> cat()
#> Hi from PID 726213
#> Hi from PID 726313 and i = 1
#> Hi from PID 726315 and i = 2
#> Hi from PID 726317 and i = 3
#> Hi from PID 726320 and i = 4
tictoc::toc()
#> 3.682 sec elapsed
Yes, I think I'll add some documentation surrounding your use case. You'd probably want to have 1 daemon at the top level, and then set up your [4] daemons from there. So something like:

with(daemons(1), {
  everywhere(mirai::daemons(4, dispatcher = FALSE))
  shiny::runApp(app)
})

Then in your [...]:

mirai({
  mirai::mirai_map(...)[]
})
Is there any way to reuse threads? Let's say I have daemons(2) on the outside and then in each call a daemons(4) call, but I want the same 4 threads and not 2x4 threads in total.
You can't use the same 4 second-level daemons from 2 top-level daemons. That's why I suggested 1 top-level daemon. If you have a convincing use case for this, do let me know as it might be a feature we'd look into down the line.
Typically I want to limit the number of cores I potentially use, because I mostly work on a shared server and I don't want to block the CPUs for my colleagues. At the same time I might have this kind of nested parallel logic (2 cores for Shiny and then 4 additional cores for computations or so). So I want to limit myself to 4+2 cores in total and not accidentally exhaust all resources (e.g. when having 4 at the base and then creating 6 in each, it would quickly balloon to 24 in total).
I understand your desire to limit cores used in general, but can you help me understand why you'd need more than 1 (top-level) daemon to handle Shiny? If you only have 4 cores dedicated to handling the map operations, then it shouldn't make a difference whether you send them from 2 separate daemons or the same one.
That would be outside of the Shiny example. I could limit each file-level daemon to 4 cores to have something like the following:

with(daemons(2), {
  everywhere(mirai::daemons(4, dispatcher = FALSE))
  res <- mirai::mirai_map(files, process_file_and_extract_pages_parallel)
  res[]
})

But if I have one file with 1 page and another with 100s, most of the cores would not be used, and it would then be more efficient to drop the file parallelism and use all cores on the page level. (Note that I don't know in advance how many files there are or how many pages each file will have.) If I could declare that I want to use 10 cores in total and then let mirai decide which core to take, it could use just one core for the single page of the first file and the remaining ones for the 100s of pages. This would be the case if we had a "flat nested hierarchy", e.g. if subsequent mirai calls could use the same cluster they were launched on.
I guess the request is still for 2 or more processes to share the same daemon 'pool'. This has been on our radar for a while (e.g. #89), although it hasn't made it to the top of the list just yet. What I can recommend in the meantime is to see if you can flatten the hierarchy yourself. For example, rather than launching 2 daemons to run the pages, run them sequentially in the main process, but send the heavy lifting to the daemons to run asynchronously, and don't wait on the results until everything has been dispatched.
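For anyone landing here later, a minimal sketch of what "flattening the hierarchy" could look like. It assumes the per-file bookkeeping is cheap enough to do in the main process; count_pages(), process_page() and the files list are hypothetical stand-ins invented for illustration, not anything from the thread or from mirai itself.

```r
library(mirai)

# Hypothetical stand-ins for the real work (assumptions, not from the thread)
count_pages <- function(f) f$n_pages          # cheap: runs in the main process
process_page <- function(file_name, page) {   # heavy lifting: runs on a daemon
  Sys.sleep(0.1)
  paste(file_name, "page", page)
}

files <- list(
  list(name = "a.pdf", n_pages = 1L),
  list(name = "b.pdf", n_pages = 10L)
)

daemons(4)  # one flat pool; this is the total core budget

# Enumerate every (file, page) pair sequentially in the main process
tasks <- do.call(rbind, lapply(files, function(f) {
  data.frame(file_name = f$name, page = seq_len(count_pages(f)))
}))

# Dispatch one mirai per page without waiting on any of them
jobs <- lapply(seq_len(nrow(tasks)), function(k) {
  mirai(
    process_page(file_name, page),
    process_page = process_page,
    file_name = tasks$file_name[k],
    page = tasks$page[k]
  )
})

# Only now wait for and collect the results
results <- lapply(jobs, function(m) m[])

daemons(0)  # reset daemons when done
```

Because every page goes to the same pool, a 1-page file and a 100-page file no longer compete for separate sets of second-level daemons; the 4 daemons simply work through whatever pages are queued.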
Hi Charlie,
thank you for the wonderful package.
I was wondering how the following code is supposed to work.
My use case is that I have an internal package that uses mirai_map() for parallelization and a Shiny frontend which uses mirai() with ExtendedTask to not block the session.
I use daemons() only in the Shiny server, but I found that only the outer mirai call is parallelized, whereas the inner mirai_map is not running on the cluster.
A MWE looks like this:
Using 4 cores, I would expect to see different PIDs for each mirai_map call, as well as a full runtime of 3 seconds (maybe 6 seconds, as one thread is occupied by the outer mirai call).
Instead, I see no speedup, and the mirai_map calls all run on the same PID.