Understanding of nested mirai calls #220

Open
DavZim opened this issue Feb 25, 2025 · 8 comments
Labels: documentation (Improvements or additions to documentation)

Comments


DavZim commented Feb 25, 2025

Hi Charlie,
thank you for the wonderful package.
I was wondering how the following code is supposed to work.
My use case is that I have an internal package that uses mirai_map() for parallelization, and a shiny frontend that uses mirai() with ExtendedTask so the session is not blocked.
I call daemons() only in the shiny server, but I found that only the outer mirai call is parallelized, while the inner mirai_map does not run on the daemons.

A MWE looks like this:

mirai::daemons(0) # reset: no daemons set
paste("Hi from PID", Sys.getpid())
#> [1] "Hi from PID 718660"

tictoc::tic()
rr <- mirai::mirai({
  x <- paste("Hi from PID", Sys.getpid())
  
  res <- mirai::mirai_map(seq(4), function(i) {
    Sys.sleep(3)
    paste("Hi from PID", Sys.getpid(), "and i =", i)
  })
  
  c(x, res[])
})

rr[] |> unlist() |> paste(collapse = "\n") |> cat()
#> Hi from PID 721311
#> Hi from PID 721364 and i = 1
#> Hi from PID 721364 and i = 2
#> Hi from PID 721364 and i = 3
#> Hi from PID 721364 and i = 4

tictoc::toc()
#> 12.2 sec elapsed

Using 4 cores (below), I would expect to see different PIDs for the mirai_map calls, as well as a total runtime of around 3 seconds (maybe 6 seconds, as one daemon is occupied by the outer mirai call).

mirai::daemons(4) # four local daemons
tictoc::tic()
paste("Hi from PID", Sys.getpid())
#> [1] "Hi from PID 718660"

rr <- mirai::mirai({
  x <- paste("Hi from PID", Sys.getpid())
  
  res <- mirai::mirai_map(seq(4), function(i) {
    Sys.sleep(3)
    paste("Hi from PID", Sys.getpid(), "and i =", i)
  })
  
  c(x, res[])
})

rr[] |> unlist() |> paste(collapse = "\n") |> cat()
#> Hi from PID 721668
#> Hi from PID 721876 and i = 1
#> Hi from PID 721876 and i = 2
#> Hi from PID 721876 and i = 3
#> Hi from PID 721876 and i = 4
tictoc::toc()
#> 12.2 sec elapsed

I see no speedup, and the mirai_map calls all run on the same PID.


DavZim commented Feb 25, 2025

Ah, I think I found it: I can call mirai::daemons() inside the outer mirai call to parallelize the inner map functions.

Is this something that should be mentioned in the documentation?

mirai::daemons(0)
tictoc::tic()
paste("Hi from PID", Sys.getpid())
#> [1] "Hi from PID 718660"

rr <- mirai::mirai({
  mirai::daemons(4) # <============= This is new here
  x <- paste("Hi from PID", Sys.getpid())
  
  res <- mirai::mirai_map(seq(4), function(i) {
    Sys.sleep(3)
    paste("Hi from PID", Sys.getpid(), "and i =", i)
  })
  
  c(x, res[])
})

rr[] |> unlist() |> paste(collapse = "\n") |> cat()
#> Hi from PID 726213
#> Hi from PID 726313 and i = 1
#> Hi from PID 726315 and i = 2
#> Hi from PID 726317 and i = 3
#> Hi from PID 726320 and i = 4
tictoc::toc()
#> 3.682 sec elapsed

shikokuchuo (Owner) commented

Yes, I think I'll add some documentation surrounding your use case.

You'd probably want to have 1 daemon at the top level, and then set up your 4 daemons from there.

So something like:

with(daemons(1), {
  everywhere(mirai::daemons(4, dispatcher = FALSE))
  shiny::runApp(app)
})

Then in your ExtendedTask:

mirai({
  mirai::mirai_map(...)[]
})
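
For concreteness, a minimal sketch of how those two pieces could fit together in a Shiny app (the app structure and the one-second workload are illustrative; ExtendedTask requires shiny >= 1.8.1, and a mirai can be returned directly as the task's promise):

library(shiny)
library(mirai)

ui <- fluidPage(
  actionButton("go", "Run"),
  verbatimTextOutput("out")
)

server <- function(input, output, session) {
  # the mirai runs on the single top-level daemon, which in turn
  # dispatches the map over the 4 daemons it set up via everywhere()
  task <- ExtendedTask$new(function() mirai({
    mirai::mirai_map(seq(4), function(i) {
      Sys.sleep(1)
      paste("Hi from PID", Sys.getpid(), "and i =", i)
    })[]
  }))

  observeEvent(input$go, task$invoke())
  output$out <- renderPrint(task$result())
}

with(daemons(1), {
  everywhere(mirai::daemons(4, dispatcher = FALSE))
  runApp(shinyApp(ui, server))
})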

shikokuchuo added the documentation label on Feb 25, 2025

DavZim commented Feb 25, 2025

Is there any way to reuse daemons? Let's say I have daemons(2) on the outside and then a daemons(4) call inside each, but I want the same 4 worker processes to be shared, not 2 × 4 = 8 in total.

shikokuchuo (Owner) commented

You can't use the same 4 second-level daemons from 2 top-level daemons. That's why I suggested 1 top-level daemon.

If you have a convincing use case for this, do let me know as it might be a feature we'd look into down the line.


DavZim commented Feb 25, 2025

Typically I want to limit the number of cores I might use, because I mostly work on a shared server and don't want to block the CPUs for my colleagues.

At the same time I may have this kind of nested parallel logic (2 cores for shiny, then 4 additional cores for computations). So I want to limit myself to 4 + 2 cores in total and not accidentally exhaust all resources; e.g. with 4 daemons at the base level, each creating 6 more, it quickly balloons to 24 in total, as the sketch below illustrates.
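
A short sketch of how the counts multiply, mirroring that 4 × 6 example (dispatcher = FALSE follows the pattern suggested above; everywhere() runs the call on every top-level daemon):

with(mirai::daemons(4), {
  # each of the 4 top-level daemons spawns its own set of 6 nested daemons
  mirai::everywhere(mirai::daemons(6, dispatcher = FALSE))
  # nested worker processes now total 4 x 6 = 24, on top of the 4 top-level ones
})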

shikokuchuo (Owner) commented

I understand the desire to limit the cores you use in general, but can you help me understand why you'd need more than 1 top-level daemon to handle Shiny ExtendedTasks in your case?

If you only have 4 cores dedicated to handling the map operations, it shouldn't make a difference whether you send work to them from 2 separate daemons or from the same one.


DavZim commented Mar 5, 2025

That would be outside of the Shiny example.
But let's say I have multiple files I want to process. E.g. I have 20 PDF files with varying numbers of pages, and I need to do something time-consuming on each page.
I want to process the files in parallel, e.g. use 2 cores at the file level and then process the pages in parallel as well, but in total only use 8 extra cores for the pages. In other words, I would like to use 10 cores in total and have the workload distributed automagically.

I could limit each file-level daemon to 4 cores with something like the following:

with(daemons(2), {
  everywhere(mirai::daemons(4, dispatcher = FALSE))
  res <- mirai::mirai_map(files, process_file_and_extract_pages_parallel)
  res[]
})

but if I have one file with 1 page and another with hundreds, most of the cores would go unused, and it would then be more efficient to drop the file-level parallelism and use all the cores at the page level. (Note that I don't know in advance how many files there are or how many pages each file will have.)

If I could declare that I want to use 10 cores in total and then let mirai decide which core takes what, it could use just one core for the single-page file and the rest for the files with hundreds of pages.

This would be possible with a "flat nested hierarchy", i.e. if subsequent mirai calls could use the same cluster they were launched on.

shikokuchuo (Owner) commented

I guess the request is still for 2 or more processes to share the same daemon 'pool'. This has been on our radar for a while (e.g. #89), although it hasn't made it to the top of the list just yet.

What I can recommend in the meantime is to see if you can flatten the hierarchy yourself. For example, rather than launching 2 daemons for the file level, iterate over the files sequentially in the main process, but send the heavy per-page work to the daemons asynchronously, and don't wait for any of it until everything has been dispatched.
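
A minimal sketch of that flattened pattern, assuming a files vector and hypothetical helpers split_pages() and process_page() (all names illustrative):

with(mirai::daemons(10), {
  # first pass: dispatch everything without waiting; split_pages() runs
  # locally and is cheap, while the per-page work goes to the daemons
  maps <- lapply(files, function(f) {
    mirai::mirai_map(split_pages(f), process_page)
  })
  # second pass: only now block to collect, so all 10 daemons stay busy
  # regardless of how unevenly the pages are spread across files
  results <- lapply(maps, function(m) m[])
})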
