Using furrr inside a function call makes it slower (ggplot objects) #270

Open
dorwon opened this issue Sep 26, 2024 · 0 comments
This has been a hard one to make a full reprex for, but I'm hoping someone can explain what is going on or suggest a workaround.

Basically, I'm working on code that uses future_iwalk to quickly save a list of ggplot graphs, often ~2GB, to disk.

I save each page to a temp directory, then compile the completed pages in the final location.

The issue I've noticed in practice is that the code runs significantly slower when it is run inside a function than when it is written out at the top level.

This is the example I've been working on, but I can't get the issue to show up with it:

library(ggplot2)
library(furrr)
library(dplyr)

plan("multisession")

plot_graph <- function(dt) {
  
  dt |> ggplot(aes(x = x, y = y)) +
    geom_point()
}

save_graphs <- function(list_plots) {
  
  dir_out_tmp <- tempdir()
  filename_out <- "temp"
  
  furrr::future_iwalk(
    list_plots,
    ~withr::with_pdf(
      new = fs::path(dir_out_tmp, paste(filename_out, .y, sep = "-"), ext = "pdf"),
      width = 15,
      height = 8,
      code = plot(.x)
    )
  )
  
  files_temp <- fs::path(dir_out_tmp, paste(filename_out, names(list_plots), sep = "-"), ext = "pdf")
  stopifnot(all(fs::file_exists(files_temp)))
  
  qpdf::pdf_combine(
    input = files_temp,
    output = fs::path(dir_out_tmp, filename_out, ext = "pdf")
  )
}

a <- 1e3
b <- 1e5

dt <- data.frame(
  id = rep(1:a, each = b),
  x = runif(a*b),
  y = runif(a*b)
)

list_plots <- dt |>
  split(f = dt$id) |>
  purrr::map(plot_graph)


# Run Outside of Function -------------------------------------------------

tictoc::tic()
dir_out_tmp <- tempdir()
filename_out <- "temp"

furrr::future_iwalk(
  list_plots,
  ~withr::with_pdf(
    new = fs::path(dir_out_tmp, paste(filename_out, .y, sep = "-"), ext = "pdf"),
    width = 15,
    height = 8,
    code = plot(.x)
  )
)

files_temp <- fs::path(dir_out_tmp, paste(filename_out, names(list_plots), sep = "-"), ext = "pdf")
stopifnot(all(fs::file_exists(files_temp)))

qpdf::pdf_combine(
  input = files_temp,
  output = fs::path(dir_out_tmp, filename_out, ext = "pdf")
)
tictoc::toc()

# Run in Function ---------------------------------------------------------

tictoc::tic()
save_graphs(list_plots)
tictoc::toc()

The issue does not seem to show up for simple graphs, but in practice, with more complicated graph code, this process can take 8 minutes outside of the function but 4 hours when the function is called.

By "more complicated" I mean that my other graph uses scale_fill_manual() and scale_shape_manual(); when I remove them and retest, the difference between the in-function and out-of-function versions decreases.
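
One possible explanation, offered as an assumption and not a confirmed diagnosis: a formula created inside a function keeps a reference to that function's environment, so anything large living there may be serialized along with it when it is sent to a worker. The sketch below only measures that effect with base R. The names serialized_size, make_lambda, and big are hypothetical and not part of the code above.

```r
# Hypothetical helper: number of bytes needed to serialize an object.
serialized_size <- function(x) length(serialize(x, NULL))

# Hypothetical stand-in for save_graphs(): the formula is created inside a
# function whose environment also holds a large object.
make_lambda <- function() {
  big <- runif(1e6)  # about 8 MB, standing in for a large list of plots
  ~ plot(.x)         # the formula keeps a reference to this environment
}

f_inside <- make_lambda()

# Same formula, but pointed at the global environment, which serialize()
# records as a reference instead of copying its contents.
f_global <- ~ plot(.x)
environment(f_global) <- globalenv()

size_inside <- serialized_size(f_inside)
size_global <- serialized_size(f_global)

cat("inside function:", size_inside, "bytes\n")
cat("global env:     ", size_global, "bytes\n")
```

If the two sizes differ widely for the real lambda, the section on function environments and large objects in furrr's "Common gotchas" article may be relevant. Wrapping the lambda with carrier::crate() is a workaround that is sometimes suggested for this, but it is untested for this case.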
