`aaply` slow compared to `apply` #275

jiho · 2016-04-12T20:13:19Z

Here is a simple example:

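library("plyr")
m <- matrix(1, nrow=10000, ncol=100)
f <- function(x) { sum(x + x^2) }
# base apply vs. serial aaply, row by row
system.time(apply(m, 1, f))
system.time(aaply(m, 1, f))
# aaply again, with a parallel backend registered
library("doParallel")
registerDoParallel(cores=4)
system.time(aaply(m, 1, f, .parallel=TRUE))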

On my machine (MacBook Pro, 3 GHz Core i7) the times are:

> system.time(apply(m, 1, f))
   user  system elapsed 
  0.035   0.004   0.039 
> system.time(aaply(m, 1, f))
   user  system elapsed 
  1.160   0.007   1.169 
> system.time(aaply(m, 1, f, .parallel=T))
   user  system elapsed 
  5.555   0.544   3.423

I understand that aaply spends some time splitting the data before passing it to laply and then llply, and that this seems to add a large overhead to the computation. There may not be a clean way to avoid it. I also understand that parallel computation has its own overhead, but I am quite surprised to see that it is so much worse than serial execution in this simple case.

In that situation, would you be OK with just redefining aaply as

aaply <- function(.data, .margins, .fun, ...) {
  apply(.data, .margins, .fun, ...)
}

and then setting the proper attributes and warning about the absence of a progress bar and other options (or making this a special case when none of the other options is selected)? I could have a go at this if it is considered appropriate.
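As a rough illustration of what such a fast path might look like, here is a minimal base R sketch. The name `aaply_fast` is hypothetical and not part of plyr. It assumes `.fun` returns scalars or equal-length vectors, and it only reorders dimensions so that the margin dimensions come first; it does not reproduce aaply's dimnames, `.drop`, `.progress` or `.parallel` handling.

```r
# Hypothetical helper, not part of plyr: apply() followed by a permutation
# that moves the margin dimensions to the front of the result.
aaply_fast <- function(.data, .margins, .fun, ...) {
  res <- apply(.data, .margins, .fun, ...)
  n_margin <- length(.margins)
  n_dim <- length(dim(res))
  # Scalar results (or a list of ragged results): nothing to reorder
  if (is.null(dim(res)) || n_dim == n_margin) return(res)
  # Vector results: apply() puts the result dimension first, margins after,
  # so move the margin dimensions to the front
  aperm(res, c(seq_len(n_margin) + 1L, 1L))
}

m <- matrix(1, nrow = 10000, ncol = 100)
f <- function(x) sum(x + x^2)
r <- aaply_fast(m, 1, f)   # one value per row
```

Whether the remaining attributes can be matched cheaply enough to keep the speed advantage is an open question that this sketch does not answer.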

The reason I am suggesting this is that teaching plyr to R newcomers is much easier than trying to explain the various apply, sapply, tapply, etc. to them, but the cost in performance here is so large (and noticeable, because summarising data over a few hundred thousand lines is common now) that it forces us to make an exception, and that quickly becomes the beginning of the end ;-)


jiho · 2016-04-12T20:16:21Z

PS: Blame autocorrect for the initial, strange, title of the issue!

krlmlr · 2016-04-12T20:30:34Z

The behavior of these functions can be very different if the called function returns a vector:

m <- array(1:6, dim = c(2,3))
apply(m, 2, identity)
plyr::aaply(m, 2, identity)

Even more so with > 2 dimensions.
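To make the difference concrete, here is a small sketch comparing the shapes of the two results. It relies on the documented behaviour that apply stores each call's result along the first dimension, while aaply puts the split dimensions first; the plyr part only runs if the package is installed.

```r
m <- array(1:6, dim = c(2, 3))

# apply() stores each call's result as a column,
# so iterating over columns with identity gives back a 2 x 3 matrix
a <- apply(m, 2, identity)
dim(a)  # 2 3

# aaply() puts the split dimension first instead,
# so the same call is expected to give the 3 x 2 transpose
if (requireNamespace("plyr", quietly = TRUE)) {
  b <- plyr::aaply(m, 2, identity)
  print(dim(b))
  print(all(unname(b) == t(a)))
}
```

Any drop-in replacement based on apply would therefore need to permute the result before returning it.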

hurrialice · 2019-07-20T19:08:14Z

I wonder if there is a way to accelerate array manipulation - I think aaply output is more predictable but apply is quicker?

jiho changed the title ~~apply slow compared to apply~~ aaply slow compared to apply Apr 12, 2016
