aaply slow compared to apply #275

Open
jiho opened this issue Apr 12, 2016 · 3 comments


jiho commented Apr 12, 2016

Here is a simple example

library("plyr")
m <- matrix(1, nrow=10000, ncol=100)
f <- function(x) { sum(x + x^2) }
system.time(apply(m, 1, f))
system.time(aaply(m, 1, f))
library("doParallel")
registerDoParallel(cores=4)
system.time(aaply(m, 1, f, .parallel=TRUE))

On my machine (MacBook Pro, 3 GHz Core i7) the timings are:

> system.time(apply(m, 1, f))
   user  system elapsed 
  0.035   0.004   0.039 
> system.time(aaply(m, 1, f))
   user  system elapsed 
  1.160   0.007   1.169 
> system.time(aaply(m, 1, f, .parallel=T))
   user  system elapsed 
  5.555   0.544   3.423 

I understand that aaply spends some time splitting the data before passing it to laply and then llply, and that this seems to add a large overhead to the computation. There may be no clean way to avoid it. I also understand that parallel computation has its own overhead, but I am quite surprised to see that it is so much slower than serial execution in this simple case.
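To make the split/apply/combine cost concrete, here is a minimal base-R sketch of the same three steps. It is only an illustration of the general pattern, not plyr's actual implementation; the names `pieces`, `results` and `combined` are made up for the example.

```r
# Illustration only: a hand-rolled split/apply/combine over rows,
# compared against a single call to apply(). Not plyr's real code path.
m <- matrix(1, nrow = 1000, ncol = 100)
f <- function(x) sum(x + x^2)

# Step 1 (split): materialise one list element per row
pieces <- lapply(seq_len(nrow(m)), function(i) m[i, ])

# Step 2 (apply): call the function on every piece
results <- lapply(pieces, f)

# Step 3 (combine): flatten the list of scalars back into a vector
combined <- unlist(results)

# Same values as the one-step base version
direct <- apply(m, 1, f)
stopifnot(isTRUE(all.equal(combined, direct)))
```

Wrapping each step in `system.time()` should show where the time goes on a given machine; I have not included timings because they will vary.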

In that situation, would you be OK with just redefining aaply as

aaply <- function(.data, .margins, .fun, ...) {
  apply(.data, .margins, .fun, ...)
}

and then setting the proper attributes and warning about the absence of a progress bar and other options (or making this a special case when none of the other options is selected)? I could have a go at this if it is considered appropriate.
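As a rough sketch of what such a special case might look like, the helper below delegates to apply() only when the function returns a single value per piece, and refuses otherwise. The name `aaply_fast` is hypothetical and not part of plyr; it ignores progress bars, `.parallel` and the other plyr options, and it does not set any plyr-specific attributes.

```r
# Hypothetical fast path (not part of plyr): delegate to base apply()
# when .fun returns exactly one value per piece, otherwise stop.
aaply_fast <- function(.data, .margins, .fun, ...) {
  stopifnot(is.array(.data))
  res <- apply(.data, .margins, .fun, ...)

  # With scalar results, apply() returns one value per split cell.
  # A list (ragged results) or a longer result (vector results) means
  # the output layout may differ, so the fast path bails out.
  n_cells <- prod(dim(.data)[.margins])
  if (is.list(res) || length(res) != n_cells) {
    stop("aaply_fast only supports functions returning a single value")
  }
  res
}

m <- matrix(1, nrow = 100, ncol = 10)
f <- function(x) sum(x + x^2)
row_totals <- aaply_fast(m, 1, f)
```

A real version would presumably fall back to the existing aaply code path instead of stopping, so that vector-valued functions keep their current output layout.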

The reason I am suggesting this is that teaching plyr to R newcomers is much easier than trying to explain to them the various apply, sapply, tapply, etc. However, the performance cost here is so large (and so noticeable, because summarising data over a few hundred thousand lines is common now) that it forces me to make an exception, and that quickly becomes the beginning of the end ;-)

jiho changed the title from "apply slow compared to apply" to "aaply slow compared to apply" on Apr 12, 2016

jiho commented Apr 12, 2016

PS: Blame autocorrect for the initial, strange, title of the issue!


krlmlr commented Apr 12, 2016

The behavior of these functions can be very different if the called function returns a vector:

m <- array(1:6, dim = c(2,3))
apply(m, 2, identity)
plyr::aaply(m, 2, identity)

Even more so with > 2 dimensions.
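A base-R sketch of the layout difference for the 2-d case: apply() stores each piece's result as a column, whereas, as far as I understand, aaply() puts the split dimension first, so the aaply() call above should give a 3 x 2 result. The helper `rows_first` is hypothetical and only covers a single margin on a matrix.

```r
# Illustration of the layout difference using base R only.
m <- array(1:6, dim = c(2, 3))

# apply() over columns: each column's result becomes a column of the output
by_col <- apply(m, 2, identity)

# Hypothetical helper: put the split dimension first (one row per piece),
# which is the layout aaply() is assumed to produce in this simple case.
rows_first <- function(res) if (is.matrix(res)) t(res) else res

out <- rows_first(by_col)
dim(by_col)  # 2 3
dim(out)     # 3 2
```

With more than two dimensions a plain transpose is no longer enough, and something like aperm() would be needed to move the split dimensions to the front.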

@hurrialice

I wonder if there is a way to accelerate array manipulation - I think aaply output is more predictable but apply is quicker?
