On my machine (MacBook Pro, 3 GHz Core i7) the times are:
> system.time(apply(m, 1, f))
user system elapsed
0.035 0.004 0.039
> system.time(aaply(m, 1, f))
user system elapsed
1.160 0.007 1.169
> system.time(aaply(m, 1, f, .parallel=T))
user system elapsed
5.555 0.544 3.423
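The definitions of `m` and `f` behind these timings are not shown here. A minimal sketch of a comparable setup follows, assuming a numeric matrix and a cheap per-row summary; the matrix size and the body of `f` are my assumptions, not the original example. The `.parallel = TRUE` run also needs a registered foreach backend, so it is left out of the sketch.

```r
# Hypothetical setup: the original `m` and `f` are not shown, so the
# matrix size and the function below are assumptions for illustration.
set.seed(1)
m <- matrix(runif(10000 * 10), nrow = 10000)  # 10,000 rows, 10 columns
f <- function(x) sum(x)                       # any cheap per-row summary

# Base R: call f once per row and collect the results in a vector
base_time <- system.time(res_base <- apply(m, 1, f))
print(base_time)

# plyr equivalent, run only if the package is installed
if (requireNamespace("plyr", quietly = TRUE)) {
  plyr_time <- system.time(res_plyr <- plyr::aaply(m, 1, f))
  print(plyr_time)
  # Both approaches should agree on the values (attributes may differ)
  stopifnot(isTRUE(all.equal(as.vector(res_base), as.vector(res_plyr))))
}
```

Absolute timings will differ by machine and matrix size; the point of the comparison is the ratio between the two calls.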
I understand that `aaply` spends some time splitting the data before passing it to `laply` and then `llply`, and this seems to add a large overhead to the computation. There may be no clean way to solve it. I also understand that parallel computation has its own overhead, but I am quite surprised that it is so much worse than serial execution in this simple case.
In that situation, would you be OK with just redefining aaply as
and then setting the proper attributes and warning about the absence of a progress bar and other options (or making this a special case when none of the other options is selected)? I could have a go at this if it is considered appropriate.
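The replacement definition proposed above is not shown here. Purely as an illustration of the idea, and not the author's code, a fast path that delegates row-wise work to base `apply` might look like the sketch below. The name `fast_aaply_rows` is hypothetical, and the sketch handles only a matrix split by rows, with no progress bar, no `.parallel`, and none of the other `aaply` options.

```r
# Hypothetical helper (not part of plyr): delegate row-wise application
# to base::apply instead of splitting the matrix into pieces first.
fast_aaply_rows <- function(.data, .fun, ...) {
  stopifnot(is.matrix(.data))
  res <- apply(.data, 1, .fun, ...)
  # When .fun returns a vector of length > 1, apply() stores each result
  # as a column; transpose so result rows line up with input rows.
  if (is.matrix(res)) res <- t(res)
  res
}

m_small <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)
fast_aaply_rows(m_small, sum)      # one value per row
fast_aaply_rows(m_small, range)    # one row of (min, max) per input row
```

A real replacement would still have to reproduce the dimnames and other attributes that `aaply` sets, which is the "setting the proper attributes" step mentioned above.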
The reason I am suggesting this is that teaching `plyr` to R newcomers is much easier than explaining the various `apply`, `sapply`, `tapply`, etc. to them. But the performance cost here is so large (and so noticeable, because summarising data over a few hundred thousand lines is common now) that it forces one to make an exception, and that quickly becomes the beginning of the end ;-)
jiho changed the title from "apply slow compared to apply" to "aaply slow compared to apply" on Apr 12, 2016