Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implementation of group_modify() for multidplyr_party_df #102

Closed
huisaddison opened this issue Nov 11, 2020 · 3 comments
Closed

implementation of group_modify() for multidplyr_party_df #102

huisaddison opened this issue Nov 11, 2020 · 3 comments

Comments

@huisaddison
Copy link

In a previous comment for issue #51, plans for implementation of group_modify() and group_map() were mentioned. However, the current version 0.0.0.9 does not appear to have it implemented.

(If this is an inappropriate venue to discuss planned feature development, please close this issue, though I would be interested to know where in the roadmap group_modify falls).

Reprex:

> library(tidyverse)
> library(multidplyr)
> 
> cluster = new_cluster(2)
> 
> df = tibble(x=rnorm(20),
+             y=rnorm(20),
+             k=rep(1:5, 4))
> 
> df %>% group_by(k) %>%
+   group_modify(~broom::tidy(lm(y ~ x, data=.x))) %>%
+   ungroup ()
# A tibble: 10 x 6
       k term        estimate std.error statistic p.value
   <int> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1     1 (Intercept)   0.634     0.0982     6.45   0.0232
 2     1 x            -1.48      0.172     -8.59   0.0133
 3     2 (Intercept)   0.212     0.180      1.18   0.360 
 4     2 x            -0.584     0.234     -2.49   0.130 
 5     3 (Intercept)   0.0886    0.307      0.288  0.800 
 6     3 x             0.483     0.822      0.588  0.616 
 7     4 (Intercept)  -1.26      0.809     -1.56   0.258 
 8     4 x             1.28      0.652      1.97   0.188 
 9     5 (Intercept)  -0.128     0.465     -0.276  0.808 
10     5 x            -0.0355    0.324     -0.110  0.923 
> 
> df %>% group_by(k) %>%
+   partition(cluster) %>%
+   group_modify(~broom::tidy(lm(y ~ x, data=.x))) %>%
+   ungroup () %>%
+   collect()
Error in UseMethod("group_modify") : 
  no applicable method for 'group_modify' applied to an object of class "multidplyr_party_df"
@hadley
Copy link
Member

hadley commented Jan 24, 2021

It's not currently on the roadmap because I don't think group_modify() is needed any more (see https://www.tidyverse.org/blog/2020/03/dplyr-1-0-0-summarise/ for details)

@hadley hadley closed this as completed Jan 24, 2021
@rxhu
Copy link

rxhu commented Jun 2, 2022

@hadley, I disagree. goup_modify is still needed because the following is working with group_modify but not working with summarise no matter whether the tilde ~ is included or not:

iris %>% group_by(Species) %>% summarize(~broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x)))
Error: Problem with summarise() input ..1.
i ..1 = ~broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x)).
x ..1 must be a vector, not a formula object.
i The error occurred in group 1: Species = setosa.
Run rlang::last_error() to see where the error occurred.

@domq
Copy link

domq commented Mar 10, 2023

@rxhu, it works for me without the tilde (the first one that is), if you also remove data = .x:

library(dplyr)
iris %>% group_by(Species) %>% summarize(broom::tidy(lm(Petal.Length ~ Sepal.Length)))

... of course, the warning you get is not exactly encouraging:

Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.

I haven't been able to get reframe to work yet, like, at all — multidplyr or no. 🤷 Update: after doing my homework,

library(dplyr)
iris %>% reframe(.by = Species, broom::tidy(lm(Petal.Length ~ Sepal.Length)))

Getting back to the issue at hand, so to speak. Out of the group_modify(), summarise() and reframe() approaches, only the one with summarise() appears to currently be compatible with multidplyr (obsolescence warning notwithstanding):

library(dplyr)
library(multidplyr)
cluster <- new_cluster(2)
iris %>% group_by(Species) %>% partition(cluster) %>% summarize(broom::tidy(lm(Petal.Length ~ Sepal.Length)))

The other two approaches give off an error message like

Error in UseMethod("reframe") :
  no applicable method for 'reframe' applied to an object of class "multidplyr_party_df"

As far as the latter is concerned, I took the liberty to open #143 afresh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants