-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve unnest()
performance
#1141
Improve unnest()
performance
#1141
Conversation
cols <- cols[map_lgl(unclass(data)[cols], is.data.frame)] | ||
unpack(data, any_of(cols), names_sep = names_sep, names_repair = names_repair) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If unpack()
was smart enough to "skip" any non df-cols (like how unchop "skips" non list-cols) then we wouldn't even need this filtering line! For example, this unpack()
call could return its input unchanged.
library(tidyr)
df <- tibble(x = 1L, y = 1)
# "skips" unchopping y
unchop(df, y)
#> # A tibble: 1 x 2
#> x y
#> <int> <dbl>
#> 1 1 1
# requires df cols
unpack(df, y)
#> Error: `y` must be a data frame column
That would make unnest()
truly an unchop()
+ unpack()
without any intermediate adjustments, which theoretically is kind of nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also be great to easily unpack all data frame columns unpack(df, everything())
and in general less need to worry when using selection helpers
LGTM! Looking forward to the speed improvements in |
It is now a simple combination of `unchop()` + `unpack()` on the remaining df-cols.
No changes in revdeps! |
Part of #1127
Closes #1112
This removes the
as_df()
andas_empty_df()
helpers fromunnest()
. Their main purpose seemed to be ensuring that each element of the list-col was a data frame, so thatunchop()
would return a packed df-col thatunpack()
would unpack. This doesn't seem to be necessary. Instead, we just letunchop()
unchop the list-cols (it "skips" over any non-list-cols) and then we tellunpack()
the remaining df-cols to unpack.As mentioned in #1126 (comment), there are a few edge cases that this would break. But they seemed to be off-label usage, only worked by accident, and
unnest_legacy()
didn't work that way either, so I think we are okay. I'll run revdeps to be sure. I've also added a test of the new error behavior.Here is an example of unnesting a list column of integers (which is really just unchop):
Here is an example of unnesting a list column of 2 column tibbles (i.e. standard unnest usage):