unnest column of data.frames #1112

Closed
geotheory opened this issue Mar 24, 2021 · 4 comments · Fixed by #1141
Labels: df-col 👜; feature (a feature request or enhancement); rectangling 🗄️ (converting deeply nested lists into tidy data frames)


@geotheory

geotheory commented Mar 24, 2021

unnest() does not work with data.frame columns.


unnest() works with a list column of nested data.frames, but fails with a data.frame column holding the same data. Given that this is a common data format (e.g. typical fromJSON() output), it feels like an unnecessary trap for the unaware.

The documentation does specify "list-column" in its title, but otherwise this distinction between column types isn't discussed. It feels to me that a data.frame column is quite likely to be confused with a list column of nested data.frames.
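
For readers unfamiliar with the distinction, here is a minimal base R sketch (not part of the original report) that contrasts the two column types. The object names `d_df` and `d_list` are illustrative, and the data is rebuilt by hand rather than parsed from JSON. It also suggests where the "2 rows" in the error may come from: iterating over a data frame visits its columns, not its rows.

```r
# A data frame column ("df-col"): a single data frame with one row per parent row.
d_df <- data.frame(id = c("a", "b", "c"))
d_df$data <- data.frame(count = c(49L, 93L, 27L), max = c(100L, 120L, 88L))

# A list column: a list holding one (here, one-row) data frame per parent row.
d_list <- data.frame(id = c("a", "b", "c"))
d_list$data <- lapply(seq_len(3), function(i) d_df$data[i, , drop = FALSE])

is.data.frame(d_df$data)    # TRUE
is.data.frame(d_list$data)  # FALSE

# Element-wise iteration sees 2 columns in the df-col but 3 entries in the list column.
length(d_df$data)           # 2
length(d_list$data)         # 3
```

Both objects hold the same values, so they are easy to mistake for one another when printed.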

require(jsonlite)
require(tidyr)
require(dplyr)

j = '{
  "observations": [{
    "id": "a",
    "data": {
      "count": 49,
      "max": 100
    }
  }, {
    "id": "b",
    "data": {
      "count": 93,
      "max": 120
    }
  }, {
    "id": "c",
    "data": {
      "count": 27,
      "max": 88
    }
  }]
}'

d = fromJSON(j)$observations %>% as_tibble()

glimpse(d)
#> Rows: 3
#> Columns: 2
#> $ id   <chr> "a", "b", "c"
#> $ data <df[,2]> <data.frame[3 x 2]>

d %>% unnest(data)
#> Error: Assigned data `map(data[[col]], as_df, col = col)` must be compatible with existing data.
#> x Existing data has 3 rows.
#> x Assigned data has 2 rows.
#> ℹ Only vectors of size 1 are recycled.

d %>% rowwise() %>% mutate(data = list(data)) %>% unnest(data)
#> # A tibble: 3 x 3
#>   id    count   max
#>   <chr> <int> <int>
#> 1 a        49   100
#> 2 b        93   120
#> 3 c        27    88
@DavisVaughan
Member

The error happens here in unnest():

tidyr/R/nest.R

Line 341 in 2fd80d5

data[[col]] <- map(data[[col]], as_df, col = col)

That can probably be fixed, but the bigger issue is that eventually this will call unchop(), and that won't support data frames until we have vctrs::vec_slice2(), which I noted here:

tidyr/R/chop.R

Lines 122 to 124 in 2fd80d5

# list column elements of `NULL` are dropped. If `x` has any data frame columns,
# these will be improperly treated as lists until `vec_slice2()` is implemented,
# but this should be extremely rare.

We could try to pull in the POC implementation of vec_slice2() from r-lib/vctrs#1228. It doesn't seem too complicated, but I'm not sure whether it is ready for prime time.

@hadley
Member

hadley commented Aug 23, 2021

Maybe we could just make this an informative error? (saying to use unpack() instead?)
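
As a rough illustration of the unpack() suggestion, the sketch below rebuilds the reporter's data by hand (an assumption made to avoid the jsonlite dependency; the name `out` is illustrative) and flattens the data frame column with `tidyr::unpack()`.

```r
library(tidyr)

# Rebuild the example with a data frame column, as fromJSON() would produce it.
d <- data.frame(id = c("a", "b", "c"))
d$data <- data.frame(count = c(49L, 93L, 27L), max = c(100L, 120L, 88L))

# unpack() widens a df-col into ordinary columns; the row count is unchanged.
out <- unpack(d, data)
names(out)  # "id" "count" "max"
nrow(out)   # 3
```

This yields the same three columns as the rowwise() workaround in the original report, without first converting to a list column.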

hadley added the feature and rectangling 🗄️ labels on Aug 23, 2021

@DavisVaughan
Member

The combination of #1140 and #1141 will actually fix this automatically

@hadley
Member

hadley commented Aug 23, 2021

Also related to #969
