Rectangling function rewrite (#1200)
* Implement `tidyr_chop2()`

Since we don't have `vec_chop2()` yet (see r-lib/vctrs#1226)

* Add `list_of_ptype()` helper for consistent list-of ptype extraction

* Generalize `unchop_col_info()` into `list_init_empty()`

This will be useful for an enhanced `simplify_col()`

* Rename `tidyr_chop2()` to `tidyr_chop()`

* Reimplement rectangling functions

* NEWS bullet

* Move details about `ptype` and `transform` to the `transform` param docs

* Link to `list_of()` documentation

* Mention recycling of `unnest_longer()`

* Synchronize `values_to` and `indices_to` documentation

* Replace "can't" with "must not"

* Introduce `tidyr_temporary_new_list_of()`

Because vctr objects can't currently have `""` names

* Provide more input validation for `ptype` and `transform`

* Add two tests about `names_sep`, provided by @mgirlich

* Chop non-lists into lists with `vec_chop()`

This uses more explainable vctrs tooling for converting non-primary data types (i.e. non-lists) into lists. This also seems to produce the expected output in more scenarios.

Also inlined `tidyr_chop()` into `elt_to_wide()` since that is the only other place it was used. The fact that this removed a helper makes me optimistic that it is the right approach.

* Test that df-cols result in list-ofs when `simplify = FALSE`

* Apply unique name repair on names before applying `names_sep`

Applying it before `names_sep` rather than after means that `""` and `NA_character_` names get repaired early, before they are combined with the prefix and `names_sep`, which could otherwise make them mistakenly look like "valid" names.
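
Two of the changes described above (chopping non-lists with `vec_chop()`, and repairing names before applying `names_sep`) can be illustrated with vctrs directly. This is a minimal sketch and not tidyr's internal code; the prefix `"col"`, the separator `"_"`, and the example names are made up for illustration.

```r
library(vctrs)

# Chopping a non-list vector gives a list with one element per observation.
chopped <- vec_chop(1:3)

# Hypothetical element names, including an empty name and a missing name.
nms <- c("", NA_character_, "x")

# Repair first, then combine with the prefix and separator.
repaired <- vec_as_names(nms, repair = "unique", quiet = TRUE)
repaired_first <- paste("col", repaired, sep = "_")

# Combining first hides the problematic names behind the prefix,
# so a later repair step would no longer recognize them as invalid.
prefixed_first <- paste("col", nms, sep = "_")
```

In this sketch, `prefixed_first` contains `"col_"` and `"col_NA"`, which no longer look like names in need of repair, whereas `repaired_first` carries the repaired `...1` and `...2` suffixes through to the final names.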
DavisVaughan authored Nov 15, 2021
1 parent 7dfe606 commit a99bfd9
Showing 7 changed files with 1,735 additions and 314 deletions.
19 changes: 19 additions & 0 deletions NEWS.md
@@ -1,5 +1,24 @@
 # tidyr (development version)

+* The rectangling tools, `hoist()`, `unnest_wider()`, and `unnest_longer()`,
+  have undergone a complete rewrite. This has fixed many edge case bugs, and
+  has added the following new features:
+
+  * `unnest_wider()` and `unnest_longer()` can now unnest multiple columns at
+    once (#740).
+
+  * The `indices_to` and `values_to` arguments to `unnest_longer()` now accept
+    a glue specification, which is useful when unnesting multiple columns.
+
+  * If a `ptype` is supplied, but that column can't be simplified, the result
+    will be a list-of column where each element has type `ptype` (#998).
+
+  * `unnest_wider()` has a new `strict` argument which controls whether or not
+    strict vctrs typing rules should be applied. It defaults to `FALSE` for
+    backwards compatibility, and because it is often more useful to be lax
+    when unnesting JSON, which doesn't always map one-to-one with R's types
+    (#1125).
+
 * `any_of()` and `all_of()` from tidyselect are now re-exported (#1217).

 * `unchop()` now respects `ptype` when unnesting a non-list column (#1211).
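
As an illustration of the first two NEWS bullets, the following sketch unnests two list-columns at once and uses a `"{col}"` glue template to name the value and index columns. It assumes a tidyr release that includes this rewrite; the data frame and the `_val` / `_id` suffixes are invented for the example.

```r
library(tidyr)

# Hypothetical data: two list-columns of named vectors with matching sizes.
df <- tibble(
  id = 1:2,
  x = list(c(a = 1, b = 2), c(a = 3, b = 4)),
  y = list(c(a = "p", b = "q"), c(a = "r", b = "s"))
)

# Unnest both columns together. "{col}" is replaced by each column's name,
# so `x` and `y` each get their own value and index column.
out <- unnest_longer(
  df,
  c(x, y),
  values_to = "{col}_val",
  indices_to = "{col}_id"
)
```

Without a template, a single `values_to` or `indices_to` string could not give distinct names to the outputs of several columns, which is presumably why the glue form is described as useful here.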
58 changes: 9 additions & 49 deletions R/chop.R
@@ -189,9 +189,16 @@ df_unchop <- function(x, ..., ptype = list(), keep_empty = FALSE) {
       next
     }

-    info <- unchop_col_info(col, keep_empty)
+    # Always replace `NULL` elements with size 1 missing equivalent for recycling.
+    # These will be reset to `NULL` in `unchop_finalize()` if the
+    # entire row was missing and `keep_empty = FALSE`.
+    info <- list_init_empty(
+      x = col,
+      null = TRUE,
+      typed = keep_empty
+    )

-    x[[i]] <- info$col
+    x[[i]] <- info$x
     x_sizes[[i]] <- info$sizes
     x_nulls[[i]] <- info$null
   }
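
The body of `list_init_empty()` is not shown in this part of the diff. The sketch below is a simplified guess at its behavior for plain lists, based on the `unchop_col_info()` helper that this commit removes. The function name `init_empty_sketch()` is hypothetical, the `null` and `typed` switches are reduced to a single `keep_empty` flag, and list-of `ptype` handling is omitted.

```r
library(vctrs)

# Hypothetical, simplified stand-in for the helper (plain lists only).
init_empty_sketch <- function(col, keep_empty = FALSE) {
  sizes <- list_sizes(col)
  null <- vapply(col, is.null, logical(1))

  if (any(null)) {
    # Replace `NULL` elements with a size 1 missing value so they recycle.
    col[null] <- list(unspecified(1L))
    sizes[null] <- 1L
  }

  if (keep_empty) {
    # `NULL`s are already handled, so size 0 now means "typed but empty".
    empty_typed <- sizes == 0L

    if (any(empty_typed)) {
      # Slicing with `NA` gives a size 1 missing value of the same type.
      col[empty_typed] <- lapply(col[empty_typed], vec_slice, i = NA_integer_)
      sizes[empty_typed] <- 1L
    }
  }

  list(x = col, sizes = sizes, null = null)
}

res <- init_empty_sketch(list(1:2, NULL, integer()), keep_empty = TRUE)
```

Returning `null` alongside the updated sizes lets a caller such as `df_unchop()` later undo the replacement for rows that were entirely missing, as the comment in the hunk describes.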
Expand Down Expand Up @@ -259,53 +266,6 @@ df_unchop <- function(x, ..., ptype = list(), keep_empty = FALSE) {
out
}

unchop_col_info <- function(col, keep_empty) {
sizes <- list_sizes(col)
null <- vec_equal_na(col)

ptype <- attr(col, "ptype", exact = TRUE)

if (any(null)) {
# Always replace `NULL` elements with size 1 missing equivalent for recycling.
# These will be reset to `NULL` in `unchop_finalize()` if the
# entire row was missing and `keep_empty = FALSE`.

if (is_null(ptype)) {
replacement <- list(unspecified(1L))
} else {
replacement <- list(vec_init(ptype, n = 1L))
replacement <- new_list_of(replacement, ptype = ptype)
}

col <- vec_assign(col, null, replacement)
sizes[null] <- 1L
}

if (keep_empty) {
# Remember, `NULL` elements are already handled above, so `sizes == 0L`
# will now only happen with typed empty elements.
empty_typed <- sizes == 0L

if (any(empty_typed)) {
# Replace empty typed elements with their size 1 equivalent

if (is_null(ptype)) {
# `vec_init()` is slow, see r-lib/vctrs#1423, so use `vec_slice()` equivalent
replacement <- map(vec_slice(col, empty_typed), vec_slice, i = NA_integer_)
} else {
# For list-of, all size elements are the same type
replacement <- list(vec_init(ptype, n = 1L))
replacement <- new_list_of(replacement, ptype = ptype)
}

col <- vec_assign(col, empty_typed, replacement)
sizes[empty_typed] <- 1L
}
}

list(col = col, sizes = sizes, null = null)
}

unchop_sizes2 <- function(x, y) {
# Standard tidyverse recycling rules, just vectorized.
