Commit a99bfd9

Rectangling function rewrite (#1200)
* Implement `tidyr_chop2()`

  Since we don't have `vec_chop2()` yet r-lib/vctrs#1226

* Add `list_of_ptype()` helper for consistent list-of ptype extraction

* Generalize `unchop_col_info()` into `list_init_empty()`

  This will be useful for an enhanced `simplify_col()`

* Rename `tidyr_chop2()` to `tidyr_chop()`

* Reimplement rectangling functions

* NEWS bullet

* Move details about `ptype` and `transform` to the `transform` param docs

* Link to `list_of()` documentation

* Mention recycling of `unnest_longer()`

* Synchronize `values_to` and `indices_to` documentation

* Replace "can't" with "must not"

* Introduce `tidyr_temporary_new_list_of()`

  Because vctr objects can't currently have `""` names

* Provide more input validation for `ptype` and `transform`

* Add two tests about `names_sep`, provided by @mgirlich

* Chop non-lists into lists with `vec_chop()`

  This uses more explainable vctrs tooling for converting non-primary data
  types (i.e. non-lists) into lists. This also seems to produce the expected
  output in more scenarios.

  Also inlined `tidyr_chop()` into `elt_to_wide()` since that is the only
  other place it was used. The fact that this removed a helper makes me
  optimistic that it is the right approach.

* Test that df-cols result in list-ofs when `simplify = FALSE`

* Apply unique name repair on names before applying `names_sep`

  Applying it before `names_sep` rather than after means that `""` and
  `NA_character_` names get repaired early on before they are combined with
  the prefix and `names_sep`, which can make them mistakenly look like
  "valid" names
1 parent 7dfe606 commit a99bfd9

7 files changed: +1735 -314 lines changed

NEWS.md (+19)

@@ -1,5 +1,24 @@
 # tidyr (development version)
 
+* The rectangling tools, `hoist()`, `unnest_wider()`, and `unnest_longer()`,
+  have undergone a complete rewrite. This has fixed many edge case bugs, and
+  has added the following new features:
+
+  * `unnest_wider()` and `unnest_longer()` can now unnest multiple columns at
+    once (#740).
+
+  * The `indices_to` and `values_to` arguments to `unnest_longer()` now accept
+    a glue specification, which is useful when unnesting multiple columns.
+
+  * If a `ptype` is supplied, but that column can't be simplified, the result
+    will be a list-of column where each element has type `ptype` (#998).
+
+  * `unnest_wider()` has a new `strict` argument which controls whether or not
+    strict vctrs typing rules should be applied. It defaults to `FALSE` for
+    backwards compatibility, and because it is often more useful to be lax
+    when unnesting JSON, which doesn't always map one-to-one with R's types
+    (#1125).
+
 * `any_of()` and `all_of()` from tidyselect are now re-exported (#1217).
 
 * `unchop()` now respects `ptype` when unnesting a non-list column (#1211).
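
The first two NEWS bullets (multi-column unnesting and glue specifications for `values_to`) can be illustrated with a small sketch. It assumes a tidyr release that includes this rewrite; the data frame `df` and the `"{col}_value"` template are made up for illustration and do not come from the commit.

```r
library(tidyr)

# Illustrative data: two list-columns whose elements are aligned in size.
df <- tibble(
  x = 1:2,
  y = list(1:2, 3:4),
  z = list(5:6, 7:8)
)

# Unnest both list-columns in a single call.
both <- unnest_longer(df, c(y, z))

# A glue template in `values_to`: `{col}` stands for each input column name,
# so every unnested column gets its own output name.
renamed <- unnest_longer(df, c(y, z), values_to = "{col}_value")

print(both)
print(names(renamed))
```

Without the template, a single fixed `values_to` string could not name two output columns distinctly, which is presumably why the glue form is described as useful for the multi-column case.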

R/chop.R (+9 -49)

@@ -189,9 +189,16 @@ df_unchop <- function(x, ..., ptype = list(), keep_empty = FALSE) {
       next
     }
 
-    info <- unchop_col_info(col, keep_empty)
+    # Always replace `NULL` elements with size 1 missing equivalent for recycling.
+    # These will be reset to `NULL` in `unchop_finalize()` if the
+    # entire row was missing and `keep_empty = FALSE`.
+    info <- list_init_empty(
+      x = col,
+      null = TRUE,
+      typed = keep_empty
+    )
 
-    x[[i]] <- info$col
+    x[[i]] <- info$x
     x_sizes[[i]] <- info$sizes
     x_nulls[[i]] <- info$null
   }
@@ -259,53 +266,6 @@ df_unchop <- function(x, ..., ptype = list(), keep_empty = FALSE) {
   out
 }
 
-unchop_col_info <- function(col, keep_empty) {
-  sizes <- list_sizes(col)
-  null <- vec_equal_na(col)
-
-  ptype <- attr(col, "ptype", exact = TRUE)
-
-  if (any(null)) {
-    # Always replace `NULL` elements with size 1 missing equivalent for recycling.
-    # These will be reset to `NULL` in `unchop_finalize()` if the
-    # entire row was missing and `keep_empty = FALSE`.
-
-    if (is_null(ptype)) {
-      replacement <- list(unspecified(1L))
-    } else {
-      replacement <- list(vec_init(ptype, n = 1L))
-      replacement <- new_list_of(replacement, ptype = ptype)
-    }
-
-    col <- vec_assign(col, null, replacement)
-    sizes[null] <- 1L
-  }
-
-  if (keep_empty) {
-    # Remember, `NULL` elements are already handled above, so `sizes == 0L`
-    # will now only happen with typed empty elements.
-    empty_typed <- sizes == 0L
-
-    if (any(empty_typed)) {
-      # Replace empty typed elements with their size 1 equivalent
-
-      if (is_null(ptype)) {
-        # `vec_init()` is slow, see r-lib/vctrs#1423, so use `vec_slice()` equivalent
-        replacement <- map(vec_slice(col, empty_typed), vec_slice, i = NA_integer_)
-      } else {
-        # For list-of, all size elements are the same type
-        replacement <- list(vec_init(ptype, n = 1L))
-        replacement <- new_list_of(replacement, ptype = ptype)
-      }
-
-      col <- vec_assign(col, empty_typed, replacement)
-      sizes[empty_typed] <- 1L
-    }
-  }
-
-  list(col = col, sizes = sizes, null = null)
-}
-
 unchop_sizes2 <- function(x, y) {
   # Standard tidyverse recycling rules, just vectorized.
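
The body of `list_init_empty()` is not part of the hunks shown here, so the following is only a rough sketch of the `NULL`-replacement step, modeled on the removed `unchop_col_info()` for the untyped (bare list) case. The helper name `replace_null_elements()` is hypothetical, and the sketch ignores the list-of `ptype` branch and the `typed`/`keep_empty` handling.

```r
library(vctrs)

# Hypothetical helper, not tidyr's implementation: replace `NULL` elements of
# a bare list with a size 1 missing placeholder so they can be recycled.
replace_null_elements <- function(col) {
  sizes <- list_sizes(col)
  null <- vapply(col, is.null, logical(1))

  if (any(null)) {
    # `unspecified(1L)` is a size 1 missing value with no committed type.
    col <- vec_assign(col, null, list(unspecified(1L)))
    sizes[null] <- 1L
  }

  # Mirrors the `x` / `sizes` / `null` fields read in `df_unchop()` above.
  list(x = col, sizes = sizes, null = null)
}

info <- replace_null_elements(list(1:2, NULL, integer()))
print(info$sizes)
print(info$null)
```

The typed empty element (`integer()`) is left at size 0 in this sketch; in the removed code it was only expanded to size 1 when `keep_empty = TRUE`.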
