Releases: tidyverse/tidyr
tidyr 1.0.2
- Minor fixes for dev versions of rlang, tidyselect, and tibble.
(Was supposed to be 1.0.1; accidentally released as 1.0.2)
tidyr 1.0.0
Breaking changes
See vignette("in-packages") for a detailed transition guide.
- nest() and unnest() have new syntax. The majority of existing usage
  should be automatically translated to the new syntax with a warning.
  If that doesn't work, put this in your script to use the old versions
  until you can take a closer look and update your code:

      library(tidyr)
      nest <- nest_legacy
      unnest <- unnest_legacy

- nest() now preserves grouping, which has implications for downstream calls
  to group-aware functions, such as dplyr::mutate() and filter().
- The first argument of nest() has changed from data to .data.
- unnest() uses the emerging tidyverse standard to disambiguate unique names.
  Use names_repair = tidyr_legacy to request the previous approach.
- unnest_()/nest_() and the lazyeval methods for unnest()/nest() are
  now defunct. They have been deprecated for some time, and, since the interface
  has changed, package authors will need to update to avoid deprecation
  warnings. I think one clean break should be less work for everyone.

  All other lazyeval functions have been formally deprecated, and will be
  made defunct in the next major release. (See lifecycle vignette for
  details on deprecation stages).
- crossing() and nesting() now return 0-row outputs if any input is a
  length-0 vector. If you want to preserve the previous behaviour, which
  silently dropped these inputs, you should convert empty vectors to NULL.
  (More discussion on this general pattern at tidyverse/design#24)
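As a rough illustration of the new nest()/unnest() syntax and the crossing() change described in the list above, here is a minimal sketch. The data frame and its column names (g, x, y) are made up for the example, and it assumes tidyr 1.0.0 or later is installed.

```r
library(tidyr)
library(tibble)

# Hypothetical data: one grouping column and two value columns.
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = 4:6)

# New syntax: name the list-column and state which columns go inside it.
nested <- nest(df, data = c(x, y))

# unnest() now takes the columns to unnest via `cols`.
roundtrip <- unnest(nested, cols = c(data))

# A length-0 input now yields a 0-row result ...
empty <- crossing(x = 1:2, y = integer())

# ... while NULL inputs are dropped, which mimics the previous behaviour.
kept <- crossing(x = 1:2, y = NULL)
```

Converting an empty vector to NULL before calling crossing() is the opt-in route back to the old result, so callers who relied on silent dropping have to ask for it explicitly.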
Pivoting
New pivot_longer() and pivot_wider() provide modern alternatives to spread() and gather(). They have been carefully redesigned to be easier to learn and remember, and include many new features. Learn more in vignette("pivot").
These functions resolve multiple existing issues with spread()/gather(). Both functions now handle multiple value columns (#149/#150), support more vector types (#333), use tidyverse conventions for duplicated column names (#496, #478), and are symmetric (#453). pivot_longer() gracefully handles duplicated column names (#472), and can directly split column names into multiple variables. pivot_wider() can now aggregate (#474), select keys (#572), and has control over generated column names (#208).
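A small sketch may help show the symmetry and the column-name splitting mentioned above. The table and its column names (id, score_2019, score_2020) are invented for illustration; only arguments documented for pivot_longer() and pivot_wider() are used.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one column per measure/year combination.
wide <- tibble(id = 1:2, score_2019 = c(10, 20), score_2020 = c(11, 21))

# Lengthen, splitting each column name into two variables at the underscore.
long <- pivot_longer(
  wide,
  cols = starts_with("score_"),
  names_to = c("measure", "year"),
  names_sep = "_",
  values_to = "value"
)

# Widen again; the two name columns are pasted back together with "_".
wide_again <- pivot_wider(
  long,
  names_from = c(measure, year),
  values_from = value
)
```

Here wide_again should match wide, which is what "symmetric" means in practice for this simple case.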
To demonstrate how these functions work in practice, tidyr has gained several new datasets: relig_income, construction, billboard, us_rent_income, fish_encounters and world_bank_pop.
Finally, tidyr demos have been removed. They are dated, and have been superseded by vignette("pivot").
Rectangling
tidyr contains four new functions to support rectangling, turning a deeply nested list into a tidy tibble: unnest_longer(), unnest_wider(), unnest_auto(), and hoist(). They are documented in a new vignette: vignette("rectangle").
unnest_longer() and unnest_wider() make it easier to unnest list-columns of vectors into either rows or columns (#418). unnest_auto() automatically picks between _longer() and _wider() using heuristics based on the presence of common names.
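To make the rows-versus-columns distinction concrete, here is a minimal sketch with two made-up list-columns (info and vals are hypothetical names): one holds named records, the other holds plain vectors.

```r
library(tidyr)
library(tibble)

# A list-column of named records: each component becomes its own column.
records <- tibble(
  id = 1:2,
  info = list(list(name = "a", n = 1L), list(name = "b", n = 2L))
)
wider <- unnest_wider(records, info)

# A list-column of unnamed vectors: each element becomes its own row.
vectors <- tibble(id = 1:2, vals = list(1:2, 3:5))
longer <- unnest_longer(vectors, vals)
```

The first input has common names across elements and the second has none, which is the kind of signal unnest_auto() is described as using to choose between the two.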
New hoist() provides a convenient way of plucking components of a list-column out into their own top-level columns (#341). This is particularly useful when you are working with deeply nested JSON, because it provides a convenient shortcut for the mutate() + map() pattern:
df %>% hoist(metadata, name = "name")
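For a self-contained version of the call above, the sketch below builds a small df with a metadata list-column (the contents are invented) and compares hoist() with the mutate() + map() pattern it is meant to shorten. It assumes dplyr and purrr are installed alongside tidyr.

```r
library(tidyr)
library(tibble)

# Hypothetical nested data, as you might get from parsed JSON.
df <- tibble(
  id = 1:2,
  metadata = list(
    list(name = "alpha", tags = list("x", "y")),
    list(name = "beta", tags = list("z"))
  )
)

# Pull the "name" component up into its own top-level column.
hoisted <- hoist(df, metadata, name = "name")

# The longer pattern that hoist() is a shortcut for.
by_hand <- dplyr::mutate(df, name = purrr::map_chr(metadata, "name"))
```

Both approaches give the same name column; the layout of the remaining columns may differ between the two results.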
tidyr 0.8.3
- crossing() preserves factor levels (#410), and now works with list-columns
  (#446, @SamanthaToet). (These changes also help expand(), which is built
  on top of crossing().)
- nest() is compatible with dplyr 0.8.0.
- spread() works when the id variable has names (#525).
- unnest() preserves the column being unnested when input is zero-length (#483),
  using the list_of() attribute to correctly restore columns, where possible.
- unnest() will run with named and unnamed list-columns of the same length
  (@hlendway, #460).
tidyr 0.8.2
tidyr 0.8.1
unnest() weakens test of "atomicity" to restore previous behaviour when
unnesting factors and dates (#407).
tidyr 0.8.0
Breaking changes
- There are no deliberate breaking changes in this release. However, a number
  of packages are failing with errors related to numbers of elements in columns,
  and row names. It is possible that these are accidental API changes or new
  bugs. If you see such an error in your package, I would sincerely appreciate
  a minimal reprex.
- separate() now correctly uses -1 to refer to the far right position,
  instead of -2. If you depended on this behaviour, you'll need to switch
  on packageVersion("tidyr") > "0.7.2".
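The version switch suggested for separate() could look roughly like this. The column and variable names (code, prefix, digit, last_pos) are hypothetical; the -1 versus -2 choice simply follows the note above.

```r
library(tidyr)
library(tibble)

# Hypothetical input: split off the final character of each string.
df <- tibble(code = c("AB1", "CD2"))

# Per the note above: -1 on tidyr newer than 0.7.2, -2 on older versions.
last_pos <- if (packageVersion("tidyr") > "0.7.2") -1 else -2

parts <- separate(df, code, into = c("prefix", "digit"), sep = last_pos)
```

Keeping the branch in one variable means the workaround can be deleted in a single place once older tidyr versions no longer need support.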
New features
- Increased test coverage from 84% to 99%.
- uncount() performs the inverse operation of dplyr::count() (#279)
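A short sketch of what "inverse of dplyr::count()" means for uncount(), using an invented two-row table of counts (colour and n are hypothetical column names) and assuming dplyr is installed:

```r
library(tidyr)
library(tibble)

# Hypothetical counts: two reds, one blue.
counts <- tibble(colour = c("red", "blue"), n = c(2L, 1L))

# Expand each row according to its count; the weight column is dropped.
rows <- uncount(counts, n)

# Counting the expanded rows recovers the original tallies.
back <- dplyr::count(rows, colour)
```

Note that count() returns its groups in sorted order, so back has the same tallies as counts but not necessarily the same row order.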
Bug fixes and minor improvements
- complete(data) now returns data rather than throwing an error (#390).
  complete() with zero-length completions returns original input (#331).
- crossing() preserves NAs (#364).
- expand() with empty input gives empty data frame instead of NULL (#331).
- expand(), crossing(), and complete() now complete empty factors instead
  of dropping them (#270, #285)
- extract() has a better error message if regex does not contain the
  expected number of groups (#313).
- drop_na() no longer drops columns (@jennybryan, #245), and works with
  list-cols (#280). The equivalent of NA in a list column is any empty
  (length 0) data structure.
- nest() is now faster, especially when a long data frame is collapsed into
  a nested data frame with few rows.
- nest() on a zero-row data frame works as expected (#320).
- replace_na() no longer complains if you try and replace missing values in
  variables not present in the data (#356).
- replace_na() now also works with vectors (#342, @flying-sheep), and
  can replace NULL in list-columns. It throws a better error message if
  you attempt to replace with something other than length 1.
- separate() no longer checks that ... is empty, allowing methods to make
  use of it. This check was added in tidyr 0.4.0 (2016-02-02) to deprecate
  previous behaviour where ... was passed to strsplit().
- separate() and extract() now insert columns in correct position when
  drop = TRUE (#394).
- separate() now correctly counts from the RHS when using negative
  integer sep values (@markdly, #315).
- separate() gets improved warning message when pieces aren't as expected
  (#375).
- separate_rows() supports list columns (#321), and works with empty tibbles.
- spread() now consistently returns 0 row outputs for 0 row inputs (#269).
- spread() now works when key column includes NA and drop is FALSE (#254).
- spread() no longer returns tibbles with row names (#322).
- spread(), separate(), extract() (#255), and gather() (#347) now
  replace existing variables rather than creating an invalid data frame with
  duplicated variable names (matching the semantics of mutate).
- unite() now works (as documented) if you don't supply any variables (#355).
- unnest() gains preserve argument which allows you to preserve list
  columns without unnesting them (#328).
- unnest() can unnest list-columns that contain lists of lists (#278).
- unnest(df) now works if df contains no list-cols (#344)
tidyr 0.7.2
- The SE variants gather_(), spread_() and nest_() now
  treat non-syntactic names in the same way as pre tidy eval versions
  of tidyr (#361).
- Fix tidyr bug revealed by R-devel.
tidyr 0.7.1
This is a hotfix release to account for some tidyselect changes in the
unit tests.
Note that the upcoming version of tidyselect backtracks on some of the
changes announced for 0.7.0. The special evaluation semantics for
selection have been changed back to the old behaviour because the new
rules were causing too much trouble and confusion. From now on data
expressions (symbols and calls to : and c()) can refer to both
registered variables and to objects from the context.
However, the semantics for context expressions (any calls other than to
: and c()) remain the same. Those expressions are evaluated in the
context only and cannot refer to registered variables. If you're
writing functions and refer to contextual objects, it is still a good
idea to avoid data expressions by following the advice of the 0.7.0
release notes.
tidyr 0.7.0
This release includes important changes to tidyr internals. Tidyr now
supports the new tidy evaluation framework for quoting (NSE)
functions. It also uses the new tidyselect package as its selecting
backend.
Breaking changes
- If you see error messages about objects or functions not found, it
  is likely because the selecting functions are now stricter in their
  arguments. An example of a selecting function is gather() and its
  ... argument. This change makes the code more robust by
  disallowing ambiguous scoping. Consider the following code:

      x <- 3
      df <- tibble(w = 1, x = 2, y = 3)
      gather(df, "variable", "value", 1:x)

  Does it select the first three columns (using the x defined in the
  global environment), or does it select the first two columns (using
  the column named x)?

  To solve this ambiguity, we now make a strict distinction between
  data and context expressions. A data expression is either a bare
  name or an expression like x:y or c(x, y). In a data expression,
  you can only refer to columns from the data frame. Everything else
  is a context expression in which you can only refer to objects that
  you have defined with <-.

  In practice this means that you can no longer refer to contextual
  objects like this:

      mtcars %>% gather(var, value, 1:ncol(mtcars))

      x <- 3
      mtcars %>% gather(var, value, 1:x)
      mtcars %>% gather(var, value, -(1:x))

  You now have to be explicit about where to find objects. To do so,
  you can use the quasiquotation operator !! which will evaluate its
  argument early and inline the result:

      mtcars %>% gather(var, value, !! 1:ncol(mtcars))
      mtcars %>% gather(var, value, !! 1:x)
      mtcars %>% gather(var, value, !! -(1:x))

  An alternative is to turn your data expression into a context
  expression by using seq() or seq_len() instead of :. See the
  section on tidyselect for more information about these semantics.
- Following the switch to tidy evaluation, you might see warnings
  about the "variable context not set". This is most likely caused by
  supplying helpers like everything() to underscored versions of
  tidyr verbs. Helpers should always be evaluated lazily. To fix
  this, just quote the helper with a formula: drop_na(df, ~everything()).
- The selecting functions are now stricter when you supply integer
  positions. If you see an error along the lines of

      `-0.949999999999999`, `-0.940000000000001`, ... must resolve to integer column positions, not a double vector

  please round the positions before supplying them to tidyr. Double
  vectors are fine as long as they are rounded.
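The two explicit forms recommended above can be tried on the ambiguous example from the first item. This is a sketch only: it reuses the df and x from that example, and how a bare 1:x is resolved depends on the tidyr/tidyselect version in use, so only the unambiguous spellings are shown.

```r
library(tidyr)
library(tibble)

x <- 3
df <- tibble(w = 1, x = 2, y = 3)

# Unquote: 1:x is evaluated early, using the x defined above (3),
# and the resulting positions are inlined into the call.
via_unquote <- gather(df, "variable", "value", !!(1:x))

# Context expression: seq_len() is an ordinary function call, so x is
# looked up in the environment rather than among the columns.
via_seq_len <- gather(df, "variable", "value", seq_len(x))
```

Both calls gather all three columns, so the choice between them is mostly a matter of which reads more clearly at the call site.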
Switch to tidy evaluation
tidyr is now a tidy evaluation grammar. See the programming vignette
in dplyr for practical information about tidy evaluation.
The tidyr port is a bit special. While the philosophy of tidy
evaluation is that R code should refer to real objects (from the data
frame or from the context), we had to make some exceptions to this
rule for tidyr. The reason is that several functions accept bare
symbols to specify the names of new columns to create (gather() being
a prime example). This is not tidy because the symbols do not
represent any actual object. Our workaround is to capture these
arguments using rlang::quo_name() (so they still support
quasiquotation and you can unquote symbols or strings). This type of
NSE is now discouraged in the tidyverse: symbols in R code should
represent real objects.
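Because the new-column arguments support quasiquotation, the column names can be held in variables and unquoted. The sketch below assumes this still behaves as described in current tidyr; the data and the names key_name and value_name are hypothetical.

```r
library(tidyr)
library(tibble)

df <- tibble(id = 1:2, a = c(1, 2), b = c(3, 4))

# Names for the new columns, stored as strings.
key_name <- "measure"
value_name <- "reading"

# Unquote the strings so gather() uses them as the new column names.
long <- gather(df, !!key_name, !!value_name, a, b)
```

This is mainly useful inside wrapper functions, where the new column names arrive as arguments rather than being typed at the call site.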
Following the switch to tidy eval, the underscored variants are softly
deprecated. However, they will remain around for some time and without
warning for backward compatibility.
Switch to the tidyselect backend
The selecting backend of dplyr has been extracted into a standalone
package, tidyselect, which tidyr now uses for selecting variables. It is
used for selecting multiple variables (in drop_na()) as well as
single variables (the col argument of extract() and separate(),
and the key and value arguments of spread()). This implies the
following changes:
- The arguments for selecting a single variable now support all
  features from dplyr::pull(). You can supply a name or a position,
  including negative positions.
- Multiple variables are now selected a bit differently. We now make a
  strict distinction between data and context expressions. A data
  expression is either a bare name or an expression like x:y or
  c(x, y). In a data expression, you can only refer to columns from
  the data frame. Everything else is a context expression in which you
  can only refer to objects that you have defined with <-.

  You can still refer to contextual objects in a data expression by
  being explicit. One way of being explicit is to unquote a variable
  from the environment with the tidy eval operator !!:

      x <- 2
      drop_na(df, 2)     # Works fine
      drop_na(df, x)     # Object 'x' not found
      drop_na(df, !! x)  # Works as if you had supplied 2

  On the other hand, select helpers like starts_with() are context
  expressions. It is therefore easy to refer to objects and they will
  never be ambiguous with data columns:

      x <- "d"
      drop_na(df, starts_with(x))

  While these special rules are in contrast to most dplyr and tidyr
  verbs (where both the data and the context are in scope) they make
  sense for selecting functions and should provide more robust and
  helpful semantics.
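The snippets in the list above refer to a df that is not defined in the notes. A self-contained sketch, with an invented df and hypothetical variable names (pos, prefix), might look like this:

```r
library(tidyr)
library(tibble)

# Hypothetical data with missing values in different columns.
df <- tibble(a = c(1, NA), dx = c(NA, 2), dy = c(3, 4))

# Unquote a position held in a variable: only column 1 is checked for NA.
pos <- 1L
by_position <- drop_na(df, !!pos)

# Helpers are context expressions, so `prefix` is found in the environment.
prefix <- "d"
by_helper <- drop_na(df, starts_with(prefix))
```

by_position keeps the row where a is present, while by_helper keeps the row where both dx and dy are present.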
tidyr 0.6.3
- Patch tests to be compatible with dev tibble