Releases: tidyverse/tidyr

tidyr 1.0.2

24 Jan 21:40
  • Minor fixes for dev versions of rlang, tidyselect, and tibble.

(Was supposed to be 1.0.1; accidentally released as 1.0.2)

tidyr 1.0.0

13 Sep 14:48

Breaking changes

See vignette("in-packages") for a detailed transition guide.

  • nest() and unnest() have new syntax. The majority of existing usage
    should be automatically translated to the new syntax with a warning.
    If that doesn't work, put this in your script to use the old versions
    until you can take a closer look and update your code:

    library(tidyr)
    nest <- nest_legacy
    unnest <- unnest_legacy

  • nest() now preserves grouping, which has implications for downstream calls
    to group-aware functions, such as dplyr::mutate() and filter().

  • The first argument of nest() has changed from data to .data.

  • unnest() uses the emerging tidyverse standard
    to disambiguate unique names. Use names_repair = tidyr_legacy to
    request the previous approach.

  • unnest_()/nest_() and the lazyeval methods for unnest()/nest() are
    now defunct. They have been deprecated for some time, and, since the interface
    has changed, package authors will need to update to avoid deprecation
    warnings. I think one clean break should be less work for everyone.

    All other lazyeval functions have been formally deprecated, and will be
    made defunct in the next major release. (See lifecycle vignette for
    details on deprecation stages).

  • crossing() and nesting() now return 0-row outputs if any input is a
    length-0 vector. If you want to preserve the previous behaviour which
    silently dropped these inputs, you should convert empty vectors to NULL.
    (More discussion on this general pattern at
    tidyverse/design#24)
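As a rough illustration of the new nest()/unnest() syntax and the crossing() change described in the bullets above, here is a small sketch. The data frame df and its columns g, x and y are made up for the example, and it assumes tidyr 1.0.0 or later is installed.

```r
library(tidyr)

# Hypothetical example data: one grouping column and two value columns.
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = 4:6)

# New syntax: name the list-column and say which columns go into it.
nested <- nest(df, data = c(x, y))

# New syntax: say which list-column(s) to unnest via `cols`.
flat <- unnest(nested, cols = data)

# A length-0 input now yields a 0-row result ...
empty <- crossing(x = 1:2, y = integer())

# ... whereas a NULL input is dropped, which mimics the previous behaviour.
kept <- crossing(x = 1:2, y = NULL)
```

Converting empty vectors to NULL before calling crossing() is the workaround the release notes suggest; whether that is appropriate depends on what an empty input should mean in your code.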

Pivoting

New pivot_longer() and pivot_wider() provide modern alternatives to spread() and gather(). They have been carefully redesigned to be easier to learn and remember, and include many new features. Learn more in vignette("pivot").

These functions resolve multiple existing issues with spread()/gather(). Both functions now handle multiple value columns (#149/#150), support more vector types (#333), use tidyverse conventions for duplicated column names (#496, #478), and are symmetric (#453). pivot_longer() gracefully handles duplicated column names (#472), and can directly split column names into multiple variables. pivot_wider() can now aggregate (#474), select keys (#572), and has control over generated column names (#208).
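A minimal sketch of the round trip between the two functions may help here. The tibble wide and its columns are hypothetical and only serve the illustration; see vignette("pivot") for the authoritative examples.

```r
library(tidyr)

# Hypothetical wide data: one row per id, one column per year.
wide <- tibble(id = c("a", "b"), `2018` = c(1, 3), `2019` = c(2, 4))

# Lengthen: column names go to `year`, cell values go to `value`.
long <- pivot_longer(wide, cols = -id, names_to = "year", values_to = "value")

# Widen again: the inverse operation recovers the original layout.
back <- pivot_wider(long, names_from = year, values_from = value)
```

Because the two functions are designed to be symmetric, back should have the same columns and values as wide in this simple case.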

To demonstrate how these functions work in practice, tidyr has gained several new datasets: relig_income, construction, billboard, us_rent_income, fish_encounters and world_bank_pop.

Finally, tidyr demos have been removed. They are dated, and have been superseded by vignette("pivot").

Rectangling

tidyr contains four new functions to support rectangling, turning a deeply nested list into a tidy tibble: unnest_longer(), unnest_wider(), unnest_auto(), and hoist(). They are documented in a new vignette: vignette("rectangle").

unnest_longer() and unnest_wider() make it easier to unnest list-columns of vectors into either rows or columns (#418). unnest_auto() automatically picks between _longer() and _wider() using heuristics based on the presence of common names.

New hoist() provides a convenient way of plucking components of a list-column out into their own top-level columns (#341). This is particularly useful when you are working with deeply nested JSON, because it provides a convenient shortcut for the mutate() + map() pattern:

df %>% hoist(metadata, name = "name") 
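The hoist() call above presumes a data frame df with a list-column called metadata. A self-contained sketch, with made-up data standing in for parsed JSON, could look like this:

```r
library(tidyr)

# Hypothetical nested data: each metadata entry is a named list.
df <- tibble(
  id = 1:2,
  metadata = list(
    list(name = "a", tags = c("x", "y")),
    list(name = "b", tags = "z")
  )
)

# Pluck one component into its own top-level column.
hoisted <- hoist(df, metadata, name = "name")

# Spread every component of the list-column into columns ...
wider <- unnest_wider(df, metadata)

# ... then turn the remaining list-column of vectors into rows.
longer <- unnest_longer(wider, tags)
```

hoist() is the more targeted tool when only a few components are needed; unnest_wider() followed by unnest_longer() is one way to flatten everything.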

tidyr 0.8.3

02 Mar 14:40
  • crossing() preserves factor levels (#410), now works with list-columns
    (#446, @SamanthaToet). (These also help expand() which is built on top
    of crossing())

  • nest() is compatible with dplyr 0.8.0.

  • spread() works when the id variable has names (#525).

  • unnest() preserves column being unnested when input is zero-length (#483),
    using list_of() attribute to correctly restore columns, where possible.

  • unnest() will run with named and unnamed list-columns of same length
    (@hlendway, #460).

tidyr 0.8.2

29 Oct 14:18
  • separate() now accepts NA as a column name in the into argument to
    denote columns which are omitted from the result. (@markdly, #397).

  • Minor updates to ensure compatibility with dependencies.
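As a short sketch of the NA placeholder in the into argument of separate(), assuming a hypothetical one-column tibble df:

```r
library(tidyr)

# Hypothetical data: a label and a number joined by "-".
df <- tibble(x = c("a-1", "b-2"))

# NA in `into` discards the first piece; only `num` is kept.
out <- separate(df, x, into = c(NA, "num"), sep = "-")
```

The result contains only the num column, as character values.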

tidyr 0.8.1

18 May 14:04
  • unnest() weakens test of "atomicity" to restore previous behaviour when
    unnesting factors and dates (#407).

tidyr 0.8.0

30 Jan 15:44

Breaking changes

  • There are no deliberate breaking changes in this release. However, a number
    of packages are failing with errors related to numbers of elements in columns,
    and row names. It is possible that these are accidental API changes or new
    bugs. If you see such an error in your package, I would sincerely appreciate
    a minimal reprex.

  • separate() now correctly uses -1 to refer to the far right position,
    instead of -2. If you depended on this behaviour, you'll need to switch
    on packageVersion("tidyr") > "0.7.2"

New features

  • Increased test coverage from 84% to 99%.

  • uncount() performs the inverse operation of dplyr::count() (#279)
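A small sketch of uncount(), using a made-up tibble of counts, shows how each row is repeated according to its weight:

```r
library(tidyr)

# Hypothetical counted data: "a" occurred once, "b" twice.
counts <- tibble(x = c("a", "b"), n = c(1, 2))

# Repeat each row n times; the weight column is dropped by default.
rows <- uncount(counts, n)
```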

Bug fixes and minor improvements

  • complete(data) now returns data rather than throwing an error (#390).
    complete() with zero-length completions returns original input (#331).

  • crossing() preserves NAs (#364).

  • expand() with empty input gives empty data frame instead of NULL (#331).

  • expand(), crossing(), and complete() now complete empty factors instead
    of dropping them (#270, #285)

  • extract() has a better error message if regex does not contain the
    expected number of groups (#313).

  • drop_na() no longer drops columns (@jennybryan, #245), and works with
    list-cols (#280). Equivalent of NA in a list column is any empty
    (length 0) data structure.

  • nest() is now faster, especially when a long data frame is collapsed into
    a nested data frame with few rows.

  • nest() on a zero-row data frame works as expected (#320).

  • replace_na() no longer complains if you try and replace missing values in
    variables not present in the data (#356).

  • replace_na() now also works with vectors (#342, @flying-sheep), and
    can replace NULL in list-columns. It throws a better error message if
    you attempt to replace with something other than length 1.

  • separate() no longer checks that ... is empty, allowing methods to make
    use of it. This check was added in tidyr 0.4.0 (2016-02-02) to deprecate
    previous behaviour where ... was passed to strsplit().

  • separate() and extract() now insert columns in correct position when
    drop = TRUE (#394).

  • separate() now correctly counts from the RHS when using negative
    integer sep values (@markdly, #315).

  • separate() gets improved warning message when pieces aren't as expected
    (#375).

  • separate_rows() supports list columns (#321), and works with empty tibbles.

  • spread() now consistently returns 0 row outputs for 0 row inputs (#269).

  • spread() now works when key column includes NA and drop is FALSE
    (#254).

  • spread() no longer returns tibbles with row names (#322).

  • spread(), separate(), extract() (#255), and gather() (#347) now
    replace existing variables rather than creating an invalid data frame with
    duplicated variable names (matching the semantics of mutate).

  • unite() now works (as documented) if you don't supply any variables (#355).

  • unnest() gains preserve argument which allows you to preserve list
    columns without unnesting them (#328).

  • unnest() can now unnest list-columns that contain lists of lists (#278).

  • unnest(df) now works if df contains no list-cols (#344)
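To illustrate the replace_na() items in the list above, here is a minimal sketch with made-up data, covering both the vector form and the data frame form:

```r
library(tidyr)

# replace_na() on a plain vector.
v <- replace_na(c(1, NA, 3), 0)

# replace_na() on a data frame takes a named list of length-1 replacements.
df <- tibble(x = c(1, NA), y = c("a", NA))
filled <- replace_na(df, list(x = 0, y = "unknown"))
```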

tidyr 0.7.2

17 Oct 14:09
  • The SE variants gather_(), spread_() and nest_() now
    treat non-syntactic names in the same way as pre tidy eval versions
    of tidyr (#361).

  • Fix tidyr bug revealed by R-devel.

tidyr 0.7.1

12 Sep 06:24
Compare
Choose a tag to compare

This is a hotfix release to account for some tidyselect changes in the
unit tests.

Note that the upcoming version of tidyselect backtracks on some of the
changes announced for 0.7.0. The special evaluation semantics for
selection have been changed back to the old behaviour because the new
rules were causing too much trouble and confusion. From now on data
expressions (symbols and calls to : and c()) can refer to both
registered variables and to objects from the context.

However the semantics for context expressions (any calls other than to
: and c()) remain the same. Those expressions are evaluated in the
context only and cannot refer to registered variables. If you're
writing functions and refer to contextual objects, it is still a good
idea to avoid data expressions by following the advice of the 0.7.0
release notes.

tidyr 0.7.0

16 Aug 14:10

This release includes important changes to tidyr internals. tidyr now
supports the new tidy evaluation framework for quoting (NSE)
functions. It also uses the new tidyselect package as its selecting
backend.

Breaking changes

  • If you see error messages about objects or functions not found, it
    is likely because the selecting functions are now stricter in their
    arguments. An example of a selecting function is gather() and its
    ... argument. This change makes the code more robust by
    disallowing ambiguous scoping. Consider the following code:

    x <- 3
    df <- tibble(w = 1, x = 2, y = 3)
    gather(df, "variable", "value", 1:x)
    

    Does it select the first three columns (using the x defined in the
    global environment), or does it select the first two columns (using
    the column named x)?

    To solve this ambiguity, we now make a strict distinction between
    data and context expressions. A data expression is either a bare
    name or an expression like x:y or c(x, y). In a data expression,
    you can only refer to columns from the data frame. Everything else
    is a context expression in which you can only refer to objects that
    you have defined with <-.

    In practice this means that you can no longer refer to contextual
    objects like this:

    mtcars %>% gather(var, value, 1:ncol(mtcars))
    
    x <- 3
    mtcars %>% gather(var, value, 1:x)
    mtcars %>% gather(var, value, -(1:x))
    

    You now have to be explicit about where to find objects. To do so,
    you can use the quasiquotation operator !! which will evaluate its
    argument early and inline the result:

    mtcars %>% gather(var, value, !! 1:ncol(mtcars))
    mtcars %>% gather(var, value, !! 1:x)
    mtcars %>% gather(var, value, !! -(1:x))
    

    An alternative is to turn your data expression into a context
    expression by using seq() or seq_len() instead of :. See the
    section on tidyselect for more information about these semantics.

  • Following the switch to tidy evaluation, you might see warnings
    about the "variable context not set". This is most likely caused by
    supplying helpers like everything() to underscored versions of
    tidyr verbs. Helpers should always be evaluated lazily. To fix
    this, just quote the helper with a formula: drop_na(df, ~everything()).

  • The selecting functions are now stricter when you supply integer
    positions. If you see an error along the lines of

    `-0.949999999999999`, `-0.940000000000001`, ... must resolve to
    integer column positions, not a double vector
    

    please round the positions before supplying them to tidyr. Double
    vectors are fine as long as they are rounded.
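The two explicit ways of referring to contextual objects mentioned in the first bullet (a function call such as seq_len(), or unquoting with !!) can be sketched as follows. The variable n is an illustrative name, and the example uses the built-in mtcars data:

```r
library(tidyr)

n <- 3  # contextual object, not a column of mtcars

# Context expression: seq_len() is a function call, so it is evaluated
# outside the data and cannot be confused with a column.
all_cols <- gather(mtcars, var, value, seq_len(ncol(mtcars)))

# Unquoting with !! evaluates 1:n early and inlines the integer positions.
first_n <- gather(mtcars, var, value, !!(1:n))
```

Both forms make it unambiguous that the positions come from the calling environment rather than from columns of the data.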

Switch to tidy evaluation

tidyr is now a tidy evaluation grammar. See the
programming vignette
in dplyr for practical information about tidy evaluation.

The tidyr port is a bit special. While the philosophy of tidy
evaluation is that R code should refer to real objects (from the data
frame or from the context), we had to make some exceptions to this
rule for tidyr. The reason is that several functions accept bare
symbols to specify the names of new columns to create (gather()
being a prime example). This is not tidy because the symbols do not
represent any actual object. Our workaround is to capture these
arguments using rlang::quo_name() (so they still support
quasiquotation and you can unquote symbols or strings). This type of
NSE is now discouraged in the tidyverse: symbols in R code should
represent real objects.
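Because the new-column arguments support quasiquotation, column names held in strings can be unquoted into gather(). The following is only a sketch: df, key_name and value_name are hypothetical names chosen for the example.

```r
library(tidyr)

# Hypothetical data with two measurement columns.
df <- tibble(id = 1:2, x = c(10, 20), y = c(30, 40))

# Names of the columns to create, held as strings in the context.
key_name <- "measure"
value_name <- "reading"

# Unquote the strings so gather() uses them as the new column names.
long <- gather(df, key = !!key_name, value = !!value_name, x, y)
```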

Following the switch to tidy eval the underscored variants are softly
deprecated. However they will remain around for some time and without
warning for backward compatibility.

Switch to the tidyselect backend

The selecting backend of dplyr has been extracted in a standalone
package tidyselect which tidyr now uses for selecting variables. It is
used for selecting multiple variables (in drop_na()) as well as
single variables (the col argument of extract() and separate(),
and the key and value arguments of spread()). This implies the
following changes:

  • The arguments for selecting a single variable now support all
    features from dplyr::pull(). You can supply a name or a position,
    including negative positions.

  • Multiple variables are now selected a bit differently. We now make a
    strict distinction between data and context expressions. A data
    expression is either a bare name or an expression like x:y or
    c(x, y). In a data expression, you can only refer to columns from
    the data frame. Everything else is a context expression in which you
    can only refer to objects that you have defined with <-.

    You can still refer to contextual objects in a data expression by
    being explicit. One way of being explicit is to unquote a variable
    from the environment with the tidy eval operator !!:

    x <- 2
    drop_na(df, 2)     # Works fine
    drop_na(df, x)     # Object 'x' not found
    drop_na(df, !! x)  # Works as if you had supplied 2

    On the other hand, select helpers like starts_with() are context
    expressions. It is therefore easy to refer to objects and they will
    never be ambiguous with data columns:

    x <- "d"
    drop_na(df, starts_with(x))
    

    While these special rules are in contrast to most dplyr and tidyr
    verbs (where both the data and the context are in scope) they make
    sense for selecting functions and should provide more robust and
    helpful semantics.
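The drop_na() snippets above refer to a df that is not defined in these notes. A self-contained sketch, with a made-up tibble and the illustrative variables pos and prefix, might look like this:

```r
library(tidyr)

# Hypothetical data with missing values in different columns.
df <- tibble(a = c(1, NA), da = c(NA, 2), db = c(3, 4))

# Unquote a contextual object so it is used as a column position.
pos <- 1L
by_pos <- drop_na(df, !!pos)

# Select helpers are context expressions, so `prefix` is looked up
# in the environment and never confused with a column.
prefix <- "d"
by_helper <- drop_na(df, starts_with(prefix))
```

Here by_pos keeps the row where a is present, while by_helper keeps the row where both da and db are present.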

tidyr 0.6.3

19 May 14:26
  • Patch tests to be compatible with dev tibble