-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement vec_map()
#1227
Implement vec_map()
#1227
Conversation
And avoid infinite vctrs recursion when manipulating proxies
Should use `vec_simplify()` instead
modify <- function(.x, .fn, ...) {
vec_map(.x, .fn, ..., .ptype = .x)
}
modify(1:3, plus_one)
#> [1] 2 3 4
modify(c(FALSE, FALSE), plus_one)
#> [1] TRUE TRUE
modify(c(FALSE, TRUE), plus_one)
#> Error: Can't convert from <integer> to <logical> due to loss of precision.
#> * Locations: 1 |
Like the (The more problematic coercion is towards the narrower type but since we require |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know when it has docs to review, and I'll think more about the interface.
@@ -50,7 +50,7 @@ new_partial_factor <- function(partial = factor(), learned = factor()) { | |||
vec_ptype_full.vctrs_partial_factor <- function(x, ...) { | |||
empty <- "" | |||
|
|||
levels <- map(x, levels) | |||
levels <- map(unclass(x), levels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid an infinite recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why did it work before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because map()
didn't use vctrs operations and genericity.
The purrr compat file has been updated to use the vec_map() to get some internal testing and to make it easy to import unit tests from purrr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops sorry, the infinite recursion was another problem. It doesn't make sense for ptype-full to be recursed into anyway.
The problem is that map()
now assigns names, and partial factors and data frames don't support that:
test-partial-factor.R:5: error: has ok print method
`names<-.vctrs_partial_factor()` not supported.
This is a flaw in the partial types but as usual I'm just working around these types when they cause problems for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer assign NULL
names to be a little more efficient. However this unclass()
change is still needed because partial types inherit from vctrs_sclr
which have an unsupported names<-
method but still allow names()
, so vec_map()
sees the internal field names. Making the latter unsupported causes a bunch of other issues. It wouldn't solve the problem at hand anyway because now we'd have a vector type for which names()
is an error, which is a big genericity flaw. I think we shouldn't worry about these types too much for now.
Do you like the genericity model with lists implemented here? This version of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall interface seems reasonable to me.
|
||
SEXP elt_out = PROTECT(r_eval_force(vec_map_call, env)); | ||
if (vec_size(elt_out) != 1) { | ||
r_abort("Mapped function must return a size 1 vector."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be useful to includde index in this error message.
|
||
expect_identical( | ||
vec_map(vctr, identity), | ||
vec_chop(vctr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to define the expected output directly rather than in terms of another function, because I don't have enough of vec_chop()
loaded in my head to have any idea what this does.
|
||
return("Used to work in purrr") | ||
|
||
out2 <- map(NULL, identity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this return now?
Generally this looks good, but I think that I have a fairly strong aversion to the list part of this interface. From slider, I learned that there is a meaningful difference between
library(slider)
# Returns a list, assigns each element into the list with SET_VECTOR_ELT()
slide(1:2, ~1)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1
# Each `.f` result must be castable to a list of size 1,
# Assigned into output generically with `vec_assign()`
# (so it would extend nicely to list subclasses)
slide_vec(1:2, ~list(1), .ptype = list())
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1
slide_vec(1:2, ~list(1, 2), .ptype = list())
#> Error: In iteration 1, the result of `.f` had size 2, not 1.
slide_vec(1:2, ~1, .ptype = list())
#> Error: Can't convert <double> to <list>. I think that the current For purrr, this isn't a huge deal, because I would advocate for: # This is the default
# `.ptype = NULL` is identical to `map()`,
# this is NOT guessing the ptype
vec_map(.x, .fn, ..., .ptype = NULL)
# This is like `slide_vec(.x, .fn, ..., .ptype = list())`
# and follows the exact same code path as the current `atomic_map()`
vec_map(.x, .fn, ..., .ptype = list())
# Genericity can be achieved with the following also going through `atomic_map()`
vec_map(.x, .fn, ..., .ptype = lst_rcrd()) Two additional notes:
library(vctrs)
library(rlang)
#> Warning: package 'rlang' was built under R version 4.0.2
local_methods <- function(..., .frame = caller_env()) {
local_bindings(..., .env = global_env(), .frame = .frame)
}
local_methods_name_rcrd <- function(frame = caller_env()) {
local_methods(
.frame = frame,
vec_proxy.vctrs_name_rcrd = function(x, ...) data_frame(data = unclass(x), last_names = attr(x, "last_names")),
vec_restore.vctrs_name_rcrd = function(x, to, ...) new_name_rcrd(x$data, x$last_names),
vec_ptype2.vctrs_name_rcrd.vctrs_name_rcrd = function(x, y, ...) x,
vec_cast.vctrs_name_rcrd.vctrs_name_rcrd = function(x, to, ...) x,
vec_cast.list.vctrs_name_rcrd = function(x, to, ...) vec_data(x)
)
}
local_methods_name_rcrd()
#> Setting deferred event(s) on global environment.
#> * Execute (and clear) with `withr::deferred_run()`.
#> * Clear (without executing) with `withr::deferred_clear()`.
new_name_rcrd <- function(x, last_names) {
structure(x, last_names = last_names, class = c("vctrs_name_rcrd", "list"))
}
# Example:
new_name_rcrd(list(1, 2:5), c("Foo", "Bar"))
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2 3 4 5
#>
#> attr(,"last_names")
#> [1] "Foo" "Bar"
#> attr(,"class")
#> [1] "vctrs_name_rcrd" "list"
x <- c("Vaughan", "Henry")
ptype <- new_name_rcrd(list(), character())
# Currently assigns into bare list, then casts to `ptype`,
# but you can't cast a list to vctrs_name_rcrd
vctrs:::vec_map(x, ~1, .ptype = ptype)
#> Error: Can't convert <list> to <vctrs_name_rcrd>.
# Alternatively, could cast each element to vctrs_name_rcrd and
# assign generically with `vec_assign()`.
# NOTE: You can't do this at all with the current impl.
# The following would give the same result:
# vctrs:::vec_map(x, ~new_name_rcrd(1, .x), .ptype = ptype)
new_name_rcrd(list(1, 1), x)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1
#>
#> attr(,"last_names")
#> [1] "Vaughan" "Henry"
#> attr(,"class")
#> [1] "vctrs_name_rcrd" "list" |
Sorry to have convinced you to use one approach in slider and then implement another approach for purrr... I now think a unique operation based on
I think it's clearer to just unbox the scalar rather than specify such a ptype. It doesn't seem like there's any practical advantage to an implementation based on |
I am curious whether this effort has been abandoned or still on the roadmap? I would be happy if I could retire |
Now implemented in purrr. |
Branched from #1226.
This implements
vec_map()
. Depending on the supplied prototype, it covers a range of functionality provided in purrr and more:.ptype = list()
implementsmap()
.ptype = integer()
or other base atomic types implementsmap_int()
etc.ptype
can also be set to any vctrs typeIt isn't possible to infer the prototype from the list results because this would have no advantage over piping into
vec_simplify()
at the end.The implementation follows two code paths depending on whether
.ptype
is a list or an atomic vector.When mapping to a list, we update the input in place in the mapping loop and coerce the complete list of outputs to the target ptype at the end. This way the coercion is vectorised.
When mapping to an atomic vector we initialise an output vector and then coerce each input before assignment.
This vctrs implementation is competitive with purrr and base:
The purrr compat file has been updated to use the
vec_map()
to get some internal testing and to make it easy to import unit tests from purrr.