Optimize every() and related functions #1169

Open
wants to merge 5 commits into
base: main

ErdaradunGaztea

Fixes #1036

This PR reimplements every(), some(), and none() in C. I followed the C implementation of map() (since I have no experience writing in C). The interface does not change at all, and the behaviour should also remain intact.

The key change is that I used as_mapper() instead of as_predicate(). The difference between the two is that as_predicate() also performs a Bool check on its output. This check was computationally expensive, and even replacing it with .Call() didn't help, since the code was switching between C and R contexts a lot. My final solution was to perform the check in C, in the same code that runs the predicate-checking loop.

I had to replace the implementation of none() with a separate C solution, since negate() had a huge overhead. However, all three functions now share almost all of their C implementation.
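To illustrate the approach described above, here is a rough sketch in plain C. It is not the code in this PR: the real implementation works on R objects through R's C API, which is not shown here. The `value` struct, `check_elements()`, the `mode` enum, and the two sample predicates are hypothetical stand-ins, and NA handling is left out entirely. The sketch only shows how one loop can validate the predicate output, serve all three functions, and return early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for whatever the predicate returns. */
typedef enum { VAL_LOGICAL, VAL_OTHER } val_type;

typedef struct {
  val_type type;  /* kind of value the predicate produced */
  size_t length;  /* number of elements in that value */
  bool value;     /* the first element, meaningful only for VAL_LOGICAL */
} value;

typedef value (*predicate_fn)(int element);

typedef enum { MODE_EVERY, MODE_SOME, MODE_NONE } mode;

/* Counts predicate calls so the early return can be observed. */
static int calls = 0;

/* Sample predicate: returns a single logical value. */
static value is_positive(int element) {
  calls++;
  value out = { VAL_LOGICAL, 1, element > 0 };
  return out;
}

/* Sample predicate with invalid output: not a single logical value. */
static value not_a_bool(int element) {
  (void) element;
  calls++;
  value out = { VAL_OTHER, 1, false };
  return out;
}

/*
 * Shared loop for all three modes.
 * Returns 1 (TRUE), 0 (FALSE), or -1 if the predicate output
 * is not a single logical value.
 */
static int check_elements(const int *xs, size_t n, predicate_fn pred, mode m) {
  /* every() stops at the first FALSE; some() and none() stop at the first TRUE. */
  bool stop_on = (m != MODE_EVERY);

  for (size_t i = 0; i < n; ++i) {
    value out = pred(xs[i]);

    /* The "Bool check", done inside the loop instead of in a wrapper. */
    if (out.type != VAL_LOGICAL || out.length != 1) {
      return -1;
    }

    if (out.value == stop_on) {
      /* Early return: the answer is already known. */
      return m == MODE_SOME ? 1 : 0;
    }
  }

  /* Loop finished without hitting the stopping value. */
  return m == MODE_SOME ? 0 : 1;
}
```

In this sketch, none() needs no negated predicate: it stops on the same value as some() and only flips the result, which is why the three modes can share the loop.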

Finally, the performance. every(), some(), and none() should now be on par with their map_lgl() equivalents (and every() and friends still have their early return, so they are much faster whenever it triggers).

library(purrr)

x <- as.list(1:10000)

fn <- function(x) {
  vctrs::vec_is(x) || is.null(x)
}

# Three basic benchmarks
bench::mark(
  all(map_lgl(x, fn)),
  every(x, fn),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 all(map_lgl(x, fn))   15.8ms   16.3ms      59.0    4.15MB     44.3
#> 2 every(x, fn)          15.4ms   16.2ms      59.5   12.16KB     46.5

bench::mark(
  any(map_lgl(x, vctrs::vec_is_list)),
  some(x, vctrs::vec_is_list),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 any(map_lgl(x, vctrs::vec_is_list)) 6.94ms  7.2ms      135.    43.7KB     29.6
#> 2 some(x, vctrs::vec_is_list)         6.79ms 7.09ms      138.     9.7KB     27.5

bench::mark(
  !any(map_lgl(x, is.null)),
  none(x, is.null),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression                     min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 !any(map_lgl(x, is.null))   4.78ms   4.99ms      197.   228.5KB     33.2
#> 2 none(x, is.null)             4.8ms   4.96ms      194.     9.7KB     33.2

# `negate()` has a lot of overhead
bench::mark(
  all(map_lgl(x, negate(is.null))),
  every(x, negate(is.null)),
  min_time = 1
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 all(map_lgl(x, negate(is.null)))    112ms   117ms      8.55    3.26MB     32.3
#> 2 every(x, negate(is.null))           114ms   124ms      7.84    2.67MB     24.5

# An early stop example
bench::mark(
  any(map_lgl(x, is.integer)),
  some(x, is.integer),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression                       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 any(map_lgl(x, is.integer))   4.87ms   5.26ms      176.      42KB     31.1
#> 2 some(x, is.integer)           53.3µs   61.4µs    14756.        0B     16.2

Created on 2025-02-06 with reprex v2.1.1

Successfully merging this pull request may close these issues.

Performance of every(), some(), and none()