Optimize `every()` and related functions #1169

ErdaradunGaztea · 2025-02-06T15:41:51Z

This PR reimplements every(), some(), and none() in C. Since I have no prior experience writing C, I modelled the code on the existing C implementation of map(). The interface does not change, and the behaviour should remain the same.

The key change is that I use as_mapper() instead of as_predicate(). The difference between the two is that as_predicate() also performs a bool check on the predicate's output (that it returned a single logical value). That check was computationally expensive, and even replacing it with a .Call() didn't help, because the code kept switching between the C and R contexts. My final solution was to perform the check in C, inside the same loop that evaluates the predicate.
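To illustrate the idea of checking the predicate's output inside the loop, here is a minimal sketch in plain R. It is not the C code from this PR: `check_bool()` and `every_sketch()` are hypothetical names, and the NA handling is an assumption made for the example.

```r
# Hypothetical helper: validate one predicate result where it is produced,
# instead of wrapping the predicate in a checking closure.
check_bool <- function(out, i) {
  if (!is.logical(out) || length(out) != 1L) {
    stop(sprintf("Predicate result at index %d must be a single logical value.", i))
  }
  out
}

# Hypothetical stand-in for every(): one loop does both the validation
# and the predicate evaluation, and stops at the first FALSE.
every_sketch <- function(.x, .p) {
  val <- TRUE
  for (i in seq_along(.x)) {
    out <- check_bool(.p(.x[[i]]), i)
    val <- val && out
    if (isFALSE(val)) {
      return(FALSE)
    }
  }
  val
}

every_sketch(list(1, 2, 3), is.numeric)
every_sketch(list(1, "a", 3), is.numeric)
```

In the PR the same pattern lives in C, so the check no longer costs an extra R-level function call per element.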

I also had to give none() its own C implementation rather than going through negate(), which added a huge overhead. Even so, all three functions now share almost all of their C code.
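The difference between the two approaches can be sketched in plain R. Both `negate_sketch()` and `none_sketch()` are hypothetical names used only for illustration, and the sketch treats NA predicate results as not TRUE, which is an assumption.

```r
# Wrapper approach: every element pays for an extra closure call.
negate_sketch <- function(.p) {
  function(...) !.p(...)
}

# Direct approach: invert the test inside the loop, no wrapper needed.
none_sketch <- function(.x, .p) {
  for (i in seq_along(.x)) {
    if (isTRUE(.p(.x[[i]]))) {
      return(FALSE)
    }
  }
  TRUE
}

x <- list(1, "a", NULL)

# Both formulations agree on the answer; only the call overhead differs.
via_wrapper <- all(vapply(x, negate_sketch(is.null), logical(1)))
direct <- none_sketch(x, is.null)
identical(via_wrapper, direct)
```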

Finally, the performance. every(), some(), and none() should now be on par with their all(map_lgl()), any(map_lgl()), and !any(map_lgl()) equivalents, and they remain much faster whenever they can return early.
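The effect of the early return can be seen by counting predicate calls. This is a base R sketch with hypothetical names (`some_sketch()`, `counting_is_integer()`), not the implementation from this PR.

```r
x <- as.list(1:10000)

# Counter kept in an environment so the predicate can update it.
counter <- new.env()
counter$n <- 0L

counting_is_integer <- function(el) {
  counter$n <- counter$n + 1L
  is.integer(el)
}

# Hypothetical stand-in for some(): stops at the first TRUE.
some_sketch <- function(.x, .p) {
  for (i in seq_along(.x)) {
    if (isTRUE(.p(.x[[i]]))) {
      return(TRUE)
    }
  }
  FALSE
}

# Mapping first evaluates the predicate on every element.
counter$n <- 0L
res_full <- any(vapply(x, counting_is_integer, logical(1)))
calls_full <- counter$n   # 10000

# The early-return loop stops after the first element.
counter$n <- 0L
res_early <- some_sketch(x, counting_is_integer)
calls_early <- counter$n  # 1
```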

library(purrr)

x <- as.list(1:10000)

fn <- function(x) {
  vctrs::vec_is(x) || is.null(x)
}

# Three basic benchmarks
bench::mark(
  all(map_lgl(x, fn)),
  every(x, fn),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 all(map_lgl(x, fn))   15.8ms   16.3ms      59.0    4.15MB     44.3
#> 2 every(x, fn)          15.4ms   16.2ms      59.5   12.16KB     46.5

bench::mark(
  any(map_lgl(x, vctrs::vec_is_list)),
  some(x, vctrs::vec_is_list),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 any(map_lgl(x, vctrs::vec_is_list)) 6.94ms  7.2ms      135.    43.7KB     29.6
#> 2 some(x, vctrs::vec_is_list)         6.79ms 7.09ms      138.     9.7KB     27.5

bench::mark(
  !any(map_lgl(x, is.null)),
  none(x, is.null),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression                     min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 !any(map_lgl(x, is.null))   4.78ms   4.99ms      197.   228.5KB     33.2
#> 2 none(x, is.null)             4.8ms   4.96ms      194.     9.7KB     33.2

# `negate()` has a lot of overhead
bench::mark(
  all(map_lgl(x, negate(is.null))),
  every(x, negate(is.null)),
  min_time = 1
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 all(map_lgl(x, negate(is.null)))    112ms   117ms      8.55    3.26MB     32.3
#> 2 every(x, negate(is.null))           114ms   124ms      7.84    2.67MB     24.5

# An early stop example
bench::mark(
  any(map_lgl(x, is.integer)),
  some(x, is.integer),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression                       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 any(map_lgl(x, is.integer))   4.87ms   5.26ms      176.      42KB     31.1
#> 2 some(x, is.integer)           53.3µs   61.4µs    14756.        0B     16.2

Created on 2025-02-06 with reprex v2.1.1

ErdaradunGaztea and others added 5 commits February 6, 2025 13:18

Initial implementation of quick every()

63005a8

Extract C checks to a separate .C file

7686630

Implement some() in C

038fac8

Implement none() in C

e89a9a2

Turns out that negate() adds too much overhead with C implementation of every()

Revert .gitignore modifications

50d29c9
