I was doing some micro-optimization today for a related problem: checking whether a numeric vector is empty (i.e. equivalent to numeric(0)) when it can either be empty or hold a value (it is never NA or NULL). In my problem the check runs hundreds of millions of times, so it seemed reasonable to benchmark the candidate approaches. length() benchmarks quite a bit better than the other options:
vec = numeric(0)
bench::mark(
  x = { !length(vec) },                   # TRUE when vec has no elements
  y = { rlang::is_empty(vec) },
  z = { identical(vec, numeric(0)) },
  check = FALSE,                          # don't require the expressions to return equal results
  min_time = 5,
  min_iterations = 100000,
  max_iterations = 100000
)
# A tibble: 3 x 6
  expression      min   median `itr/sec` `gc/sec`  n_gc
  <bch:expr> <bch:tm> <bch:tm>     <dbl>    <dbl> <dbl>
1 x             200ns    300ns  3621037.     0        0
2 y             5.2us    5.8us   166594.     8.33     5
3 z             1.3us    1.5us   618090.    12.4      2
Checking length beats identical() by roughly 6x, and identical() in turn beats rlang::is_empty() by roughly 4x, so length comes out about 20x faster than is_empty. The results for the case where the vector is non-empty are similar, so irrespective of the distribution of your data, just use length.
I am cognizant that there are probably edge cases where the behaviours of these three functions aren't identical, but if, like me, you only need to tell whether a value is c(some, number) or numeric(0) and want to do it quickly, use length.
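To make the edge cases a bit more concrete, here is a minimal base-R sketch of where a length check and an identical() check disagree. The helper is_empty_len and the example vectors are names I made up for illustration; rlang::is_empty() is left out so the snippet needs nothing beyond base R.

```r
# Hypothetical helper (not from the benchmark above): "empty" means length zero.
is_empty_len <- function(x) length(x) == 0L

empty_dbl  <- numeric(0)
empty_int  <- integer(0)
empty_attr <- structure(numeric(0), units = "m")  # zero-length, but carries an attribute

# length() only looks at the number of elements.
is_empty_len(empty_dbl)    # TRUE
is_empty_len(empty_int)    # TRUE
is_empty_len(empty_attr)   # TRUE
is_empty_len(NULL)         # TRUE
is_empty_len(NA_real_)     # FALSE: a single NA is still one element

# identical() also compares type and attributes, so it is stricter.
identical(empty_dbl,  numeric(0))  # TRUE
identical(empty_int,  numeric(0))  # FALSE: integer vs double
identical(empty_attr, numeric(0))  # FALSE: extra attribute
identical(NULL,       numeric(0))  # FALSE: NULL is not a numeric vector
```

So if an empty integer vector, NULL, or an empty vector with attributes can reach your check, the length test will call it empty while identical(vec, numeric(0)) will not. For the plain numeric(0)-or-value case described above, the two agree.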