I'm writing a function that runs a binary operation at its innermost level. Writing this function in Rcpp gave me a huge performance improvement for a fixed binary operation (+, for example), but when I try to generalize it to user-supplied functions, I lose most of that gain. Here's an example:
First, I'll define the same simple binary operation in R and Rcpp:
library(Rcpp)
library(rbenchmark)
cppFunction(
  code = "double simple_cpp(double a, double b) {
    double out = cos(a) + sin(b);
    return(out);
  }")
simple_r <- function(a,b) cos(a) + sin(b)
Second, a function that takes a function as an argument and applies it element-wise, so the operation is performed many times:
cppFunction(
  code = "NumericVector many_simple_cpp(NumericVector a, NumericVector b, Function simple) {
    NumericVector out(a.size());
    for(int i = 0; i<a.size(); i++){
      out[i] = Rcpp::as<double>(simple(a[i],b[i]));
    }
    return(out);
  }")
And third, a function that uses a predefined C++ function as the operator:
cppFunction(
  code = "
  double operation(double a, double b){
    return(cos(a) + sin(b));
  }
  NumericVector full_native_cpp(NumericVector a, NumericVector b) {
    NumericVector out(a.size());
    for(int i = 0; i<a.size(); i++){
      out[i] = operation(a[i],b[i]);
    }
    return(out);
  }")
Some performance metrics:
test <- 1:10000
benchmark(rfun = many_simple_cpp(test,test,simple_r),
          cppfun = many_simple_cpp(test,test,simple_cpp),
          cppnative = full_native_cpp(test,test))
       test replications elapsed relative user.self sys.self
2    cppfun          100   15.95    159.5     15.93     0.02
3 cppnative          100    0.10      1.0      0.09     0.00
1      rfun          100   14.71    147.1     14.67     0.00 
My question is: the third example makes it clear that the operation can live in its own function without losing much performance, but neither of my two attempts at passing the operation in as an argument comes close to that speed. I think the performance loss comes from calling Rcpp::as<double>(simple(a[i],b[i])) once per element. Is there a way to let the user create the C++ function to pass as an argument (or to convert an R function to C++)?
Also, why did passing simple_cpp and simple_r give similar results? I expected the calls with simple_cpp to perform better, because it was defined with Rcpp.
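To make the first question concrete, here is a minimal sketch in plain C++ (no Rcpp types) of the kind of dispatch I have in mind: the operator is passed as a function pointer, so the whole loop stays in compiled code. The names BinaryOp, cos_plus_sin, add and many_simple are made up for illustration, and the sketch assumes both inputs have the same length.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical alias: any function taking two doubles and returning a double.
using BinaryOp = double (*)(double, double);

// The operator from the examples above.
double cos_plus_sin(double a, double b) {
    return std::cos(a) + std::sin(b);
}

// A second operator, to show the loop is not tied to one operation.
double add(double a, double b) {
    return a + b;
}

// Applies `op` element-wise. Assumes a.size() == b.size().
// Each call to `op` is an ordinary compiled call through a pointer.
std::vector<double> many_simple(const std::vector<double>& a,
                                const std::vector<double>& b,
                                BinaryOp op) {
    std::vector<double> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = op(a[i], b[i]);
    }
    return out;
}
```

Calling many_simple(a, b, add) or many_simple(a, b, cos_plus_sin) swaps the operation without touching the loop. What I don't know is how to hand such a pointer from R to the Rcpp function in place of the Function argument.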
