Consider a slightly modified version of Fibonacci, used to compare the performance of lambdas versus plain functions, each with and without captures:
size_t fibFn(size_t n) {
  if (n <= 1) { return n; }
  return fibFn(n - 1) + fibFn(n - 2)+1+2;
//                                  ^~~~
//   This is modified so that I can | 
//        capture something outside |
//                    this function |
}
When I ran this on Quick Bench with Clang 10.0, I got a reasonable result:
fnNoCapture < lambdaNoCapture < fn << lambda
I was about to conclude that a lambda with a capture list is extremely slow. However, the result is almost completely inverted when I run the same benchmark with GCC 10.1:
lambdaNoCapture > fnNoCapture >> fn > lambda
How is this possible? Is it because the two compilers implement lambdas differently?
EDIT: Even so, it makes no sense to me that a lambda with captures can be so much faster than one without. As I see it, the best a compiler can do is convert a lambda with captures into one without (e.g. by inlining the captured variables), where possible.
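To make the four variants concrete, here is a minimal sketch of how they might be written. It is not the exact benchmark code: the names `g_a`, `g_b`, `fibFnGlobals`, `fibLambdaNoCapture`, `fibLambdaCapture` and `variantsAgree` are hypothetical, and the recursion through a `self` parameter is an assumption (a recursive lambda could equally be written with `std::function`, which has a different cost).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical globals standing in for "something outside this function".
std::size_t g_a = 1;
std::size_t g_b = 2;

// fnNoCapture: plain function with the constants written inline.
std::size_t fibFnNoCapture(std::size_t n) {
  if (n <= 1) { return n; }
  return fibFnNoCapture(n - 1) + fibFnNoCapture(n - 2) + 1 + 2;
}

// fn: plain function that reads the two values from outside its body.
std::size_t fibFnGlobals(std::size_t n) {
  if (n <= 1) { return n; }
  return fibFnGlobals(n - 1) + fibFnGlobals(n - 2) + g_a + g_b;
}

// lambdaNoCapture: a lambda cannot name itself, so it receives itself
// as the first argument (assumed technique, C++14 generic lambda).
std::size_t fibLambdaNoCapture(std::size_t n) {
  auto fib = [](auto&& self, std::size_t k) -> std::size_t {
    if (k <= 1) { return k; }
    return self(self, k - 1) + self(self, k - 2) + 1 + 2;
  };
  return fib(fib, n);
}

// lambda: same shape, but the two values are captured by copy.
std::size_t fibLambdaCapture(std::size_t n) {
  std::size_t a = 1, b = 2;
  auto fib = [a, b](auto&& self, std::size_t k) -> std::size_t {
    if (k <= 1) { return k; }
    return self(self, k - 1) + self(self, k - 2) + a + b;
  };
  return fib(fib, n);
}

// Sanity check: all four variants must compute the same value.
bool variantsAgree(std::size_t n) {
  const std::size_t expected = fibFnNoCapture(n);
  assert(fibFnGlobals(n) == expected);
  assert(fibLambdaNoCapture(n) == expected);
  assert(fibLambdaCapture(n) == expected);
  return true;
}
```

Since all four compute the same thing, any timing difference has to come from how each compiler inlines and optimizes the recursion, which is exactly what I can't explain.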
