Consider this example:
    #include <utility>

    // runtime dominated by argument passing
    template <class T>
    void foo(T t) {}

    int main() {
        int i(0);
        foo<int>(i);              // fast -- int is scalar type
        foo<int&>(i);             // slow -- lvalue reference overhead
        foo<int&&>(std::move(i)); // ???
    }
Is foo<int&&>(std::move(i)) as fast as foo<int>(i), or does it involve pointer overhead like foo<int&>(i)?
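To make the question concrete, here is a minimal sketch of what can be checked without looking at assembly: with T = int&& the parameter is still a reference to the caller's object, not a copy. The names bump, after_bump_by_value and after_bump_by_rvalue_ref are hypothetical and are not part of the example above. This only shows the semantics; whether the reference costs anything at run time depends on the compiler and the optimization level.

```cpp
#include <type_traits>
#include <utility>

// int&& is a reference type, not a scalar type.
static_assert(std::is_rvalue_reference<int&&>::value, "int&& is an rvalue reference");
static_assert(!std::is_scalar<int&&>::value, "references are not scalar types");

// Hypothetical helper: modifies its parameter so aliasing becomes observable.
template <class T>
void bump(T t) { t += 1; }

// T = int: bump works on a copy, so the caller's variable is unchanged.
int after_bump_by_value() {
    int i = 0;
    bump<int>(i);
    return i;
}

// T = int&&: the parameter refers to the caller's variable, so it changes.
int after_bump_by_rvalue_ref() {
    int i = 0;
    bump<int&&>(std::move(i));
    return i;
}
```

Note that bump<int&&>(i) without std::move does not compile, because an rvalue reference cannot bind to an lvalue.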
EDIT: As suggested, running g++ -S gave me the same 51-line assembly file for foo<int>(i) and foo<int&>(i), but foo<int&&>(std::move(i)) resulted in 71 lines of assembly code (it looks like the difference came from std::move).
EDIT: Thanks to those who recommended g++ -S with different optimization levels -- using -O3 (and making foo noinline) I was able to get output which looks like xaxxon's solution.
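For anyone who wants to repeat the comparison, this is roughly the kind of file that can be fed to g++ -O3 -S. It is a sketch, not the exact code I used: the wrapper names call_by_value, call_by_lvalue_ref and call_by_rvalue_ref are hypothetical, foo returns a value here so the call is not trivially dead, and __attribute__((noinline)) is a GCC/Clang extension.

```cpp
#include <utility>

// noinline keeps each instantiation as a separate function in the
// assembly output instead of being folded into its caller.
template <class T>
__attribute__((noinline)) int foo(T t) { return t + 1; }

// One wrapper per instantiation, so each call site is easy to find
// in the generated assembly.
int call_by_value(int i)      { return foo<int>(i); }
int call_by_lvalue_ref(int i) { return foo<int&>(i); }
int call_by_rvalue_ref(int i) { return foo<int&&>(std::move(i)); }
```

Since the file has no main, it can be compiled with -S (or -c) on its own, and the three wrappers and the three foo instantiations can then be compared side by side. The exact output may differ between compiler versions.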