I was benchmarking whether it is worth putting the loop inside a callback function, so I tested a fourth-order Runge-Kutta integration of y'=y in C++. Everything was built with gcc 5.1 on Ubuntu, using the compilation command
g++ -std=c++11 -O3 -march=native -ffast-math test.cpp
The Runge-Kutta loop
double dt=(t_end-t_0)/N;
auto y=y_0;
auto t=t_0;
for(size_t k=0;k<N;++k)
{
    // Four slope evaluations of the classical RK4 step
    auto k_1=f(k*dt, y);
    auto k_2=f(k*dt + 0.5*dt, y + 0.5*dt*k_1);
    auto k_3=f(k*dt + 0.5*dt, y + 0.5*dt*k_2);
    auto k_4=f(k*dt + dt, y + dt*k_3);
    // Weighted average of the slopes with weights 1, 2, 2, 1
    y+=dt*(k_1 + 2*k_2 + 2*k_3 + k_4)/6.0;
}
return y;
Inlining was achieved by a template and a function object. For dynamic binding a function pointer was used.
The Pentium M
Specs as given by /proc/cpuinfo
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 1.73GHz
stepping : 8
microcode : 0x20
Frequency from sudo cpufreq-info
current policy: frequency should be within 800 MHz and 1.73 GHz.
The governor "userspace" may decide which speed to use
within this range.
current CPU frequency is 1.73 GHz (asserted by call to hardware).
Results
Method            ODE solution       exp(1)             diff                   Execution time
Function pointer  2.718281828037378  2.718281828459045  -4.21667145644733e-10  53321972
Inlined call      2.718281828037378  2.718281828459045  -4.21667145644733e-10  19916460
The Prescott
Specs as given by /proc/cpuinfo
cpu family : 15
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.40GHz
stepping : 3
microcode : 0x5
Frequency from sudo cpufreq-info
current policy: frequency should be within 2.80 GHz and 3.40 GHz.
The governor "userspace" may decide which speed to use
within this range.
current CPU frequency is 3.40 GHz.
Results
Method            ODE solution       exp(1)             diff                   Execution time
Function pointer  2.718281828037378  2.718281828459045  -4.21667145644733e-10  70811683
Inlined call      2.718281828037378  2.718281828459045  -4.21667145644733e-10  19928642
Comparison and question
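So the Prescott performs no better than the much lower-clocked Pentium M. With the inlined call the two are essentially tied (19928642 vs. 19916460), and with the function pointer the Prescott is about 33% slower (70811683 vs. 53321972). Sure, the Prescott had a very long pipeline, but my code should be highly predictable, since the loop runs N=2^30 times. So what makes the Prescott that slow despite its high CPU frequency?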