I was benchmarking whether it is worth putting the loop inside a callback function, so I tested fourth-order Runge-Kutta on y'=y in C++, all built with gcc 5.1 on Ubuntu using the compilation command

g++ -std=c++11 -O3 -march=native -ffast-math test.cpp

The Runge-Kutta loop

double dt=(t_end-t_0)/N;
auto y=y_0;
auto t=t_0;
for(size_t k=0;k<N;++k)
    {
    auto k_1=f(k*dt, y);
    auto k_2=f(k*dt + 0.5*dt, y + 0.5*dt*k_1);
    auto k_3=f(k*dt + 0.5*dt, y + 0.5*dt*k_2);
    auto k_4=f(k*dt + dt, y + dt*k_3);

    y+=dt*(k_1 + 2*k_2 + 2*k_3 + k_4)/6.0;
    }

return y;

Inlining was achieved by a template and a function object. For dynamic binding a function pointer was used.
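
To make the two call styles concrete, here is a minimal sketch of how they could look. It is not the original test.cpp: the names RhsFunctor, rhs_function, rk4, solve_inlined, solve_pointer and RhsPointer are all made up for illustration, and the integration interval [0, 1] with y(0)=1 is assumed from the exp(1) reference value.

```cpp
#include <cassert>
#include <cstddef>

// Right-hand side of y' = y as a function object (hypothetical name).
// Its type is known at compile time, so the call can be inlined.
struct RhsFunctor
{
    double operator()(double /*t*/, double y) const { return y; }
};

// The same right-hand side as a plain function, to be called through a pointer.
double rhs_function(double /*t*/, double y) { return y; }

// Classic fourth-order Runge-Kutta with N fixed steps from t_0 to t_end.
// F is any callable taking (t, y) and returning double.
template<class F>
double rk4(F f, double t_0, double t_end, double y_0, std::size_t N)
{
    assert(N > 0);
    const double dt = (t_end - t_0) / N;
    double y = y_0;
    for (std::size_t k = 0; k < N; ++k)
    {
        const double t = t_0 + k * dt;
        const double k_1 = f(t, y);
        const double k_2 = f(t + 0.5 * dt, y + 0.5 * dt * k_1);
        const double k_3 = f(t + 0.5 * dt, y + 0.5 * dt * k_2);
        const double k_4 = f(t + dt, y + dt * k_3);
        y += dt * (k_1 + 2 * k_2 + 2 * k_3 + k_4) / 6.0;
    }
    return y;
}

// Statically bound variant: F is deduced as RhsFunctor.
double solve_inlined(std::size_t N)
{
    return rk4(RhsFunctor{}, 0.0, 1.0, 1.0, N);
}

// Dynamically bound variant: F is deduced as a function pointer type.
using RhsPointer = double (*)(double, double);
double solve_pointer(RhsPointer f, std::size_t N)
{
    return rk4(f, 0.0, 1.0, 1.0, N);
}
```

Calling solve_inlined(N) and solve_pointer(&rhs_function, N) should give the same value up to rounding. For a timing comparison the pointer probably has to be opaque to the optimizer (for example, obtained from another translation unit), because with -O3 the compiler may otherwise see through the pointer and inline the call anyway.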

The Pentium M

Specs as given by /proc/cpuinfo

cpu family      : 6
model           : 13
model name      : Intel(R) Pentium(R) M processor 1.73GHz
stepping        : 8
microcode       : 0x20

Frequency from sudo cpufreq-info

current policy: frequency should be within 800 MHz and 1.73 GHz.
                The governor "userspace" may decide which speed to use
                within this range.
current CPU frequency is 1.73 GHz (asserted by call to hardware).

Results

                  ODE solution       exp(1)             diff                   Execution time
Function pointer  2.718281828037378  2.718281828459045  -4.21667145644733e-10  53321972
Inlined call      2.718281828037378  2.718281828459045  -4.21667145644733e-10  19916460

The Prescott

Specs as given by /proc/cpuinfo

cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.40GHz
stepping        : 3
microcode       : 0x5

Frequency from sudo cpufreq-info

current policy: frequency should be within 2.80 GHz and 3.40 GHz.
                The governor "userspace" may decide which speed to use
                within this range.
current CPU frequency is 3.40 GHz.

Results

                  ODE solution       exp(1)             diff                   Execution time
Function pointer  2.718281828037378  2.718281828459045  -4.21667145644733e-10  70811683
Inlined call      2.718281828037378  2.718281828459045  -4.21667145644733e-10  19928642

Comparison and question

So the Prescott performs no better than the much lower-clocked Pentium M: the inlined version takes about the same time on both, and the function-pointer version is about 1.3 times slower on the Prescott. Sure, the Prescott has a very long pipeline, but my code is highly predictable, since the loop runs N=2^30 times. So what makes the Prescott that slow despite its high clock frequency?

bwDraco
user877329