I ran a test case comparing the three options (global, local, and local static) over 20 million evaluations of a simple inner product of 4-d vectors. It was built with VS2010 as a 32-bit release build. Times are in milliseconds, as returned by GetTickCount; each row is one trial. Here's the result:
DPSUM:600000000 TIME:78| DPSUM:600000000 TIME:62| DPSUM:600000000 TIME:63|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:46| DPSUM:600000000 TIME:78|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:78|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:62|
DPSUM:600000000 TIME:62| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:63|
DPSUM:600000000 TIME:46| DPSUM:600000000 TIME:63| DPSUM:600000000 TIME:62|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:78|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:46| DPSUM:600000000 TIME:78|
DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:62|
DPSUM:600000000 TIME:63| DPSUM:600000000 TIME:47| DPSUM:600000000 TIME:62|
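The first column is static const, the second is local, and the third is global. It looks like static local and local are equally fast, at least for this compiler (maybe due to some internal optimization), while the global version is slower in most trials. Keep in mind that GetTickCount typically has a resolution of only 10 to 16 ms, so these timings are coarse. I'm posting the sample code in case you want to try it on your platform.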
Code below:
#include <stdio.h>
#include <windows.h>

// Global operands.
int ag[] = {1,2,3,4};
int bg[] = {1,2,3,4};

// Inner product with function-local static const operands.
int dp1(){
    static const int a[] = {1,2,3,4};
    static const int b[] = {1,2,3,4};
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
}

// Inner product with plain local operands (set up on every call).
int dp2(){
    int a[] = {1,2,3,4};
    int b[] = {1,2,3,4};
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
}

// Inner product with the global operands.
int dp3(){
    return ag[0]*bg[0] + ag[1]*bg[1] + ag[2]*bg[2] + ag[3]*bg[3];
}

int main(){
    int numtrials = 10;
    typedef int (*DP)();
    DP dps[] = {dp1, dp2, dp3};
    for (int t = 0; t < numtrials; ++t){
        int dpsum[] = {0,0,0};
        for (int jj = 0; jj < 3; ++jj){
            DWORD bef, aft;
            bef = GetTickCount();
            // 20 million calls; each returns 30, so the sum is 600000000.
            for (int ii = 0; ii < 20000000; ++ii){
                dpsum[jj] += dps[jj]();
            }
            aft = GetTickCount();
            // DWORD is unsigned, so print the elapsed time with %lu.
            printf("DPSUM:%d TIME:%lu| ", dpsum[jj], (unsigned long)(aft - bef));
        }
        printf("\n");
    }
    getchar();  // keep the console window open
    return 0;
}