If I write
int main(int argc, char *argv[])
{
    int temp[50][3];
    return &temp[argc] - &temp[0];
}
and compile it with Visual C++, I get back:
009360D0 55                   push        ebp  
009360D1 8B EC                mov         ebp,esp  
009360D3 8B 45 08             mov         eax,dword ptr [argc]  
009360D6 8D 0C 40             lea         ecx,[eax+eax*2]  
009360D9 B8 AB AA AA 2A       mov         eax,2AAAAAABh  
009360DE C1 E1 02             shl         ecx,2  
009360E1 F7 E9                imul        ecx  
009360E3 D1 FA                sar         edx,1  
009360E5 8B C2                mov         eax,edx  
009360E7 C1 E8 1F             shr         eax,1Fh  
009360EA 03 C2                add         eax,edx  
009360EC 5D                   pop         ebp  
009360ED C3                   ret  
Why am I getting an imul instruction here instead of just bit shifts, etc.?  I find this pretty annoying, because I'm doing pointer arithmetic like this in a tight loop, and I'm suspecting imul is killing its performance. In any case, it shouldn't be necessary.
Is there a good way to prevent it in general and instead replace it with cheaper operations?
Update:
In my original program, I tried adding a dummy variable to make the size of each element a multiple of 4 instead of 3 so the compiler can use bit-shifting instead of division.
The result? Even though the data structure was bigger, the running time of the program decreased from 9.2 seconds to 7.4 seconds.
So yes, this is indeed very slow.
 
     
    