I recently started using the Intel C++ compiler for some of my projects while also learning MASM assembly. I kept hearing that it isn't worth learning assembly because compilers optimize code well enough anyway, so I decided to test for myself, once and for all, which one is faster. To do so, I wrote the following C++ code:
#include <iostream>
#include <time.h>
using namespace std;
extern "C" {
int Add(int a, int b);
}
int main(int argc, char * argv[]){
    int startingTime = clock();
    for (int i = 0; i < 100; i++){
        cout << "normal: " << i << endl;
        cout << 1000 + 1000 << endl;
    }
    int timeTaken1 = clock() - startingTime;
    startingTime = clock();
    for (int i = 0; i < 100; i++){
        cout << "assem" << i << endl;
        cout << Add(2000, 2000) << endl;
    }
    int timeTaken2 = clock() - startingTime;
    cout << "Time taken under normal addition: " << timeTaken1 << endl;
    cout << "Time taken under assembly addition: " << timeTaken2 << endl;
    cin.get();
    return 0;
}
And the following MASM code:
.386
.model flat
.code
    public _Add
_Add PROC
        push ebp            ; set up the stack frame
        mov ebp, esp
        mov eax, [ebp + 8]  ; load the first argument
        mov ebx, [ebp + 12] ; load the second argument
        add eax, ebx        ; return the sum in EAX
        leave               ; restore EBP and ESP
        ret                 ; cdecl: the caller cleans up the stack
_Add endp
end
I am compiling this in Visual Studio using the Intel Composer plugin. When I run it in Debug mode, it works perfectly: I can see "normal: 99" and "assem99" along with the relevant numbers. When I compile with /Od, it also works fine. However, when /O2, /Ox, or /O3 is specified, only the normal addition loop runs to completion, and the assembly loop prints just its first iteration, i.e. only "assem0" and 4000 are shown.
My guess is that the assembly code is being optimized out by the Intel compiler (the same program works fine with the VC++ compiler). I am curious why this is occurring and how it can be worked around while still letting Intel optimize the C++ part.
Thanks, SbSpider
EDIT: I know this is late, but thanks for all of the replies. It turned out to be an error in my assembly code rather than the Intel compiler refusing to use it.
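For anyone who hits the same symptom: the most likely culprit, assuming the default cdecl calling convention, is that _Add clobbers EBX, which is a callee-saved register on 32-bit x86. Under /Od the caller happens not to keep anything in EBX across the call, but the optimized caller can keep its loop counter there, so overwriting it breaks the loop after the first iteration. A minimal sketch of a corrected version (the same routine as above, just preserving EBX):

.386
.model flat
.code
    public _Add
_Add PROC
        push ebp            ; standard prologue
        mov ebp, esp
        push ebx            ; EBX is callee-saved under cdecl, so save it
        mov eax, [ebp + 8]  ; load the first argument (EBP offsets are unchanged by the push)
        mov ebx, [ebp + 12] ; load the second argument
        add eax, ebx        ; return the sum in EAX
        pop ebx             ; restore the caller's EBX
        leave               ; restore EBP and ESP
        ret                 ; cdecl: the caller removes the arguments
_Add endp
end

Alternatively, the temporary register can be avoided entirely with add eax, [ebp + 12] after loading the first argument into EAX, which sidesteps the problem altogether.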