My multithreaded C program runs the following routine:
#include <pthread.h>
#include <stdio.h>

#define NUM_LOOP 500000000
long long sum = 0;
void* add_offset(void *n){
        int offset = *(int*)n;
        for(int i = 0; i<NUM_LOOP; i++) sum += offset;
        pthread_exit(NULL);
}
Of course, sum should be updated while holding a lock, but before getting to that I have an issue with the running time of this simple program.
When the main function is as follows (single thread):
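For reference, the locked update I have in mind would look roughly like the sketch below. It is not the code I timed: the names locked_sum, sum_lock, locked_args, add_offset_locked and run_locked are made up for illustration, and the loop count is passed in as a parameter so the sketch can be tried with a small value.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names for illustration; none of these appear in the timed program. */
static long long locked_sum = 0;
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

struct locked_args {
    int offset;   /* value added on every iteration */
    int loops;    /* number of iterations */
};

static void *add_offset_locked(void *p) {
    const struct locked_args *a = p;
    for (int i = 0; i < a->loops; i++) {
        /* Every read-modify-write of the shared sum happens under the mutex. */
        pthread_mutex_lock(&sum_lock);
        locked_sum += a->offset;
        pthread_mutex_unlock(&sum_lock);
    }
    return NULL;
}

/* Starts two concurrent threads adding +1 and -1, then returns the final sum. */
long long run_locked(int loops) {
    struct locked_args a1 = { 1, loops };
    struct locked_args a2 = { -1, loops };
    pthread_t t1, t2;
    int rc;

    locked_sum = 0;
    rc = pthread_create(&t1, NULL, add_offset_locked, &a1);
    assert(rc == 0);   /* thread creation is assumed to succeed in this sketch */
    rc = pthread_create(&t2, NULL, add_offset_locked, &a2);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return locked_sum;
}
```

Calling run_locked from main with equal loop counts for both threads should give 0, since the +1 and -1 updates can no longer be lost. Taking the lock on every iteration adds its own cost, so I would not expect this version to be fast.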
int main(void){
        pthread_t tid1;
        int offset1 = 1;
        pthread_create(&tid1,NULL,add_offset,&offset1);
        pthread_join(tid1,NULL);
        printf("sum = %lld\n",sum); 
        return 0;
}
The output and running time are:
sum = 500000000
real    0m0.686s
user    0m0.680s
sys     0m0.000s
When the main function is as follows (multithreaded, sequential):
int main(void){
        pthread_t tid1;
        int offset1 = 1;
        pthread_create(&tid1,NULL,add_offset,&offset1);
        pthread_join(tid1,NULL);
        pthread_t tid2;
        int offset2 = -1;
        pthread_create(&tid2,NULL,add_offset,&offset2);
        pthread_join(tid2,NULL);
        printf("sum = %lld\n",sum);
        return 0;
}
The output and running time are:
sum = 0
real    0m1.362s
user    0m1.356s
sys     0m0.000s
So far the program behaves as expected. But when the main function is as follows (multithreaded, concurrent):
int main(void){
        pthread_t tid1;
        int offset1 = 1;
        pthread_create(&tid1,NULL,add_offset,&offset1);
        pthread_t tid2;
        int offset2 = -1;
        pthread_create(&tid2,NULL,add_offset,&offset2);
        pthread_join(tid1,NULL);
        pthread_join(tid2,NULL);
        printf("sum = %lld\n",sum);
        return 0;
}
The output and running time are:
sum = 166845932
real    0m2.087s
user    0m3.876s
sys     0m0.004s
The erroneous value of sum, caused by the lack of synchronization, is not the issue here; the running time is. The concurrent version takes longer in wall-clock time than the sequential version (2.087s versus 1.362s real), and its user CPU time nearly triples (3.876s versus 1.356s). This is the opposite of what I would expect from concurrent execution on a multicore CPU.
Please explain what might be the problem here.
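One experiment that might help narrow this down is sketched below. It assumes, and this is only a guess that the timings above do not establish, that the slowdown comes from both threads repeatedly writing the same memory location. Each thread accumulates into a private local variable and publishes its result once at the end. The names local_args, add_offset_local and run_local are hypothetical, and the loop count is a parameter so it can be varied.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names for illustration; this is a variant, not the timed program. */
struct local_args {
    int offset;          /* value added on every iteration */
    int loops;           /* number of iterations */
    long long partial;   /* written once by the thread, read by the caller after join */
};

static void *add_offset_local(void *p) {
    struct local_args *a = p;
    long long acc = 0;   /* private to this thread, so no shared writes in the loop */
    for (int i = 0; i < a->loops; i++) {
        acc += a->offset;
    }
    a->partial = acc;    /* single write of the result */
    return NULL;
}

/* Runs two concurrent threads adding +1 and -1 and combines their partial sums. */
long long run_local(int loops) {
    struct local_args a1 = { 1, loops, 0 };
    struct local_args a2 = { -1, loops, 0 };
    pthread_t t1, t2;
    int rc;

    rc = pthread_create(&t1, NULL, add_offset_local, &a1);
    assert(rc == 0);   /* thread creation is assumed to succeed in this sketch */
    rc = pthread_create(&t2, NULL, add_offset_local, &a2);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* pthread_join orders the threads' writes before these reads. */
    return a1.partial + a2.partial;
}
```

If this variant scales across two cores while the original does not, that would point toward contention on the shared sum. One caveat: with optimization enabled, the compiler may reduce the local loop to a single multiplication, so timings of this variant are only comparable when built with the same flags as the original.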