The function difference computes the number of unequal elements of two char arrays of length n:
size_t difference(size_t n, const char a[n], const char b[n]) {
    size_t res = 0;
    for (size_t i = 0; i < n; i++)
        res += a[i] != b[i];
    return res;
}
How can I make this loop faster with respect to the memory allocation of the arrays and the caches? Is there a way to use Intel SSE intrinsics?
EDIT:
This is a standalone program that should use as few cycles as possible and must not use any compiler directives/pragmas.
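To make the SSE part of the question concrete, here is a minimal sketch of one possible SSE2 approach. It is not a tuned or benchmarked implementation. The name difference_sse2 is hypothetical, the arrays are taken as plain pointers instead of the a[n] form, and unaligned loads are assumed so that the arrays need no special allocation. It compares 16 bytes at a time, counts equal bytes in per-byte counters, and flushes those counters every 255 blocks at most so they cannot overflow. It uses only intrinsics from <emmintrin.h> and no pragmas.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical SSE2 variant of difference(); a sketch, not a tuned version. */
size_t difference_sse2(size_t n, const char *a, const char *b) {
    const __m128i zero = _mm_setzero_si128();
    size_t res = 0;
    size_t i = 0;

    while (n - i >= 16) {
        /* 16 per-byte counters of EQUAL elements; each holds at most 255,
           so at most 255 blocks are accumulated before flushing. */
        __m128i acc = _mm_setzero_si128();
        size_t blocks = (n - i) / 16;
        if (blocks > 255)
            blocks = 255;

        for (size_t k = 0; k < blocks; k++, i += 16) {
            /* Unaligned loads: no alignment requirement on a and b. */
            __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
            __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
            /* 0xFF (that is, -1) in every byte lane where a[i] == b[i]. */
            __m128i eq = _mm_cmpeq_epi8(va, vb);
            /* Subtracting -1 adds 1 to the counter of each equal lane. */
            acc = _mm_sub_epi8(acc, eq);
        }

        /* Sum the 16 byte counters: psadbw against zero yields two partial
           sums, one in the low and one in the high 64-bit half. */
        __m128i sums = _mm_sad_epu8(acc, zero);
        size_t equal = (size_t)_mm_cvtsi128_si32(sums)
                     + (size_t)_mm_cvtsi128_si32(_mm_srli_si128(sums, 8));

        /* Unequal elements = elements processed - equal elements. */
        res += blocks * 16 - equal;
    }

    /* Scalar tail for the remaining n % 16 elements. */
    for (; i < n; i++)
        res += a[i] != b[i];

    return res;
}
```

It is called exactly like the original, e.g. difference_sse2(n, a, b). Whether this actually beats the scalar loop depends on the compiler and flags, since an optimizing compiler may already auto-vectorize the original; for large n the loop is likely limited by memory bandwidth rather than by the comparisons.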