Which is more efficient, and why?
Specifically _mm_loadu_si128 vs. _mm_load_si128 in C.
(Editor's note: or, since this was tagged assembly, possibly they meant movdqu vs. movdqa in hand-written asm. That is not the same thing, especially without AVX, because _mm_load_si128 can compile into a memory operand for an ALU instruction, with no separate movdqa at all.)
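
To make the distinction in the editor's note concrete, here is a minimal sketch in C. The function names `add_aligned`, `add_unaligned`, and `lane0` are hypothetical, made up for illustration; only the intrinsics themselves come from `<emmintrin.h>`. The comments describe what compilers typically emit, which can vary with compiler, version, and flags.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Aligned load feeding an ALU op. p must be 16-byte aligned.
   Without AVX, a compiler is free to fold the load into the add,
   e.g. `paddd xmm0, [rdi]`, so no separate movdqa appears. */
__m128i add_aligned(__m128i acc, const int32_t *p)
{
    return _mm_add_epi32(acc, _mm_load_si128((const __m128i *)p));
}

/* Unaligned load feeding the same ALU op. p may have any alignment.
   Without AVX, legacy-SSE memory operands must be aligned, so the
   compiler typically needs a separate movdqu before the paddd.
   With AVX enabled, a VEX-encoded vpaddd accepts an unaligned
   memory operand, so this load can be folded as well. */
__m128i add_unaligned(__m128i acc, const int32_t *p)
{
    return _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)p));
}

/* Hypothetical helper: extract the lowest 32-bit lane for checking. */
int32_t lane0(__m128i v)
{
    return _mm_cvtsi128_si32(v);
}
```

Compiling this with something like `gcc -O2 -S` (and again with `-mavx`) and comparing the output for the two functions is one way to see whether the load becomes a separate instruction or a memory operand on a given compiler.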