I can use _mm_set_epi64 to store two uint64_ts into a __m128 value. Hunting around, I see various ways to get the values back out: reinterpret_cast (and its evil twin, the C-style cast); its sibling, union { __m128; uint64[2]; }; memcpy; and accessing the fields of __m128 directly. There's also __m128i _mm_load_si128(__m128i *p), but I'm not seeing a _mm_get_* function. Am I missing something? If there's a _mm_set_epi64, then there must be a non-cast way to get the uint64_ts back out, right? (Otherwise, why would they bother providing _mm_set_epi64?)
I've seen "Get member of __m128 by index?", but the "correct answer" there has a broken link and implies there's a load function, whereas all the loads I see map __m128 to __m128. Shouldn't there be something like void _mm_get_epi64(__m128, uint64_t* outbuf)?
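For concreteness, here is a minimal sketch of the kind of round trip I have in mind, assuming an x86-64 target with SSE2 and the integer vector type __m128i. The names U64Pair, unpack_via_store, unpack_via_cvt, unpack_via_memcpy and check_round_trip are hypothetical helpers made up for this illustration, not part of any intrinsics header. The sketch packs with _mm_set_epi64x, the variant that takes plain 64-bit integers (_mm_set_epi64 takes __m64 arguments), and gets the values back out with a store to memory, with _mm_cvtsi128_si64 plus a shuffle, and with memcpy.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper type: the two 64-bit lanes of a __m128i.
struct U64Pair {
    std::uint64_t lo;  // bits 0..63
    std::uint64_t hi;  // bits 64..127
};
static_assert(sizeof(U64Pair) == sizeof(__m128i), "lanes must fill the vector");

// Option 1: store the whole register to memory (unaligned store, so no
// alignment requirement on the destination).
inline U64Pair unpack_via_store(__m128i v) {
    std::uint64_t out[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
    return {out[0], out[1]};
}

// Option 2: register-to-register moves. _mm_cvtsi128_si64 reads the low
// lane and is only available on x86-64; the high lane is first moved down
// with _mm_unpackhi_epi64.
inline U64Pair unpack_via_cvt(__m128i v) {
    const auto lo = static_cast<std::uint64_t>(_mm_cvtsi128_si64(v));
    const auto hi = static_cast<std::uint64_t>(
        _mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v)));
    return {lo, hi};
}

// Option 3: memcpy of the object representation.
inline U64Pair unpack_via_memcpy(__m128i v) {
    U64Pair p;
    std::memcpy(&p, &v, sizeof p);
    return p;
}

// Round trip: _mm_set_epi64x takes the HIGH lane first, then the low lane.
inline void check_round_trip(std::uint64_t lo, std::uint64_t hi) {
    const __m128i v = _mm_set_epi64x(static_cast<long long>(hi),
                                     static_cast<long long>(lo));
    const U64Pair a = unpack_via_store(v);
    const U64Pair b = unpack_via_cvt(v);
    const U64Pair c = unpack_via_memcpy(v);
    assert(a.lo == lo && a.hi == hi);
    assert(b.lo == lo && b.hi == hi);
    assert(c.lo == lo && c.hi == hi);
}
```

With SSE4.1 enabled there is also _mm_extract_epi64(v, 1) for the high lane, which I left out so the sketch builds with plain SSE2.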