I'm trying to do some calculations with complex floating-point numbers using SSE and the __m128 vector type. An __m128 holds four floats, so it can store two complex floats, since each complex number consists of two floats: a real part and an imaginary part.
So far, so good.
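For reference, here is a minimal sketch of that layout. It assumes the complex numbers live in a plain float array interleaved as {re0, im0, re1, im1, ...}, which maps directly onto the four lanes of an __m128. The helper name add_complex_pairs is hypothetical. Because complex addition is component-wise, one _mm_add_ps adds both complex numbers at once.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: adds two pairs of complex floats stored interleaved
   as {re0, im0, re1, im1}. Complex addition is plain element-wise addition,
   so a single _mm_add_ps handles both complex numbers. */
static void add_complex_pairs(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);           /* [a.re0, a.im0, a.re1, a.im1] */
    __m128 vb = _mm_loadu_ps(b);           /* [b.re0, b.im0, b.re1, b.im1] */
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* unaligned store of 4 floats */
}
```

The unaligned load/store variants are used so the array does not need 16-byte alignment; with aligned data, _mm_load_ps and _mm_store_ps would also work.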
My problem arises when I have to "collect" my results into a single complex float. Say I have two __m128 vectors holding four complex numbers in total. I can add the two vectors together with the _mm_add_ps intrinsic, which adds them element-wise, but how do I then "reduce" the two complex numbers in the resulting vector to a single complex number (two floats) and store it in an array?
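One possible approach, sketched under the assumption that only SSE1 is available: move the upper complex number (lanes 2 and 3) down into lanes 0 and 1 with _mm_movehl_ps, add, and then store just the low two floats with _mm_storel_pi. The helper name reduce_complex_pair is hypothetical.

```c
#include <xmmintrin.h>  /* SSE: _mm_movehl_ps, _mm_add_ps, _mm_storel_pi */

/* Hypothetical helper: v holds [re0, im0, re1, im1]; writes
   out[0] = re0 + re1 and out[1] = im0 + im1. */
static void reduce_complex_pair(__m128 v, float out[2])
{
    /* _mm_movehl_ps(a, b) yields [b2, b3, a2, a3]; with a = b = v this
       copies the upper complex number into the lower two lanes. */
    __m128 hi  = _mm_movehl_ps(v, v);      /* [re1, im1, re1, im1] */
    __m128 sum = _mm_add_ps(v, hi);        /* lanes 0,1: [re0+re1, im0+im1] */
    /* Store only the low 64 bits (two floats); no alignment requirement. */
    _mm_storel_pi((__m64 *)out, sum);
}
```

With the two input vectors from the question, one would first combine them with _mm_add_ps and pass the result to this helper.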
Similarly, if I want to load a single complex number from my array and store it twice in a vector (the real part in the 1st and 3rd elements, and the imaginary part in the 2nd and 4th), how can I do that?
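A minimal SSE1 sketch for the broadcast, assuming the complex number is stored as two consecutive floats {re, im}: load those two floats into the low half with _mm_loadl_pi, then duplicate the low half into the high half with _mm_movelh_ps. The helper name broadcast_complex is hypothetical.

```c
#include <xmmintrin.h>  /* SSE: _mm_setzero_ps, _mm_loadl_pi, _mm_movelh_ps */

/* Hypothetical helper: z points to {re, im}; returns [re, im, re, im]. */
static __m128 broadcast_complex(const float *z)
{
    /* Load two floats into lanes 0,1; lanes 2,3 come from the zero vector. */
    __m128 lo = _mm_loadl_pi(_mm_setzero_ps(), (const __m64 *)z);
    /* _mm_movelh_ps(a, b) yields [a0, a1, b0, b1]. */
    return _mm_movelh_ps(lo, lo);
}
```

A simpler but possibly slower alternative is _mm_set_ps(z[1], z[0], z[1], z[0]); note that _mm_set_ps takes its arguments from the highest lane down to the lowest, which is easy to get backwards.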