I am trying to learn what _mm256_permute2f128_ps() does, but I can't fully understand Intel's code example:
DEFINE SELECT4(src1, src2, control) {
    CASE(control[1:0]) OF
    0:  tmp[127:0] := src1[127:0]
    1:  tmp[127:0] := src1[255:128]
    2:  tmp[127:0] := src2[127:0]
    3:  tmp[127:0] := src2[255:128]
    ESAC
    IF control[3]
        tmp[127:0] := 0
    FI
    RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
dst[MAX:256] := 0
Specifically, I don't understand:
- The imm8[3:0] notation. Are they using it as a 4-byte mask? But I've seen people call _mm256_permute2f128_pd(myVec, myVec, 5), where imm8 is passed as a plain number (5).
- Inside the SELECT4 function, what does control[1:0] mean? Is control a byte mask, or is it used as a number? How many bytes is it made of?
- Why IF control[3] is used in Intel's example. Doesn't it undo the choice 3: inside CASE? Why would we ever want to set tmp[127:0] to zero if we've just been writing into it?
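
To make my current reading concrete, here is a minimal scalar sketch in C of how I think the pseudocode behaves. It assumes the bracket notation selects bit ranges of an ordinary integer (so control[1:0] is the two lowest bits and control[3] is bit 3), which is exactly the part I'm unsure about. The names vec256, select4, permute2f128_model and demo are my own and are not part of Intel's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 256-bit register holding 8 floats:
   f[0..3] models bits 127:0, f[4..7] models bits 255:128. */
typedef struct { float f[8]; } vec256;

/* Assumed reading of SELECT4: control is a 4-bit integer.
   Bits 1:0 pick one of four 128-bit halves; bit 3 zeroes the result. */
static void select4(const vec256 *src1, const vec256 *src2,
                    unsigned control, float out[4])
{
    switch (control & 0x3u) {                       /* control[1:0] */
    case 0: memcpy(out, &src1->f[0], 4 * sizeof(float)); break;
    case 1: memcpy(out, &src1->f[4], 4 * sizeof(float)); break;
    case 2: memcpy(out, &src2->f[0], 4 * sizeof(float)); break;
    case 3: memcpy(out, &src2->f[4], 4 * sizeof(float)); break;
    }
    if (control & 0x8u)                             /* control[3] */
        memset(out, 0, 4 * sizeof(float));          /* all-zero bits == 0.0f */
}

/* Scalar model of the two SELECT4 calls in the pseudocode. */
vec256 permute2f128_model(vec256 a, vec256 b, uint8_t imm8)
{
    vec256 dst;
    select4(&a, &b, imm8 & 0xFu, &dst.f[0]);         /* imm8[3:0] -> dst[127:0]   */
    select4(&a, &b, (imm8 >> 4) & 0xFu, &dst.f[4]);  /* imm8[7:4] -> dst[255:128] */
    return dst;
}

/* Under this reading, imm8 = 5 (binary 0000 0101) with a == b
   swaps the two 128-bit halves. */
void demo(void)
{
    vec256 v = {{0, 1, 2, 3, 4, 5, 6, 7}};
    vec256 r = permute2f128_model(v, v, 5);
    assert(r.f[0] == 4.0f && r.f[3] == 7.0f);  /* low half  <- old high half */
    assert(r.f[4] == 0.0f && r.f[7] == 3.0f);  /* high half <- old low half  */
}
```

If this reading is right, the 5 in _mm256_permute2f128_pd(myVec, myVec, 5) is just the integer whose low nibble is 0101 and whose high nibble is 0000, and a value such as 0x08 would zero the low half regardless of what bits 1:0 selected.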
 
    