I am looking for an instruction like PHADDD, but for quadwords. PHADDQ does not exist; is there an equivalent instruction?
1 Answer
phaddd is no faster than two shuffles plus a vertical add, so it's only worth considering when you have two separate inputs.
If you were planning to use it with both inputs the same, just use pshufd to copy+swap into another vector. (Or if you just want a scalar horizontal sum, even movhlps can be worth considering to extract the high 64 bits into another register.)
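As a minimal sketch of the single-input case with SSE2 intrinsics: `_mm_shuffle_epi32` compiles to pshufd, and the helper name `hsum_epi64` is made up for illustration. It assumes an x86-64 target, since `_mm_cvtsi128_si64` is only available in 64-bit mode.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: horizontal sum of the two 64-bit lanes of v.
 * Assumes x86-64 (for _mm_cvtsi128_si64). */
static inline int64_t hsum_epi64(__m128i v)
{
    /* pshufd: copy v into a new vector with its 64-bit halves swapped */
    __m128i swapped = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    /* paddq: both lanes now hold low + high */
    __m128i sum = _mm_add_epi64(v, swapped);
    /* movq: extract the low lane as a scalar */
    return _mm_cvtsi128_si64(sum);
}
```

If only the scalar result is needed, the shuffle can be any instruction that brings the high half down to the low lane, which is why movhlps is also an option.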
To fully emulate phaddq, you just need two shuffles that take your inputs [A B] and [C D] and produce the vectors [A C] and [B D], which you can then add vertically to get the elements A+B and C+D. That's what punpcklqdq and punpckhqdq do (unpack low/high quadwords into a double quadword).
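With SSE2 intrinsics, the emulation might look like the sketch below. The function name `hadd_epi64` is hypothetical (there is no such intrinsic); `_mm_unpacklo_epi64` and `_mm_unpackhi_epi64` are the intrinsics for punpcklqdq and punpckhqdq.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical phaddq emulation: given ab = { A, B } and cd = { C, D }
 * (low lane listed first), returns { A+B, C+D }. */
static inline __m128i hadd_epi64(__m128i ab, __m128i cd)
{
    __m128i lo = _mm_unpacklo_epi64(ab, cd);  /* punpcklqdq: { A, C } */
    __m128i hi = _mm_unpackhi_epi64(ab, cd);  /* punpckhqdq: { B, D } */
    return _mm_add_epi64(lo, hi);             /* paddq:      { A+B, C+D } */
}
```

The result follows the same element order as phaddd: the sums from the first input land in the low half, and the sums from the second input in the high half.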
Peter Cordes