Add 32-bit words with saturation

Question

Do you know of a way to do saturating addition of 32-bit signed words using MMX/SSE assembler instructions? I can find 8- and 16-bit versions, but no 32-bit ones.

See [Agner Fog's vectorclass library](http://www.agner.org/optimize/#vectorclass) for an implementation of add and subtract with C++ intrinsics. A copy of the GPLed source [is here](https://github.com/pcordes/vectorclass/blob/77522287e64da5e887d69659e144d2caa5d3a4f1/vectori128.h#L2189), using XOR to check for same / different signs, and shifts / PANDN / PADDD to fix up the result. — Peter Cordes, Nov 24 '16 at 04:22

score 2 · Answer 1 · edited Oct 08 '22 at 22:01

You can emulate a saturating signed add with the following scalar logic:

    #include <stdint.h>

    int saturated_add(int a, int b)
    {
        int sum = (int)(a + (unsigned)b);         // unsigned add avoids signed-overflow UB
        if (a >= 0 && b >= 0)
            return sum >= 0 ? sum : INT32_MAX;    // catch positive wraparound (0 + 0 must stay 0)
        else if (a < 0 && b < 0)
            return sum < 0 ? sum : INT32_MIN;     // catch negative wraparound (MIN + MIN wraps to 0)
        else
            return sum;                           // sum of pos + neg always fits
    }

The unsigned case is even simpler; see this Stack Overflow posting.

In SSE2, the above maps to a sequence of parallel compares and AND/ANDN operations. Unfortunately, no single instruction does a 32-bit saturating add in hardware.
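As a rough sketch of what such a sequence could look like with SSE2 intrinsics: the helper name `sat_add_epi32_sse2` is made up for illustration, and instead of translating the compares literally it detects overflow from the sign bits with XOR, then blends with AND/ANDN. Treat it as one possible formulation, not a tuned implementation.

```c
#include <emmintrin.h>  // SSE2 intrinsics
#include <stdint.h>

// Hypothetical helper (illustrative name): lane-wise signed saturating add
// of four 32-bit integers, using only SSE2 instructions.
static __m128i sat_add_epi32_sse2(__m128i a, __m128i b)
{
    __m128i sum = _mm_add_epi32(a, b);       // wrapping add (paddd)

    // Overflow occurred iff a and b have the same sign and sum has the
    // opposite one, i.e. the sign bit of (a ^ sum) & (b ^ sum) is set.
    __m128i ovf  = _mm_and_si128(_mm_xor_si128(a, sum), _mm_xor_si128(b, sum));
    __m128i mask = _mm_srai_epi32(ovf, 31);  // all-ones in overflowed lanes

    // Saturation value chosen by the sign of a:
    // INT32_MAX if a >= 0, INT32_MIN if a < 0.
    __m128i sat = _mm_xor_si128(_mm_srai_epi32(a, 31),
                                _mm_set1_epi32(INT32_MAX));

    // Blend: sat where overflowed, sum elsewhere (pand / pandn / por).
    return _mm_or_si128(_mm_and_si128(mask, sat),
                        _mm_andnot_si128(mask, sum));
}
```

Load the operands with `_mm_loadu_si128` and store the result with `_mm_storeu_si128`; each of the four lanes saturates independently.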

[Bitwise saturated addition in C (HW)](https://stackoverflow.com/q/5277623) could probably vectorize better, with a couple `pxor` for `sum^a` and `sum^b`, and `pcmpgt(0, v)` or `psrad` — Peter Cordes, Oct 08 '22 at 22:03

Michiel · Answer 2 · 2016-11-23T13:01:41.143

Saturating unsigned subtraction is easy, because for `a -= b` we can do

    asm (
        "pmaxud %1, %0\n\t" // a = max (a,b)
        "psubd %1, %0" // a -= b
        : "+x" (a)
        : "xm" (b)
    );

with SSE4.1 (`pmaxud` is not available in earlier SSE versions).
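The same two-instruction idea can also be written with intrinsics, which avoids inline asm. This is only a sketch: the name `sat_sub_epu32` is invented here, and the `target` attribute is a GCC/Clang extension, assumed so that the function compiles without passing `-msse4.1`.

```c
#include <smmintrin.h>  // SSE4.1 intrinsics (_mm_max_epu32)

// Hypothetical helper (illustrative name): lane-wise unsigned saturating
// subtract of four 32-bit integers, computed as max(a, b) - b.
// Assumption: GCC or Clang, which accept the target attribute below.
__attribute__((target("sse4.1")))
static __m128i sat_sub_epu32(__m128i a, __m128i b)
{
    __m128i m = _mm_max_epu32(a, b);  // pmaxud: unsigned max(a, b)
    return _mm_sub_epi32(m, b);       // psubd: never wraps, since m >= b
}
```

Where `a < b`, the max step replaces `a` with `b`, so the difference clamps to 0 rather than wrapping.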

I was looking for unsigned addition, but possibly the only way is to transform it into a saturating unsigned subtraction, perform that, and transform back. The same goes for the signed variants.

EDIT: With unsigned addition, this gives `min(a, ~b) + b`, which of course works. With signed addition and subtraction there are two saturation boundaries, which complicates things.
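To spell out why the unsigned formula works: `~b` equals `UINT32_MAX - b`, so clamping `a` to at most `~b` guarantees that adding `b` cannot exceed `UINT32_MAX`. A minimal intrinsics sketch follows. The name `sat_add_epu32` is made up, and the `target` attribute again assumes GCC or Clang.

```c
#include <smmintrin.h>  // SSE4.1 intrinsics (_mm_min_epu32)

// Hypothetical helper (illustrative name): lane-wise unsigned saturating
// add of four 32-bit integers, computed as min(a, ~b) + b.
// Assumption: GCC or Clang, which accept the target attribute below.
__attribute__((target("sse4.1")))
static __m128i sat_add_epu32(__m128i a, __m128i b)
{
    __m128i not_b   = _mm_xor_si128(b, _mm_set1_epi32(-1)); // ~b == UINT32_MAX - b
    __m128i clamped = _mm_min_epu32(a, not_b);              // pminud: leave room for b
    return _mm_add_epi32(clamped, b);                       // paddd: cannot wrap now
}
```

Lanes that would have overflowed come out as `0xFFFFFFFF`; all other lanes hold the plain sum.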
