I would like to know whether the following is possible with any of the SIMD instruction families.
I have a qword input with 63 significant bits (it is never negative). Each successive group of 7 bits, starting from the LSB, should be shuffle-aligned into its own byte, with the byte's top bit padded with a 1 (except in the most significant non-zero byte, where it stays 0). To illustrate, I'll use letters for the individual bits.
The result contains only the significant bytes, so it is 0 to 9 bytes long, and it is written to a byte array.
In:         0|kjihgfe|dcbaZYX|WVUTSRQ|PONMLKJ|IHGFEDC|BAzyxwv|utsrqpo|nmlkjih|gfedcba
Out: 0kjihgfe|1dcbaZYX|1WVUTSRQ|1PONMLKJ|1IHGFEDC|1BAzyxwv|1utsrqpo|1nmlkjih|1gfedcba
Size = 9
In:  00|nmlkjih|gfedcba
Out: |0nmlkjih|1gfedcba
Size = 2
I understand that setting the padding bits is a separate step. My question is about the shuffle-align itself: is it possible to do that with SIMD instructions?
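To make the shuffle-align step concrete, here is a minimal scalar sketch in C (not my C# code) of what I would like a single instruction, or a short instruction sequence, to do. It spreads the 7-bit groups into bytes with three mask-and-shift steps, then applies the padding separately. The names spread7to8 and encode_is8 are made up for this illustration, and the sketch assumes the caller always provides a 9-byte buffer.

```c
#include <stdint.h>
#include <stddef.h>

/* Move each 7-bit group of the low 56 bits of v into its own byte.
   Bit 7 of every output byte is left clear (padding is applied later). */
static uint64_t spread7to8(uint64_t v)
{
    v &= 0x00FFFFFFFFFFFFFFULL;
    /* split 56 bits into two 28-bit halves, one per 32-bit lane */
    v = ((v & 0x00FFFFFFF0000000ULL) << 4) | (v & 0x000000000FFFFFFFULL);
    /* split each 28-bit half into two 14-bit pieces, one per 16-bit lane */
    v = ((v & 0x0FFFC0000FFFC000ULL) << 2) | (v & 0x00003FFF00003FFFULL);
    /* split each 14-bit piece into two 7-bit groups, one per byte */
    v = ((v & 0x3F803F803F803F80ULL) << 1) | (v & 0x007F007F007F007FULL);
    return v;
}

/* Hypothetical encoder built on the spread step. Writes up to 9 bytes,
   least significant group first, and returns the number of significant bytes. */
static size_t encode_is8(int64_t j, uint8_t out[9])
{
    if (j <= 0)
        return 0;

    uint64_t u = (uint64_t)j;

    /* count the 7-bit groups needed: 1..9 */
    size_t n = 0;
    for (uint64_t t = u; t != 0; t >>= 7)
        n++;

    /* spread the low 56 bits and set the padding bit in all eight bytes */
    uint64_t w = spread7to8(u) | 0x8080808080808080ULL;
    for (size_t i = 0; i < 8; i++)
        out[i] = (uint8_t)(w >> (8 * i));

    /* the remaining 7 bits (bits 56..62) form the ninth byte */
    out[8] = (uint8_t)(u >> 56);

    /* the most significant byte in use carries no padding bit */
    out[n - 1] &= 0x7F;
    return n;
}
```

With this sketch, encode_is8(0x3FFF, buf) returns 2 with buf[0] = 0xFF and buf[1] = 0x7F, which matches the second diagram above. The part I am hoping a SIMD instruction can replace is spread7to8.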
EDIT 2
Here is my updated code. It sustains 46 M/sec for random-length input on a single thread of a 2 GHz Core 2 Duo, 64-bit.
private static int DecodeIS8(long j, ref byte[] result)
{
    if (j <= 0)
    {
        return 0;
    }
    int size;
    // Not a real loop: it never iterates. It exists only so that 'break'
    // can jump to the shared tail after the closing brace.
    while (true)
    {
        result[0] = (byte)((j & 0x7F) | 0x80);
        size = 0;
        j >>= 7;
        if (j == 0) break;
        result[1] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[2] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[3] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[4] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[5] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[6] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[7] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        // Ninth byte: only 7 bits remain (bits 56..62), so the top bit is already 0.
        result[8] = (byte)j;
        return 9;
    }
    // size is the index of the last byte written; clear its padding bit.
    result[size] ^= 0x80;
    return size + 1;
}