This is an optimization problem. I want to copy a bitfield of six 5-bit elements to a u8 buffer, naively done like this:
void Expand(u32 x, u8 b[6]) {
    b[0] = (x >> 0) & 31;
    b[1] = (x >> 5) & 31;
    b[2] = (x >> 10) & 31;
    b[3] = (x >> 15) & 31;
    b[4] = (x >> 20) & 31;
    b[5] = (x >> 25) & 31;
}
This is the assembly generated by x86 MSVC v19.latest with /O2 /Ot /Gr; gcc and clang produce approximately the same thing.
@Expand@8 PROC
        mov     al, cl
        and     al, 31
        mov     BYTE PTR [edx], al
        mov     eax, ecx
        shr     eax, 5
        and     al, 31
        mov     BYTE PTR [edx+1], al
        mov     eax, ecx
        shr     eax, 10
        and     al, 31
        mov     BYTE PTR [edx+2], al
        mov     eax, ecx
        shr     eax, 15
        and     al, 31
        mov     BYTE PTR [edx+3], al
        mov     eax, ecx
        shr     eax, 20
        shr     ecx, 25
        and     al, 31
        and     cl, 31
        mov     BYTE PTR [edx+4], al
        mov     BYTE PTR [edx+5], cl
        ret     0
@Expand@8 ENDP
But I just don't like it; I know it does exactly what it should, yet it seems to me that it could be a lot more efficient.
To me it looks like a 30-bit number that needs to be spread out into a 48-bit number, with zeroes inserted between the fields:
                  11111 11111 11111 11111 11111 11111
                                                    ↓
00011111 00011111 00011111 00011111 00011111 00011111
I've been trying SHIFTing and ORing, with only a single AND at the end against a u64 mask (0x1f1f1f1f1f1f), but so far without success. I'm confident this should be doable in fewer than 10 instructions; any guidance would be appreciated.
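For reference, here is roughly what that shift/OR route seems to need in order to work when done in a u64: each field i has to move left by 3*i, so three rounds of shift-and-OR (by 9, then 3, then 3) cover all six fields, but every round needs its own keep/move masks rather than a single AND at the end, otherwise the OR'ed copies bleed into neighbouring bytes. Untested sketch with hand-derived mask constants (worth double-checking), assuming a little-endian target for the final 6-byte store:

#include <stdint.h>
#include <string.h>

typedef uint8_t  u8;
typedef uint32_t u32;
typedef uint64_t u64;

void Expand(u32 x, u8 b[6]) {
    u64 t = x;
    /* fields start at bits 0,5,10,15,20,25; targets are 0,8,16,24,32,40 */
    t = (t & 0x7FFFULL)       | ((t & 0x3FFF8000ULL)    << 9); /* fields 3..5 up by 9      */
    t = (t & 0x1F00001FULL)   | ((t & 0x7FE0007FE0ULL)  << 3); /* fields 1,2,4,5 up by 3   */
    t = (t & 0x1F1F001F1FULL) | ((t & 0x3E00003E000ULL) << 3); /* fields 2,5 up by 3 more  */
    memcpy(b, &t, 6);  /* little-endian assumed: byte i of t is b[i] */
}

That is roughly a dozen ALU operations plus the stores, so it does not obviously hit the sub-10 goal, but the whole expansion stays in one register.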
EDIT
I scratched my head a little more, and this is the best I've come up with so far:
void Expand(u32 x, u8 b[6]) {
    memset(b, 31, 6);
    b[0] &= x;
    b[1] &= x >>= 5;
    b[2] &= x >>= 5;
    b[3] &= x >>= 5;
    b[4] &= x >>= 5;
    b[5] &= x >>= 5;
}
Compiles to:
@Expand@8 PROC
        mov     eax, 0x1f1f1f1f
        mov     DWORD PTR [edx], eax
        mov     WORD PTR [edx+4], ax
        and     BYTE PTR [edx], cl
        shr     ecx, 5
        and     BYTE PTR [edx+1], cl
        shr     ecx, 5
        and     BYTE PTR [edx+2], cl
        shr     ecx, 5
        and     BYTE PTR [edx+3], cl
        shr     ecx, 5
        and     BYTE PTR [edx+4], cl
        shr     ecx, 5
        and     BYTE PTR [edx+5], cl
        ret     0
@Expand@8 ENDP
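If an x86-64 build with BMI2 is acceptable, PDEP does this scatter directly: it deposits the low bits of its source into the set bit positions of a mask, which is exactly the 0x1f1f1f1f1f1f pattern pictured above. Untested sketch (note the 32-bit build shown here can't use _pdep_u64, and PDEP is slow on AMD before Zen 3); again assuming a little-endian target for the 6-byte store:

#include <stdint.h>
#include <string.h>
#include <immintrin.h>

typedef uint8_t  u8;
typedef uint32_t u32;

void Expand(u32 x, u8 b[6]) {
    /* Scatter the 30 low bits of x into the low 5 bits of each of the
       six low bytes, i.e. into the set positions of 0x1f1f1f1f1f1f. */
    uint64_t t = _pdep_u64(x, 0x1F1F1F1F1F1FULL);
    memcpy(b, &t, 6);  /* little-endian assumed: byte i of t is b[i] */
}

gcc and clang need -mbmi2 (or an -march that implies it) to accept the intrinsic.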