While benchmarking code that uses std::optional<double>, I noticed that the code generated by MSVC runs at roughly half the speed of the code produced by clang or gcc. After reducing the code, I found that MSVC apparently has trouble generating code for std::optional::operator=. Using std::optional::emplace() instead does not exhibit the slowdown.
The following function
void test_assign(std::optional<double> & f){
    f = std::optional{42.0};
}
produces
sub     rsp, 24
vmovsd  xmm0, QWORD PTR __real@4045000000000000
mov     BYTE PTR $T1[rsp+8], 1
vmovups xmm1, XMMWORD PTR $T1[rsp]
vmovsd  xmm1, xmm1, xmm0
vmovups XMMWORD PTR [rcx], xmm1
add     rsp, 24
ret     0
Notice the unaligned move operations (vmovups) and the round trip through a stack temporary. In contrast, the function
void test_emplace(std::optional<double> & f){
    f.emplace(42.0);
}
compiles to
mov     rax, 4631107791820423168      ; 4045000000000000H
mov     BYTE PTR [rcx+8], 1
mov     QWORD PTR [rcx], rax
ret     0
This version is much simpler and faster.
These were generated using MSVC 19.32 with /O2 /std:c++17 /DNDEBUG /arch:AVX.
clang 14 with -O3 -std=c++17 -DNDEBUG -mavx produces
movabs  rax, 4631107791820423168
mov     qword ptr [rdi], rax
mov     byte ptr [rdi + 8], 1
ret
in both cases.
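To make the comparison easier to reproduce, here is a minimal sketch of a single translation unit containing both functions. The file name and the helper same_observable_state() are my own additions for illustration and are not part of the code I benchmarked. The helper only confirms that both paths leave the optional in the same state, so any difference is purely in the generated code.

```cpp
// repro.cpp (hypothetical file name). Compile with the flags quoted above
// and inspect the listing, e.g. with MSVC:
//   cl /O2 /std:c++17 /DNDEBUG /arch:AVX /c /FA repro.cpp
#include <optional>

// Assignment from a temporary optional: the variant for which MSVC
// emits the longer sequence that goes through a stack temporary.
void test_assign(std::optional<double>& f) {
    f = std::optional{42.0};
}

// In-place construction: the variant that compiles to two plain stores.
void test_emplace(std::optional<double>& f) {
    f.emplace(42.0);
}

// Hypothetical helper, not in the original code: checks that both
// functions leave the optional engaged and holding the same value.
bool same_observable_state() {
    std::optional<double> a;
    std::optional<double> b;
    test_assign(a);
    test_emplace(b);
    return a.has_value() && b.has_value() && *a == *b;
}
```

Keeping both functions non-inline and taking the optional by reference means the compiler cannot fold the stores away, which is what makes the two listings comparable.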
Replacing std::optional<double> with
struct MyOptional {
    double d;
    bool hasValue; // Required to reproduce the problem
    
    MyOptional(double v) {
        d = v;
    }
    void emplace(double v){
        d = v;
    }
};
exhibits the same issue. Apparently MSVC has trouble with the additional bool member.
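For completeness, this is roughly how the two test functions look with the reduced struct. Treat it as a sketch under one stated assumption: unlike the struct shown above, this variant initializes hasValue in both the constructor and emplace(), so that the example contains no read of an uninitialized member. That change may affect what the compiler emits, so the listings should be checked again rather than assumed identical. The helper same_observable_state() is hypothetical and only there to make the snippet checkable.

```cpp
// Reduced stand-in for std::optional<double>.
// Assumption: hasValue is initialized here, which the struct in the
// question does not do; this avoids copying an indeterminate bool.
struct MyOptional {
    double d;
    bool hasValue;

    MyOptional(double v) : d(v), hasValue(true) {}

    void emplace(double v) {
        d = v;
        hasValue = true;
    }
};

// Goes through a temporary and the implicitly generated assignment.
void test_assign(MyOptional& f) {
    f = MyOptional{42.0};
}

// Writes the members in place.
void test_emplace(MyOptional& f) {
    f.emplace(42.0);
}

// Hypothetical helper: both paths should end in the same state.
bool same_observable_state() {
    MyOptional a{0.0};
    MyOptional b{0.0};
    test_assign(a);
    test_emplace(b);
    return a.hasValue && b.hasValue && a.d == 42.0 && b.d == 42.0;
}
```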
See godbolt for a live example.
Why is MSVC producing these unaligned moves? To be clear, the question is not why they are unaligned rather than aligned (which wouldn't improve things according to this post), but why MSVC produces a considerably more expensive sequence of instructions in the assignment case. Is this simply a bug (or a missed optimization opportunity) in MSVC, or am I missing something?
