I have a class that occupies 64 bits in memory. To implement equality, I used reinterpret_cast<uint64_t*>, but this results in the following warning with gcc 7.2 (but not clang 5.0):
$ g++ -O3 -Wall -std=c++17 -g -c example.cpp 
example.cpp: In member function ‘bool X::eq_via_cast(X)’:
example.cpp:27:85: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     return *reinterpret_cast<uint64_t*>(this) == *reinterpret_cast<uint64_t*>(&x);
                                                                                  ^
From my understanding, dereferencing the result of such a cast is undefined behavior unless you cast to the actual type or to char* (or unsigned char*). For instance, there could be architecture-specific alignment restrictions when loading values. That is why I tried alternative approaches.
Here is the source code of a simplified version (link to godbolt):
#include <cstdint>
#include <cstring>
struct Y
{
    uint32_t x;
    bool operator==(Y y) { return x == y.x; }
};
struct X
{
    Y a;
    int16_t b;
    int16_t c;
    uint64_t to_uint64() {
        uint64_t result;
        std::memcpy(&result, this, sizeof(uint64_t));
        return result;
    }
    bool eq_via_memcpy(X x) {
        return to_uint64() == x.to_uint64();
    }
    bool eq_via_cast(X x) {
        return *reinterpret_cast<uint64_t*>(this) == *reinterpret_cast<uint64_t*>(&x);
    }
    bool eq_via_comparisons(X x) {
        return a == x.a && b == x.b && c == x.c;
    }
};
static_assert(sizeof(X) == sizeof(uint64_t));
bool via_memcpy(X x1, X x2) {
    return x1.eq_via_memcpy(x2);
}
bool via_cast(X x1, X x2) {
    return x1.eq_via_cast(x2);
}
bool via_comparisons(X x1, X x2) {
    return x1.eq_via_comparisons(x2);
}
Avoiding the cast by explicitly copying the data via memcpy prevents the warning. As far as I understand it, it should also be portable.
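The portability assumptions behind the memcpy trick can be made explicit with compile-time checks: it is only well-defined if the struct is trivially copyable and exactly as large as the integer it is read as. A minimal, self-contained sketch (the struct is repeated here so the snippet compiles on its own):

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Y { uint32_t x; };
struct X { Y a; int16_t b; int16_t c; };

// memcpy-based punning needs both properties; if either fails,
// the code stops compiling instead of silently misbehaving.
static_assert(std::is_trivially_copyable_v<X>,
              "memcpy requires a trivially copyable type");
static_assert(sizeof(X) == sizeof(uint64_t),
              "no padding: every byte compared is a value byte");

uint64_t to_uint64(const X& x) {
    uint64_t result;
    std::memcpy(&result, &x, sizeof(result));  // well-defined byte copy
    return result;
}
```

With these asserts in place, any future change to X that introduces padding or a non-trivial member is caught at compile time rather than turning the comparison into a latent bug.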
Looking at the assembler (gcc 7.2 with -std=c++17 -O3), the memcpy version is optimized perfectly (the call is elided entirely), while the member-by-member comparisons lead to less efficient code:
via_memcpy(X, X):
  cmp rdi, rsi
  sete al
  ret
via_cast(X, X):
  cmp rdi, rsi
  sete al
  ret
via_comparisons(X, X):
  xor eax, eax
  cmp esi, edi
  je .L7
  rep ret
.L7:
  sar rdi, 32
  sar rsi, 32
  cmp edi, esi
  sete al
  ret
Very similar with clang 5.0 (-std=c++17 -O3):
via_memcpy(X, X): # @via_memcpy(X, X)
  cmp rdi, rsi
  sete al
  ret
via_cast(X, X): # @via_cast(X, X)
  cmp rdi, rsi
  sete al
  ret
via_comparisons(X, X): # @via_comparisons(X, X)
  cmp edi, esi
  jne .LBB2_1
  mov rax, rdi
  shr rax, 32
  mov rcx, rsi
  shr rcx, 32
  shl eax, 16
  shl ecx, 16
  cmp ecx, eax
  jne .LBB2_3
  shr rdi, 48
  shr rsi, 48
  shl edi, 16
  shl esi, 16
  cmp esi, edi
  sete al
  ret
.LBB2_1:
  xor eax, eax
  ret
.LBB2_3:
  xor eax, eax
  ret
From this experiment, it looks like the memcpy version is the best approach in performance critical parts of the code.
Questions:
- Is my understanding correct that the memcpy version is portable C++ code?
- Is it reasonable to assume that compilers are able to optimize away the memcpy call like in this example?
- Are there better approaches that I have overlooked?
Update:
As UKMonkey pointed out, memcmp is more natural when doing bitwise comparisons. It also compiles down to the same optimized version:
bool eq_via_memcmp(X x) {
    return std::memcmp(this, &x, sizeof(*this)) == 0;
}
Here is the updated godbolt link. It should also be portable: sizeof(*this) is 8 bytes, and the static_assert guarantees there are no padding bytes, which memcmp would otherwise compare as well. So I assume it is the best solution so far.
