I am trying to understand the effects of memory aliasing and how to improve my code to avoid it. I am rewriting my cache-coherent Entity Component System and want to take memory aliasing into account.
The main source I have is Christer Ericson's talk from GDC 2003, which is why I would like to know whether the problems he describes have been mitigated by modern C++ compilers.
Specifically, do modern C++ compilers suffer from memory aliasing as much as Christer says, especially for member variable access (due to the implicit 'this' pointer)?
#include <stdlib.h> // atoi
#include <stdint.h> // int64_t
class TestList
{
  public:
    TestList()
    {
       // atoi here to avoid compiler optimization around hardcoded 20
       count = atoi("20");
       data = new int64_t[count];
    }
    int64_t count;
    int64_t* data;
    void ClearOptimized();
    void ClearNonOptimized();
};
// Not inlined on purpose
void TestList::ClearOptimized()
{
    // According to Christer, this avoids the aliasing problem even
    // with simple compilers, because copying 'count' into a local
    // makes it clear to the compiler that the loop bound cannot be
    // modified by the stores inside the loop.
    for (int64_t i = 0, size = count; i < size; ++i)
    {
        data[i] = 0;
    }
}
void TestList::ClearNonOptimized()
{
    // According to Christer the compiler doesn't know whether
    // 'count' aliases the memory that 'data' points to. A smart
    // compiler should be able to see that only the first store
    // could alias 'count', so it could peel the first iteration
    // into a separate check and keep 'count' in a register for
    // the remaining iterations.
    for (int64_t i = 0; i < count; ++i)
    {
        data[i] = 0;
    }
}
int main()
{
    TestList listA;
    listA.ClearNonOptimized();
    TestList listB;
    listB.ClearOptimized();
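    // Use the results so the compiler cannot discard the work above.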
    return listA.data[listA.count-1] + listB.data[listB.count-1];
}
I went through a few websites indicating that modern compilers still exhibit most of those problems, although nowadays we seem to have better tools for dealing with aliasing (such as safer ways of doing type punning).
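As a side note on the type-punning point, my understanding is that the safer tools nowadays are std::memcpy and, since C++20, std::bit_cast, rather than casting between unrelated pointer types. A minimal sketch of what I mean (assuming C++20 for std::bit_cast; this is only for illustration and not part of the benchmark code above):
#include <bit>
#include <cstdint>
#include <cstring>

// Read the bit pattern of a float as a uint32_t without
// violating strict aliasing rules.
uint32_t FloatBits(float f)
{
    return std::bit_cast<uint32_t>(f); // C++20
}

// Pre-C++20 alternative: memcpy, which optimizers typically
// turn into a single register move.
uint32_t FloatBitsMemcpy(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}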
I tried to verify this by putting the TestList code above into Compiler Explorer, but I find it hard to reason about the assembly. Both GCC and Clang at the highest optimization level seem to be doing an extra pointer access on every loop iteration in ClearNonOptimized.
Clang:
TestList::ClearOptimized():         # @TestList::ClearOptimized()
        push    rax
        mov     rdx, qword ptr [rdi]
        test    rdx, rdx
        jle     .LBB0_2
        mov     rdi, qword ptr [rdi + 8]
        shl     rdx, 3
        xor     esi, esi
        call    memset
.LBB0_2:
        pop     rax
        ret
TestList::ClearNonOptimized():      # @TestList::ClearNonOptimized()
        cmp     qword ptr [rdi], 0
        jle     .LBB1_3
        mov     rax, qword ptr [rdi + 8]
        xor     ecx, ecx
.LBB1_2:                                # =>This Inner Loop Header: Depth=1
        mov     qword ptr [rax + 8*rcx], 0
        add     rcx, 1
        cmp     rcx, qword ptr [rdi]     <=== Is this due to memory aliasing?
        jl      .LBB1_2
.LBB1_3:
        ret
GCC:
TestList::ClearOptimized():
        mov     rdx, QWORD PTR [rdi]
        test    rdx, rdx
        jle     .L6
        mov     rdi, QWORD PTR [rdi+8]
        sal     rdx, 3
        xor     esi, esi
        jmp     memset
.L6:
        ret
TestList::ClearNonOptimized():
        cmp     QWORD PTR [rdi], 0
        jle     .L8
        mov     rdx, QWORD PTR [rdi+8]
        xor     eax, eax
.L10:
        mov     QWORD PTR [rdx+rax*8], 0
        add     rax, 1
        cmp     QWORD PTR [rdi], rax     <=== Is this due to memory aliasing?
        jg      .L10
.L8:
        ret
Am I reading this right? Does that mean the loop will fetch 'count' from memory (the cache, hopefully) on every iteration instead of keeping it in a register?
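For reference, this is the kind of alternative I have seen suggested for telling the compiler there is no aliasing: the non-standard __restrict qualifier (supported as an extension by GCC, Clang, and MSVC). This is only a sketch with a hypothetical extra member function ClearRestrict (it would also need to be declared inside TestList), and I have not verified whether it actually removes the per-iteration reload of 'count':
// Hypothetical third variant: a __restrict-qualified local pointer
// promises the compiler that the memory written through 'p' is not
// accessed through any other pointer (including 'this') within this
// function, so 'count' should be free to stay in a register.
void TestList::ClearRestrict()
{
    int64_t* __restrict p = data;
    for (int64_t i = 0; i < count; ++i)
    {
        p[i] = 0;
    }
}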