Assume that I have two processes that share a memory block using shm_open and mmap, and that a shared synchronization primitive (let's say a semaphore) ensures exclusive access to the memory, so there are no race conditions.
My understanding is that the pointer returned from mmap must still be marked as volatile to stop the compiler from caching reads.
Now, how does one write e.g. a std::uint64_t into any aligned position in the memory?
Naturally, I would simply use std::memcpy but it does not work with pointers to volatile memory.
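To make the problem concrete: std::memcpy takes a plain void* destination, and dropping volatile is not an implicit conversion. A minimal compile-time sketch of that (the two constant names are made up for illustration):

```cpp
#include <type_traits>

// std::memcpy is declared as
//   void* memcpy(void* dest, const void* src, std::size_t count);
// A pointer to volatile converts implicitly to volatile void*, but not to
// void*, so it cannot be passed as dest without a cast.
constexpr bool kVolatileToVoidPtr =
    std::is_convertible<volatile unsigned char*, void*>::value;
constexpr bool kVolatileToVolatileVoidPtr =
    std::is_convertible<volatile unsigned char*, volatile void*>::value;

static_assert(!kVolatileToVoidPtr,
              "volatile is not dropped implicitly");
static_assert(kVolatileToVolatileVoidPtr,
              "keeping volatile is an allowed conversion");
```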
First attempt
// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;
// Value to store, initialize "randomly" to prevent compiler
// optimization, for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
// Store byte-by-byte
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
for(std::size_t i=0;i<sizeof(value);++i)
    ptr[i]=src[i];
I strongly believe this solution is correct, but even with -O3 the compiler emits eight 1-byte stores. That is really not optimal.
Second attempt
Since I know no one is going to change the memory while I have it locked, maybe the volatile is unnecessary after all?
// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;
// Value to store, initialize "randomly" to prevent compiler
// optimization for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
//Obscure enough?
auto* real_ptr = reinterpret_cast<unsigned char*>(reinterpret_cast<std::uintptr_t>(ptr));
std::memcpy(real_ptr,src,sizeof(value));
But this does not seem to work: the compiler sees through the cast and does nothing. Clang generates a ud2 instruction and I am not sure why. Is there UB in my code, apart from the value initialization?
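One possible contributor, which I have not verified against the compiler output: ptr is never initialized in the snippet above, so reading it is already undefined regardless of the cast. Below is a sketch of what I would write if the mapping were held in a plain, non-volatile pointer from the start. The helper names store_u64 and load_u64 are made up, and the sketch assumes the caller holds the semaphore and that the semaphore operations are what make the data visible to the other process.

```cpp
#include <cstdint>
#include <cstring>

// Stores a 64-bit value through a plain (non-volatile) pointer.
// Assumption: dst points at valid, mapped memory with at least 8 bytes left.
inline void store_u64(unsigned char* dst, std::uint64_t value) {
    std::memcpy(dst, &value, sizeof(value));
}

// Reads a 64-bit value back the same way.
inline std::uint64_t load_u64(const unsigned char* src) {
    std::uint64_t value;
    std::memcpy(&value, src, sizeof(value));
    return value;
}
```

With a fixed size like this, compilers typically lower the memcpy to a single 64-bit move, but I have not checked that for every compiler and flag combination.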
Third attempt
This one comes from this answer. But I think it breaks the strict aliasing rule, does it not?
// Pointer to the shared memory, assume it is aligned correctly.
volatile unsigned char* ptr;
// Value to store, initialize "randomly" to prevent compiler
// optimization for testing purposes.
std::uint64_t value = *reinterpret_cast<volatile std::uint64_t*>(0xAA);
unsigned char* src = reinterpret_cast<unsigned char*>(&value);
volatile std::uint64_t* dest = reinterpret_cast<volatile std::uint64_t*>(ptr);
*dest=value;
GCC actually does what I want: a single instruction that copies the 64-bit value. But it is useless if it is UB.
One way I could fix it is to actually create a std::uint64_t object at that location. But apparently placement new does not work with volatile pointers either.
Questions
- So, is there a better (safe) way than byte-by-byte copy?
- I would also like to copy even larger blocks of raw bytes. Can this be done better than by individual bytes?
- Is there any way to force memcpy to do the right thing?
- Do I needlessly worry about the performance and should just go with the loop?
- The examples I have seen (mostly C) do not use volatile at all. Should I do the same? Is an mmaped pointer already treated differently? If so, how?
Thanks for any suggestions.
EDIT:
Both processes run on the same system. Also, please assume the values can be copied byte-by-byte; I am not talking about complex virtual classes that store pointers to somewhere else. Integers only, no floats, would be just fine.