While trying to speed up my applications on standard (non-NUMA) PCs, I have always found that the bottleneck was the call to `malloc()`, because even on multi-core machines it is shared and synchronized between all the cores.
I have access to a PC with a NUMA architecture, running Linux, and I program in C. I have two questions:
- On a NUMA machine, since each core is provided with its own memory, will `malloc()` execute independently on each core/memory without blocking the other cores?
- On these architectures, how are calls to `memcpy()` made? Can it be called independently on each core, or will calling it on one core block the others? I may be wrong, but I remember that `memcpy()` has the same problem as `malloc()`, i.e. when one core is using it the others have to wait.
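
To make the scenario concrete, here is a minimal sketch of the kind of workload in question, assuming POSIX threads on Linux. It is not taken from a real application: the names `run_alloc_threads`, `worker`, `BLOCK_SIZE` and `MAX_THREADS` are made up for illustration. Several threads call `malloc()`, `memcpy()` and `free()` in a tight loop at the same time, and the function reports how many iterations completed. It does no timing itself and says nothing about NUMA placement.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 64   /* hypothetical upper bound for this sketch */
#define BLOCK_SIZE  256  /* hypothetical allocation size in bytes */

struct worker_arg {
    int  iters;      /* number of allocations this thread attempts */
    long completed;  /* number of allocate/copy/free cycles that succeeded */
};

/* Each thread repeatedly allocates a block, copies into it, and frees it. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    char src[BLOCK_SIZE];

    memset(src, 0xAB, sizeof src);
    for (int i = 0; i < arg->iters; i++) {
        char *dst = malloc(BLOCK_SIZE);
        if (dst == NULL)
            break;
        memcpy(dst, src, BLOCK_SIZE);
        /* Count the cycle only if the copy is visible in the new block. */
        if (dst[BLOCK_SIZE - 1] == src[BLOCK_SIZE - 1])
            arg->completed++;
        free(dst);
    }
    return NULL;
}

/* Runs nthreads workers concurrently and returns the total number of
 * completed cycles across all threads that could be started. */
long run_alloc_threads(int nthreads, int iters)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    int started = 0;
    long total = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    for (int i = 0; i < nthreads; i++) {
        args[i].iters = iters;
        args[i].completed = 0;
        if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += args[i].completed;
    }
    return total;
}
```

Calling `run_alloc_threads(4, 1000000)` from `main()` and timing it (for example with `clock_gettime()`), then comparing against `run_alloc_threads(1, 1000000)`, would show whether the calls scale with the number of threads or serialize. Build with `-pthread`.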