On Ubuntu 14.04 LTS with 36 logical CPUs in total (2 CPUs x 9 cores x 2 hyper-threads), lscpu gives me:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 36
On-line CPU(s) list: 0-35
Thread(s) per core: 2
Core(s) per socket: 9
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Stepping: 2
CPU MHz: 1200.000
BogoMIPS: 5858.45
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-8,18-26
NUMA node1 CPU(s): 9-17,27-35
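For reference, the hyper-thread pairing can be read directly from sysfs (the paths below are the standard Linux CPU-topology files, not anything specific to this machine):

```shell
# Which logical CPUs share a physical core with cpu0?
# On the machine above this should read "0,18".
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

# One line per logical CPU: CPU, core, socket, NUMA node.
lscpu -p=CPU,CORE,SOCKET,NODE
```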
As is known, data exchange is faster between the cores of a single CPU (via the shared L3 cache) than between cores on different CPUs (via the QPI link).
CPUs 0-8 and 9-17 are the physical cores of the two NUMA nodes, while 18-26 and 27-35 are their hyper-threading siblings. Is it preferable to occupy all the physical cores first, and only in a second round to place a second logical thread on each physical core? That is, will this increase overall performance?
Or does it mean that if I launch more than 9 threads, for example 12 threads, then 9 threads will execute on the 1st CPU (NUMA node0, CPUs 0-8) and 3 threads on the 2nd CPU (NUMA node1, CPUs 9-11)? And will this increase the latency of exchange between the threads and reduce overall performance?
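If placement matters, I assume the spread could also be forced explicitly, e.g. with taskset from util-linux (my_app below is a placeholder name for the real program; the echo is just a minimal runnable demonstration of the syntax):

```shell
# Restrict a program to the physical cores of NUMA node0 only:
#   taskset -c 0-8 ./my_app    # my_app is a hypothetical program name
# Minimal runnable demonstration, pinned to CPU 0 only:
taskset -c 0 echo "pinned to CPU 0"
```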
How can I change the assignment of cores to NUMA nodes so that it looks like this?
NUMA node0 CPU(s): 0-17
NUMA node1 CPU(s): 18-35
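For comparison, the kernel's current assignment can be read from sysfs (the node*/cpulist paths below are the standard Linux NUMA files, assuming NUMA sysfs is exposed on the system):

```shell
# Print each NUMA node's CPU list as the kernel currently assigns it.
for node in /sys/devices/system/node/node[0-9]*; do
    [ -d "$node" ] || continue          # skip if no NUMA sysfs is present
    printf '%s: ' "$(basename "$node")"
    cat "$node/cpulist"
done
```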