28

I'm just curious why the scheduler constantly moves an app between CPUs, rather than keeping it on one. It looks a bit silly to have 4 cores at 25% rather than one at 100%.

Does it have to do with heat, or is it somehow more efficient? Do other OSes do it differently?

Insights or links to in-depth stuff would be nice. (Couldn't find much myself.)

Update:

By "spread out" I don't mean that it executes on several CPUs at once, but that it gets moved from one CPU to another several times per second, so the load merely looks spread out.

BinaryMisfit
Macke

5 Answers

8

I think wierob has described the point fairly well.
Here is an older article discussing processor affinity settings with a quad-core QX6800.
(the link points to the second page of that article).

If you do not force a process's affinity to a core, do you lose performance?

  • While the Windows scheduler makes affinity decisions to avoid cache thrashing,
    the processor design itself also mitigates the cost of migration.
  • The Intel QX6800 quad-core (since I referred to it earlier in this answer)
    has 8MB of L2 cache, 4MB shared by each pair of cores on its two dies.

It should be noted that while you may have chosen to run just this one single-threaded process on the system, the OS itself would have several other tasks running which also need to be scheduled. The scheduler balances all this activity across the available processor pool (or cores).


Going forward, with the Nehalem architecture and NUMA,
processors across multiple sockets will also be better able to limit memory-access thrashing.
Here is a quick picture from an ArsTechnica page on NUMA.

[Diagram: NUMA memory topology, from the ArsTechnica article]

If Nehalem and i7 interest you, I have some more links at this answer.

reevesy
nik
7

The scheduler just executes the next thread that is ready for execution on a "free" core/CPU.

You can assign a process to a specific CPU via the Windows task manager.
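For completeness, the same pinning can also be done programmatically. A minimal sketch using Python's os.sched_setaffinity, which is the Linux counterpart; on Windows the analogous call is SetProcessAffinityMask (reachable via ctypes or pywin32):

```python
import os

# Sketch: pin the current process to a single core, the programmatic
# version of Task Manager's "Set affinity" menu (Linux API shown).
PID = 0                                   # 0 means "the calling process"
allowed = os.sched_getaffinity(PID)       # cores we may currently run on
target = min(allowed)                     # pick one of them
os.sched_setaffinity(PID, {target})       # restrict to that core only
print("pinned to core", sorted(os.sched_getaffinity(PID)))
```

After this call the scheduler will keep the process on that one core, which is exactly the experiment the question suggests.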

Having 4 cores at 25% means that 4 threads are running simultaneously, whereas one core at x% means only one thread is running. So the former is more efficient in some cases.

But during its execution, the CPU's cache fills with the data the thread accesses. If the thread is then scheduled on another CPU, it suffers more cache misses, which are costly, because its data is not in that CPU's cache.
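The cost can be put in rough numbers with the standard average-memory-access-time formula; the hit time, miss rate, and penalty below are made-up illustrative values, not measurements of any real CPU:

```python
# Illustrative model of migration cost (numbers are assumptions):
# average memory access time = hit_time + miss_rate * miss_penalty.
def amat_ns(hit_ns, miss_rate, penalty_ns):
    return hit_ns + miss_rate * penalty_ns

warm = amat_ns(1.0, 0.02, 100.0)   # steady state on one core: ~2% misses
cold = amat_ns(1.0, 0.50, 100.0)   # just after migrating: cache is cold
print(f"warm cache: {warm:.0f} ns/access, cold cache: {cold:.0f} ns/access")
```

The point is only the shape of the effect: right after a migration the miss rate dominates, then the new cache warms up and the penalty fades.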

What does your thread do? If the thread "sleeps" for a very short time, the core it previously ran on might be occupied by another thread, so your thread is executed on the next available core. What happens if you restrict your process to a single core (e.g. via Task Manager)?

wierob
1

The OS migrates the thread across CPU cores (quickly, several times per second). It is more efficient to run it on the same core all the time. This can be enforced by the "Set affinity" context menu item in Task Manager.

Note that usually (for typical home use) the difference is in the range of a few percent.

The "4 cores each at 25% usage" reading appears because Task Manager shows average use over its sampling interval: each core was fully utilized for a quarter of that interval and idle for the rest.
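That averaging effect can be sketched with a toy calculation; the equal-slice migration pattern is an assumption for illustration only:

```python
# Why a single busy thread reads as "4 cores at 25%": the monitor
# reports each core's average utilization over a sampling window.
# If the thread hops between 4 cores in equal slices, each core is
# busy for exactly a quarter of the window.
samples = ["core0", "core1", "core2", "core3"] * 25   # 100 equal time slices
usage = {c: samples.count(c) / len(samples) for c in sorted(set(samples))}
print(usage)
```

One thread, 100% busy overall, yet every per-core average reads 0.25.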

The description is for Windows, but it is similar on other operating systems too.

0

It's not. One thread can only run on one processor. However, some processes have multiple threads, which can be spread out.

The reasoning, believe it or not, never considered what it looks like. The system tries to spread threads out because it has no way to know when one will spike.

tsilb
-2

If anyone's still reading this, I've noticed this, too, and performed quite a few tests to see if it's not just a fluke. It turns out it's not! I believe spreading a single thread over all cores is more efficient for several reasons:

  1. Spreading one thread across all cores allows for lower power consumption. Most processors lower their frequency and, more importantly, their voltage according to load, so a Core 2 Quad, for example, will consume much less power and produce less heat by spreading one thread across all 4 cores than by loading one core (which would raise the voltage across ALL cores, since there's only one voltage regulator* - that's pretty inefficient).
  2. It ensures that the thread always runs at maximum/constant speed. If the thread suddenly requests more processing power, one core could become overloaded and there will be a delay in the execution. By spreading it across cores, any sudden spike will be handled smoothly without lags and delays.

Also, because of the above two observations, I have come to believe that Turbo Boost and IDA are ineffective. They might be useful on older operating systems, but Linux and Windows 7 spread everything across all cores pretty efficiently. So a Core 2 Quad Q9100 @ 2.26 GHz will almost always (there are always exceptions :-) be faster than a Core 2 Duo X9100 @ 3.06 GHz, and I've rarely seen it use IDA (basically the predecessor to Turbo Boost; it increases the frequency of one or two cores only, for single-threaded apps).

  * The Core 2 Quad has two clock domains, since there are two physical dies, so two cores can run at full frequency while the other two run at the lowest. I don't know whether there are two voltage regulators, though - I've noticed that the voltage is uniform across all 4 cores, so there must be only one regulator for the whole package.
JakL