Why do modern pre-emptive multitasking OSs hang when the CPU is loaded?

Question

This is an open-ended question, with many possible answers, but perhaps there's one big thing I'm overlooking. If not, perhaps this question should be community wiki.

I don't use other OSs often enough to judge, but certainly on all versions of Windows the operating system can be brought to a crawl when apps are pegging the CPU. That was understandable with the early versions, which used cooperative multitasking.

However, with pre-emptive multitasking, shouldn't the operating system put itself and its GUI at a higher priority, so as to remain responsive even when user apps are asking for full CPU utilization? After all, the OS doesn't have to give away any time slices. In most cases, I don't care that an app which will require minutes of CPU time is delayed by a few microseconds so that the OS GUI can respond to input.

It sometimes helps to set high-CPU processes to a lower priority, perhaps because that then lets other low-CPU apps I interact with be more responsive, giving the appearance of a more responsive overall experience. Or does the priority of an app really affect how it interacts with OS processes?
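For reference, the "lower the priority of the CPU hog" workaround can also be done from inside the heavy program itself. This is a minimal sketch assuming a POSIX system: Python's `os.nice` does not exist on Windows, where you would use Task Manager's "Set priority" or the Win32 `SetPriorityClass` API instead (not shown). `lower_own_priority` and `busy_work` are hypothetical names for illustration.

```python
import os


def lower_own_priority(increment: int = 10) -> int:
    """Raise this process's nice value by `increment` and return the new value.

    POSIX-only sketch: a higher nice value means a lower scheduling priority,
    so other runnable processes (such as the ones you are interacting with)
    win the CPU more often. Unprivileged processes cannot undo this.
    """
    if not hasattr(os, "nice"):
        raise OSError("os.nice is not available on this platform")
    return os.nice(increment)


def busy_work(n: int) -> int:
    """Hypothetical stand-in for a long CPU-bound job."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    before = os.nice(0)  # an increment of 0 just reads the current value
    after = lower_own_priority(10)
    print(f"nice value: {before} -> {after}")
    print(busy_work(1_000_000))
```

On Linux the nice value is capped at 19, so repeated calls stop having an effect once the process is already at the lowest priority.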

I've seen this happen many times when I had plenty of physical memory available, and without heavy hard drive use. It seems like CPU usage is the main consistent element when the OS flags.

A counter-example: often when a system is almost completely hung, the mouse remains responsive. So the OS does protect this one part of itself from some problems. Exactly how is a separate question, I just raise it as an example.

score 6 · Answer 1 · answered Jul 25 '12 at 02:24

The only things that can really make one of the "modern multitasking OSs" truly hang are:

1. hardware failure
2. the CPU is stuck in a device driver (because of 1 or bad programming)
3. a fatal exception in a device driver or other kernel code (because of 1 or bad programming)

The operating system in a multitasking OS is always going to cut off a task when its timeslice ends. However, if a program is designed to respond to user input but doesn't do so during its timeslice, then the fault is with the program.

It's more likely that the shell is unresponsive. In Windows this is explorer.exe. You may try the following:

an alternate Windows shell (LiteStep, etc.)
kill all explorer.exe processes via taskmgr.exe, then launch cmd.exe and do your stuff from the command line. Or launch a smaller program designed to launch other programs.

explorer.exe is one of those heavily componentized Windows programs that a lot of stuff can hook into. So see how things are without it.

score 3 · Answer 2 · answered Mar 12 '14 at 18:23

Try Linux. If you compile the kernel yourself, you can choose the timer frequency (CONFIG_HZ), which sets how often the scheduler tick fires, and you also get to see the effect of the preemption model (CONFIG_PREEMPT). A 100 Hz tick (10 ms) is better if you are building a server, which will (probably) have no UI to worry about (but it will still be a preemptive multitasking OS). On the other hand, a 1000 Hz tick (1 ms) results in an extremely responsive system. Desktop-oriented kernels typically use a higher tick rate, which means that even if my CPU is at 100% usage on all cores, my UI is still responsive (you can actually try it).
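To see why a shorter slice helps, here is a toy round-robin model. It assumes each CPU-bound task gets exactly one tick period per turn and that a newly woken interactive task simply joins the back of the queue. Both are simplifying assumptions: Linux's real scheduler does not hand out fixed one-tick slices and generally favors tasks that have just woken up. `interactive_latency_us` is a hypothetical helper.

```python
from collections import deque


def interactive_latency_us(n_cpu_bound: int, quantum_us: int, switch_us: int = 0) -> int:
    """Time until a newly woken interactive task first gets the CPU.

    Toy round-robin model (an assumption, not how Linux's scheduler works):
    `n_cpu_bound` CPU-hungry tasks are already queued, the interactive task is
    appended at the tail, every CPU-bound task uses its full quantum, and
    each dispatch costs `switch_us`.
    """
    queue = deque(f"cpu{i}" for i in range(n_cpu_bound))
    queue.append("interactive")
    clock = 0
    while True:
        task = queue.popleft()
        clock += switch_us       # cost of switching to the next task
        if task == "interactive":
            return clock         # the interactive task finally runs
        clock += quantum_us      # a CPU-bound task burns its whole slice
        queue.append(task)       # ...and goes to the back of the line


if __name__ == "__main__":
    for hz in (100, 1000):
        quantum = 1_000_000 // hz  # tick period in microseconds
        print(f"{hz} Hz: {interactive_latency_us(8, quantum) / 1000} ms")
```

With eight busy tasks this gives 80.0 ms of delay at 100 Hz versus 8.0 ms at 1000 Hz, which is the difference between visibly laggy and snappy input.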

score 2 · Answer 3 · answered Jul 25 '12 at 01:27

With respect to Windows, Windows 7 does a much better job than XP with this sort of thing, so I would disagree with your statement "all versions of Windows". But even with XP, when you are using the client version of the OS, Windows favors the foreground app by default (for example, its threads get longer time slices). No matter which OS version, though, if multiple processes are all stuck waiting on the same single, shared resource (which could be I/O), then they will all behave unresponsively.
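The effect of a foreground quantum boost can be put into rough numbers. This toy model assumes one CPU-bound foreground thread and several CPU-bound background threads at the same priority taking turns, with the foreground thread's quantum stretched by some factor (client editions of Windows are commonly described as using up to 3x; treat the exact factor as an assumption). `foreground_share` is a hypothetical helper.

```python
from fractions import Fraction


def foreground_share(n_background: int, stretch: int) -> Fraction:
    """Share of one CPU a CPU-bound foreground thread gets.

    Toy model: one foreground thread and `n_background` CPU-bound background
    threads at the same priority take turns round-robin, and the foreground
    thread's quantum is `stretch` times longer than everyone else's.
    """
    if n_background < 0 or stretch < 1:
        raise ValueError("need n_background >= 0 and stretch >= 1")
    return Fraction(stretch, stretch + n_background)


if __name__ == "__main__":
    for stretch in (1, 3):
        print(f"stretch {stretch}: {foreground_share(3, stretch)} of the CPU")
```

Against three background hogs, tripling the quantum moves the foreground thread from 1/4 to 1/2 of the CPU. It only changes how CPU time is divided, though; it does nothing for a thread that is blocked waiting on a shared resource.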

Another way to view this problem: if explorer.exe is busy waiting for a shared resource (including CPU time), then the desktop/window manager itself will become unresponsive. Likewise for any apps that directly or indirectly wait for explorer.exe to free up.
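The "everyone waits on one shared resource" situation is easy to reproduce with a lock. In this sketch a lock stands in for any single shared resource, a background worker holds it, and a "UI" request that needs the same lock waits about as long as the worker holds it, no matter how idle the CPU is. Python threads have no settable priority, which is sort of the point: the delay is set by the resource, not by the scheduler. All names are hypothetical.

```python
import threading
import time

shared_resource = threading.Lock()  # stands in for any single shared resource


def worker(hold_s: float, acquired: threading.Event) -> None:
    """Hypothetical background task that holds the resource for a while."""
    with shared_resource:
        acquired.set()       # signal that the worker now owns the resource
        time.sleep(hold_s)   # slow work done while holding it


def ui_wait_time(hold_s: float) -> float:
    """Seconds a 'UI' request spends waiting for the worker to release the lock."""
    acquired = threading.Event()
    t = threading.Thread(target=worker, args=(hold_s, acquired))
    t.start()
    acquired.wait()          # make sure the worker got the resource first
    start = time.perf_counter()
    with shared_resource:    # the UI path needs the same resource
        waited = time.perf_counter() - start
    t.join()
    return waited


if __name__ == "__main__":
    print(f"UI waited {ui_wait_time(0.5):.2f} s")
```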

score 2 · Answer 4 · answered Mar 12 '14 at 16:25

User-mode applications generally can't slow down your OS's GUI. However, the situation isn't as clear-cut as it seems. There isn't a single user-mode application that is 100% user-mode, either because of system calls (notably the file system) or because of virtual memory mapping.

In most cases, the OS isn't stuck doing CPU-work, it's stuck waiting on some I/O resource. This is especially apparent when you run out of physical memory and start going into the pagefile (although do note that the OS can put memory in the page file at its leisure, even if there's plenty of physical RAM available). Windows is usually pretty smart about this, but it's quite possible to "break" this.
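A quick way to see why even a little paging hurts so much is the standard effective-access-time formula, EAT = (1 - p) * memory_time + p * fault_time, where p is the fraction of accesses that page-fault. The sketch below evaluates it; the default timings (100 ns for RAM, 8 ms to service a fault from a spinning disk) are illustrative assumptions, not measurements, and an SSD would be far faster.

```python
def effective_access_ns(fault_rate: float,
                        mem_ns: float = 100.0,
                        fault_ns: float = 8_000_000.0) -> float:
    """Average time per memory access when a fraction of accesses page-fault.

    EAT = (1 - p) * memory_access_time + p * fault_service_time
    The defaults (100 ns for RAM, 8 ms to service a fault from a spinning
    disk) are illustrative assumptions, not measurements.
    """
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * mem_ns + fault_rate * fault_ns


if __name__ == "__main__":
    for p in (0.0, 0.000_001, 0.001):
        eat = effective_access_ns(p)
        print(f"p = {p}: {eat:.1f} ns ({eat / 100.0:.1f}x slower than RAM)")
```

Under these assumptions, one fault per thousand accesses makes memory roughly 81 times slower on average, while the CPU itself sits mostly idle, waiting.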

If it really is the CPU, it's most likely due to a greedy driver (the vast majority of which are not written by Microsoft). Driver code running at raised IRQL (DISPATCH_LEVEL or above, e.g. in a DPC, an interrupt service routine, or while holding a spinlock) is not pre-empted by the thread scheduler (by design, for latency and reliability reasons), so if a driver runs a second-long loop there, you're out of luck: it will not be pre-empted.
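The consequence can be modeled in a few lines. In this toy sketch the scheduler would switch to a newly woken high-priority thread immediately, except while the CPU is inside a non-preemptible region, standing in for driver code at raised IRQL. The region list and `wake_latency_us` are hypothetical.

```python
def wake_latency_us(wake_at_us: int, regions: list[tuple[int, int]]) -> int:
    """Delay before a high-priority thread that wakes at `wake_at_us` can run.

    Toy model: the scheduler would switch to the woken thread immediately,
    except while the CPU is inside a non-preemptible region [start, end),
    standing in for driver code running at raised IRQL. Back-to-back or
    overlapping regions chain, so the thread waits until the last one ends.
    """
    t = wake_at_us
    for start, end in sorted(regions):
        if start <= t < end:
            t = end  # no pre-emption until this region finishes
    return t - wake_at_us


if __name__ == "__main__":
    short_dpc = [(100, 150)]           # a well-behaved 50 us routine
    greedy_loop = [(0, 1_000_000)]     # a driver spinning for a whole second
    print(wake_latency_us(120, short_dpc))    # waits 30 us
    print(wake_latency_us(10, greedy_loop))   # waits 999990 us
```

Priorities and time slices don't enter into it: the woken thread's delay is simply however much of the non-preemptible work is left.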

A great and simple tool to see some of this is Process Explorer (from Sysinternals), which shows you the kernel time of a CPU (i.e. how much of the CPU work on the core is done in the kernel, as opposed to in the user applications themselves). Windows 7 and later also include this in Task Manager (in Windows 7, View > Show Kernel Times adds it as the red line in the CPU usage graphs).
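You can see the same user/kernel split for your own code. This sketch uses Python's `os.times()`, which reports the user and system CPU seconds of the current process, to compare a pure-computation loop with a loop of tiny unbuffered writes (each `os.write` is a system call). The workloads are hypothetical, and the exact numbers depend on the OS, file system, and timer resolution.

```python
import os
import tempfile
from collections.abc import Callable


def cpu_times(fn: Callable[[], None]) -> tuple[float, float]:
    """Run fn() and return the (user, system) CPU seconds this process spent."""
    before = os.times()
    fn()
    after = os.times()
    return after.user - before.user, after.system - before.system


def compute_only() -> None:
    # Pure arithmetic: the loop itself makes no system calls.
    sum(i * i for i in range(3_000_000))


def syscall_heavy() -> None:
    # Many tiny unbuffered writes: every os.write() is a separate system call.
    with tempfile.TemporaryFile(buffering=0) as f:
        fd = f.fileno()
        for _ in range(200_000):
            os.write(fd, b"x")


if __name__ == "__main__":
    for fn in (compute_only, syscall_heavy):
        user, system = cpu_times(fn)
        print(f"{fn.__name__}: user {user:.2f} s, system {system:.2f} s")
```

Typically the second workload shows a much larger system share, which is exactly the kind of time Process Explorer's kernel-time graph makes visible.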

All in all, pre-emptive multi-tasking doesn't save the OS designers from having to make compromises. There's always a cost to everything, and task scheduling is indeed very tricky (right now, my OS juggles over 2000 threads - that's quite a lot, and I'm not really doing "anything"). Would it be better to give the threads smaller time chunks, and spend more time doing context switching? Would it be better to give them longer time chunks, sacrificing latency?
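The trade-off in those two questions can be sketched with two simple formulas: with a context-switch cost s and quantum q, the fraction of CPU lost to switching is s / (q + s), and a thread at the back of a plain round-robin queue of n runnable threads waits about n * (q + s). The switch cost and thread count below are assumptions for illustration, not measurements of any real system.

```python
def switch_overhead(quantum_us: float, switch_us: float) -> float:
    """Fraction of CPU time spent switching when every thread uses its full quantum."""
    return switch_us / (quantum_us + switch_us)


def worst_case_wait_us(n_runnable: int, quantum_us: float, switch_us: float) -> float:
    """How long a thread at the back of a plain round-robin queue waits to run."""
    return n_runnable * (quantum_us + switch_us)


if __name__ == "__main__":
    SWITCH_US = 5.0   # assumed context-switch cost, including cache refill
    RUNNABLE = 10     # assumed number of CPU-hungry threads on one core
    for quantum in (100.0, 1_000.0, 10_000.0):
        overhead = switch_overhead(quantum, SWITCH_US)
        wait_ms = worst_case_wait_us(RUNNABLE, quantum, SWITCH_US) / 1000
        print(f"quantum {quantum:>7.0f} us: overhead {overhead:.2%}, "
              f"worst-case wait {wait_ms:.2f} ms")
```

Shrinking the quantum cuts the worst-case wait roughly in proportion but makes the switching overhead grow, and no single setting wins on both.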

So, check what the hard drives are doing. Check how your memory is used. Check the kernel time. This will quite likely show you why Windows loses responsiveness while you're doing heavy work. Some of these can be remedied (freeing memory, limiting the memory offender, picking CPU affinity manually...), and some are solved simply by keeping your drivers up to date (graphics drivers used to be notorious for the issues they caused, which probably played a part in Windows moving most of the display driver from kernel mode to user mode, starting with the Vista driver model).
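Picking CPU affinity manually usually means confining the heavy process to some cores so the rest stay free for interactive work. On Windows that is Task Manager's "Set affinity" (or the Win32 `SetProcessAffinityMask` API). The sketch below shows the same idea on Linux, assuming Python's Linux-only `os.sched_getaffinity`/`os.sched_setaffinity`; `leave_one_cpu_free` is a hypothetical helper.

```python
import os


def leave_one_cpu_free() -> set[int]:
    """Restrict this process to all but one of the CPUs it may currently use.

    Linux-only sketch: os.sched_getaffinity/os.sched_setaffinity do not exist
    on Windows or macOS. The idea is to keep one core free for interactive
    work while this (hypothetical) heavy process runs.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not exposed by the os module here")
    allowed = os.sched_getaffinity(0)   # pid 0: the caller (strictly, the calling thread on Linux)
    if len(allowed) < 2:
        return allowed                  # nothing to spare on a single CPU
    keep = set(sorted(allowed)[:-1])    # drop the highest-numbered CPU
    os.sched_setaffinity(0, keep)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("now allowed on CPUs:", sorted(leave_one_cpu_free()))
```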

Also, while we're on the subject of the GPU: modern versions of Windows use GPU acceleration for rendering the GUI (actually, to a point, so did Windows XP). If your application taxes the GPU significantly, it can also make Windows less responsive, especially with Aero. Since GPUs are increasingly being used for general-purpose computation (GPGPU), this can matter even outside of games and such.

Another major offender is a poorly written multi-threaded application. It's quite easy to defeat the CPU caches if you're doing silly things, and RAM is extremely slow compared to the CPU itself, so without efficient caching the CPU gets very, very slow (even though it's basically waiting the whole time). This is even more complicated on multi-core CPUs (and multi-CPU systems), because to ensure consistency, many multi-threaded operations require invalidating cached data (so that one CPU doesn't use a stale value of a variable from its cache or registers). All those things are incredibly fast, but... CPUs are faster. Much faster.

Different CPU vendors also handle the same things differently, entirely separately from the much higher-level OS. And modern CPUs (486 onwards, so yes, it's a very old feature that a disturbing number of programmers have no idea about) are heavily parallelized: they no longer execute one instruction strictly after another.
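The "RAM is slow, caching matters" point is captured by the textbook average-memory-access-time formula, AMAT = hit_time + miss_rate * miss_penalty. This sketch evaluates it for one cache level; the 1 ns hit time, 100 ns trip to RAM, and the two miss rates are illustrative assumptions rather than figures for any particular CPU.

```python
def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for a single cache level.

    AMAT = hit_time + miss_rate * miss_penalty
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    HIT_NS, PENALTY_NS = 1.0, 100.0   # assumed cache hit time and trip to RAM
    friendly = amat_ns(HIT_NS, 0.02, PENALTY_NS)   # cache-friendly access pattern
    hostile = amat_ns(HIT_NS, 0.50, PENALTY_NS)    # e.g. threads fighting over lines
    print(f"{friendly:.1f} ns vs {hostile:.1f} ns ({hostile / friendly:.0f}x slower)")
```

Under these assumptions the cache-hostile pattern is about 17 times slower per access, even though the "CPU usage" graph would show both as fully busy.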

So even if the OS does everything perfectly (obviously an impossible ideal), it can still grind to a halt due to hardware issues and communication with the hardware. What good is a 4-core CPU when your application manages to completely saturate the RAM bandwidth? What good is a fast hard drive when every single byte it reads has to go through the CPU (remember PIO)? Every hardware operation that doesn't use direct memory access (DMA) can potentially stall your CPU.

And nowadays, most of Windows actually runs as user-mode processes, and they interact with the running applications as well: if I ask explorer.exe to do something for me, it can't do anything else in the meantime. If I send a billion window messages to a window, it's going to leave a mark.
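The "billion messages" effect follows from a window's message loop being single-threaded. This toy sketch treats the queue as first-in, first-out. Real Windows queues are not strictly FIFO: sent messages are handled before posted ones, and WM_PAINT is only generated when nothing else is pending. But posted messages are retrieved before input, so a flood of them delays input in much this way. Message names and costs are hypothetical.

```python
from collections import deque


def input_delay_ms(messages: list[tuple[str, int]]) -> int:
    """How long an 'input' message waits in a single-threaded message loop.

    Each message is a hypothetical (name, handling_cost_ms) pair. The loop
    handles messages one at a time in arrival order, so a pile of earlier
    messages delays everything behind it.
    """
    queue = deque(messages)
    clock = 0
    while queue:
        name, cost = queue.popleft()
        if name == "input":
            return clock      # the input finally starts being handled
        clock += cost         # the thread can do nothing else meanwhile
    raise ValueError("no 'input' message in the queue")


if __name__ == "__main__":
    flood = [("posted", 2)] * 2_000 + [("input", 1)]
    print(f"input handled after {input_delay_ms(flood)} ms")
```

Two thousand 2 ms messages already push a keystroke back by four seconds, with the CPU nowhere near saturated.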

There's just so much happening, all the time... guarantees are very difficult. Note how much faster things become as soon as the Ctrl+Alt+Delete screen actually comes up: suddenly it's as if nothing was wrong :)

LMSingh · Answer 5 · 2014-03-13T02:22:27.193

Specifically regarding Windows coming to a crawl, one of the most common causes I find is multiple processes using enough virtual memory to start forcing a lot of swapping. It may not "appear" that there is a lot of disk activity; however, it usually is happening.

This is easy to prove, and the same trick gets you out of the jam: select some small processes (like Notepad) and end them "cleanly" and "patiently". This starts to reduce the jammed-freeway effect. Then close one large memory hog, and everything comes back to a usable state. Observing disk activity during this time often shows very little is going on.

Why do I suggest closing smaller apps first? By removing smaller apps first, you free up virtual memory with minimal disk activity and quickly unclog the machine. If you try to close a large app, e.g. Firefox or Chrome, it takes a long time to close. This is because most large apps have background processes that will not go away for a looooong time, since those background processes are also stuck being paged in and out of the swap file. To make matters worse, they also tend to run at lower priority, so they take even longer to finish their business, and the full amount of virtual memory requested by the app cannot be released until all background processes and threads are done.

Another thing I've noticed from experience is that browsers are among the worst when it comes to closing quickly and cleanly. Even the Office apps close more cleanly and quickly.
