
Note: this question is a duplicate of a prior question that has received a pretty detailed answer.

In the summer of 2023, I published a long-overdue summary of this topic on my employer's website.

=========

Dear fellow Superusers,

you may remember my inappropriately broad question a while ago... this is the sequel. This time I have enough data to be more specific. We caught one culprit in the field, and this time we were somewhat prepared. But the first results are perplexing to me. It's not EIST, it's not PROCHOT, it's not CLOCKMOD. After today I am wondering: "what does Windows know that I do not?" I must be missing some sweet secret - another MSR or clock steering mechanism...

The problem: across a couple dozen units of one PC model running Windows Server 2012 R2, "every now and then", some random machine starts to "ooze like molasses". The CPU becomes subjectively very slow. The Windows Task Manager in Server 2012 reports the CPU clock firmly at "0.22 GHz", which is a weird value. The problem vanishes upon a power cycle.

The effective average MTTF so far has been about 20 years, i.e. impossible to reproduce in a lab. The machines run cool, far away from thermal throttling thresholds. Verified by generating 100% CPU load for a few days, while recording the coretemp sensor. In production, the machines actually run nearly idle for months - verified by some relevant messages in the Event Log, saying that the CPU has been at the lowest EIST clock for another day - this for 100+ consecutive days, until a power-cycle. There are no local environmental factors or other circumstances to correlate this behavior to.

The CPU is a Haswell Core i7-4650U: dual-core with HT, 15W TDP, actually consuming maybe < 5W at idle. Nominally a "mobile" CPU - although the machine is not a notebook and does not have a battery. It does have an embedded controller.

The CPU's nominal clock frequency (the one printed on the tin) is 1.7 GHz. EIST max is 2.3 GHz. Turbo can boost up to 3.3 GHz single core, 2.9 GHz across all cores. The reference clock appears to be 100 MHz from the onboard clock synth (actually more like 98 MHz, apparently). Under normal circumstances, in Windows or Linux at a typical workload = idle, the CPU frequency just stays put at 800 MHz (790 MHz reported) i.e. multiplier = 8.

Now for those 0.22 GHz. This is a single value, averaged over all four cores. Also, it is not a physical clock rate - rather, it is inferred by Windows from some nominal clock rate and some hardware "performance gages", probably MSR registers, where the value is supposedly a percentage of maximum performance.

Effective clock = Nominal max clock * "frequency percentage gage" * "throttling percentage gage"

or, in the parlance of Windows Performance Counters:

Effective clock = Nominal max clock * "% Processor Performance" 

That is averaged across all four cores for the Task Manager GUI. The Windows performance counters (a software-level API/UI of the OS) make the values available per core, aggregated per CPU package, and in total per system.

That formula has been working for me in the lab, i.e. on a healthy system. I haven't found a Windows performance counter holding the "EIST Max" = our factual "frame of reference" for the "% Processor Performance", but never mind... The closest I could get by fiddling with ClockMod on a healthy machine idling at 800 MHz was by throttling two cores to 4/16 and two to 5/16, via the IA32_CLOCK_MODULATION MSR. This was done using uclewebb's MSR Tool - the "effective CPU clock" reported by the Windows Task Manager kept flipping between 0.23 and 0.25 GHz.
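As a rough cross-check of that lab experiment, here is a minimal sketch of the duty-cycle arithmetic. It assumes that on-demand clock modulation simply scales the core clock by the duty cycle in 1/16 steps; the function name is made up for illustration, and the 800 MHz figure is the idle clock quoted above.

```python
def clockmod_effective_mhz(core_clock_mhz, duty_sixteenths):
    """Effective clock of one core, assuming modulation scales it linearly."""
    if not 1 <= duty_sixteenths <= 16:
        raise ValueError("duty cycle must be between 1/16 and 16/16")
    return core_clock_mhz * duty_sixteenths / 16


IDLE_CLOCK_MHZ = 800          # multiplier 8 on a 100 MHz reference clock
duties = [4, 4, 5, 5]         # two cores at 4/16, two cores at 5/16

per_core = [clockmod_effective_mhz(IDLE_CLOCK_MHZ, d) for d in duties]
average = sum(per_core) / len(per_core)

print(per_core)   # [200.0, 200.0, 250.0, 250.0]
print(average)    # 225.0
```

That lands in the same ballpark as the 0.23 to 0.25 GHz that Task Manager displayed, which is all this simple model can be expected to show.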

Following several people's advice that this might have to do with the PROCHOT signal or on-demand throttling, I have cobbled together my own tools to access MSR_POWER_CTL, IA32_PACKAGE_THERM_STATUS, IA32_THERM_STATUS and IA32_CLOCK_MODULATION - to have something small, simple and to the point when investigating the next culprit out in the wild.

And that culprit has come today, and... I'm in a bit of a shock. No PROCHOT sources are active, apparently PROCHOT has never happened since the last power-up, and "on-demand throttling" (aka CLOCKMOD) is off as well.

Checking various status flags in the MSR registers

...the same for CPU cores 1, 2 and 3.

We've tried disabling BD-PROCHOT - the disabled flag sticks in the MSR, but the problem does not go away. Which is not surprising. We've tried fiddling with the CLOCKMOD bits - using another tool, that does read + write + readback. The tool works, but brings no improvement (again not surprisingly).

I have also taken a look at some Windows performance counters - using a command-line proggie called perf32 from the SnmpTools by Erwan L. Here is the output:

"Processor Information\% Processor Performance\_Total" 9.49953814259068

"Processor Information\% Processor Performance\0,0" 9.99836227595223

"Processor Information\% Processor Performance\0,1" 9.00490682886555

"Processor Information\% Processor Performance\0,2" 9.95220023870093

"Processor Information\% Processor Performance\0,3" 8.99923060510645

"Processor Information\% Processor Utility\_Total" 9.15043821293382

"Processor Information\% of Maximum Frequency\_Total" 73

"Processor Information\Processor Frequency\_Total" 1700

"Processor Information\Processor Frequency\0,0" 1700

...the same for CPU cores 1, 2 and 3

"Processor Information\Processor State Flags\0,0" 1

...the same for CPU cores 1, 2 and 3

"Processor Information\Parking Status\_Total" 0

"Processor\Interrupts/sec\0" 1979.5905212033

"Processor\% Interrupt Time\_Total" 0.777807192025936

So... the CPU cores are stuck at their nominal "on the tin" frequency of 1.7 GHz, which is about 74% of 2.3 GHz (EIST max), consistent with the 73 reported by "% of Maximum Frequency". On-demand throttling is off. But something makes Windows believe that the overall "Processor Performance percentage" is just 9 or 10 per cent. Two cores at 9%, two cores at 10%, resulting in a 9.5% aggregate systemwide percentage. 0.095 * 2300 MHz = 218 MHz, which Task Manager shows as the observed 0.22 GHz.
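The same arithmetic, spelled out as a small sketch. It assumes that "% Processor Performance" is taken relative to the 2300 MHz EIST maximum, as inferred above; the function name is made up for illustration, and the percentages are copied from the perf32 output.

```python
EIST_MAX_MHZ = 2300  # assumed frame of reference for "% Processor Performance"

# Values copied from the perf32 output of the faulty machine.
percent_performance = {
    "_Total": 9.49953814259068,
    "0,0": 9.99836227595223,
    "0,1": 9.00490682886555,
    "0,2": 9.95220023870093,
    "0,3": 8.99923060510645,
}


def effective_mhz(nominal_max_mhz, percent):
    """Effective clock = nominal max clock * (% Processor Performance / 100)."""
    return nominal_max_mhz * percent / 100.0


for instance, percent in percent_performance.items():
    print(f"{instance:>6}: {effective_mhz(EIST_MAX_MHZ, percent):6.1f} MHz")
```

The "_Total" row comes out at roughly 218 MHz, i.e. the 0.22 GHz shown by Task Manager.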

Notice those percentages, their granularity: 9 and 10 per cent, rather precisely.

  • How does that fit in with the integer EIST multiplier (which is apparently locked at 17) and the 1/16th CLOCKMOD duty cycle, which also appears to be off in the first place?

  • What other factors does Windows take into account when calculating that CPU performance percentage? Or is it just read verbatim from the hardware? What MSRs should I check in the hardware, to verify/understand the numbers that Windows is reporting?

  • Does Turbo (the successor to EIST) perhaps make the performance control more granular, "free from the EIST limits", and does it bring some relevant new MSRs along?

  • Those percentages might actually be wrong... the culprit system in the faulty state may actually be even slower than the percentages suggest. That is just a subjective comparison to my lab experiments with CLOCKMOD throttling, at a similar "effective clock rate" of around 0.25 GHz.

Thanks for your time. Any ideas welcome.

EDIT: oh damn - yes there are a number of MSR's. I wish Turbostat was available for Windows.

EDIT: I've written a tool to perform a raw dump of a given range of MSR's, on screen and into a CSV file. Turns out that the MSR space is rather sparse... This tool will allow me to take a dump of some meaningful range on the culprit/patient and on a healthy box, for comparison. The comparison can be somewhat automated as well. I can then wade through the data with an Intel manual at hand, focus on differences etc.
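The comparison step could be automated along these lines. This is only a sketch: it assumes a CSV layout of one "address,value" pair per row, both in hex, which is not necessarily what my tool writes, and the sample rows are made up purely to show the mechanics.

```python
import csv
import io


def load_dump(text_file):
    """Parse rows of 'address,value' (hex) into a dict {address: value}."""
    dump = {}
    for row in csv.reader(text_file):
        if len(row) < 2:
            continue
        try:
            address, value = int(row[0], 16), int(row[1], 16)
        except ValueError:
            continue  # skip a header line or any unparsable row
        dump[address] = value
    return dump


def diff_dumps(healthy, faulty):
    """Return (address, healthy_value, faulty_value) for every mismatch."""
    differences = []
    for address in sorted(healthy.keys() | faulty.keys()):
        a, b = healthy.get(address), faulty.get(address)
        if a != b:
            differences.append((address, a, b))
    return differences


# Made-up sample data, standing in for two real dump files.
healthy_csv = io.StringIO("address,value\n0x19A,0x0\n0x690,0x0\n")
faulty_csv = io.StringIO("address,value\n0x19A,0x0\n0x690,0x2\n")

differences = diff_dumps(load_dump(healthy_csv), load_dump(faulty_csv))
for address, a, b in differences:
    print(f"MSR 0x{address:X}: healthy={a:#x} faulty={b:#x}")
```

With real files, the io.StringIO objects would be replaced by open(path, newline=""). Registers that change constantly (counters, timestamps) will show up as noise and have to be weeded out by hand.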

I've also tried using the Windows build of AcpiCA's acpidump etc. Haven't tried it on the culprit machine yet, but on my old notebook (2015 model Lenovo with UEFI) it couldn't find the _PSS object... I may give it another try on the problematic box.

frr
  • 303

1 Answer


Short answer: it is a rarely occurring bug in the Intel Haswell CPU, steppings C-0 and D-0 (the only steppings there are?).

The area where this happens is the "modern performance and thermal management", whose "top of the glacier" is superficially visible as status registers (MSRs) 0x690, 0x6B0 and 0x6B1 - that's where a single status flag can signal the flawed state. Apart from that status flag, there may be some performance counters that can be compared to the raw TSC; the difference could possibly tell you the level of actual throttling... The problem is invisible in IA32_CLOCK_MODULATION, the legacy THERM_STATUS MSRs and the legacy IA32_PERF_STATUS/CTL MSRs.
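One pair of counters that might serve for such a comparison is IA32_MPERF (0xE7) and IA32_APERF (0xE8): per the Intel SDM, MPERF ticks at a fixed rate relative to the TSC while the logical processor is in C0, and APERF ticks at the actual clock. The sketch below only does the arithmetic on two snapshots; the snapshot values are made up, and I have not verified that the ratio reflects the throttling seen in this particular fault.

```python
IA32_MPERF = 0xE7  # fixed-rate count while in C0
IA32_APERF = 0xE8  # actual-clock count while in C0

MASK64 = (1 << 64) - 1


def aperf_mperf_ratio(before, after):
    """Ratio of actual to fixed-rate clocks between two (aperf, mperf) snapshots."""
    delta_aperf = (after[0] - before[0]) & MASK64  # tolerate counter wrap-around
    delta_mperf = (after[1] - before[1]) & MASK64
    if delta_mperf == 0:
        raise ValueError("MPERF did not advance between the snapshots")
    return delta_aperf / delta_mperf


# Made-up snapshots: (APERF, MPERF) read twice on the same logical CPU.
snapshot_1 = (1_000, 10_000)
snapshot_2 = (2_000, 20_000)

ratio = aperf_mperf_ratio(snapshot_1, snapshot_2)
print(f"actual / nominal clock while in C0: {ratio:.2f}")  # 0.10
```

Multiplying the ratio by the nominal (TSC) frequency gives an average clock over the interval while the core was in C0.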

I did indeed dump the MSR's on the culprit and on a healthy running box, ran a diff between the dumps, spent a day weeding out the differences spotted with an Intel manual at hand, and tracked the problem down to a likely culprit in the form of bit 1 = "Thermal status" in MSR_CORE_PERF_LIMIT_REASONS (and in MSR_GRAPHICS_PERF_LIMIT_REASONS).
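For reference, a minimal sketch of checking that flag in a raw 64-bit value read from MSR 0x690. Bit 1 = "Thermal status" is what I found above; bit 0 (PROCHOT status) and the sticky log bits at 16 and 17 are from my reading of the Intel SDM for this CPU generation and should be verified against the manual. The other bits are deliberately left out, and the raw values in the example are made up.

```python
MSR_CORE_PERF_LIMIT_REASONS = 0x690

# Only the bits relevant here; positions other than bit 1 are assumptions
# taken from the Intel SDM, not something confirmed by my own dumps.
LIMIT_REASON_BITS = {
    0: "PROCHOT status",
    1: "Thermal status",
    16: "PROCHOT log",
    17: "Thermal log",
}


def decode_limit_reasons(raw_value):
    """Return the names of the known flags set in a raw MSR value."""
    return [
        name
        for bit, name in sorted(LIMIT_REASON_BITS.items())
        if (raw_value >> bit) & 1
    ]


print(decode_limit_reasons(0x0))       # []
print(decode_limit_reasons(0x2))       # ['Thermal status']
print(decode_limit_reasons(0x20002))   # ['Thermal status', 'Thermal log']
```

The same decoding should apply to MSR_GRAPHICS_PERF_LIMIT_REASONS, as far as these two bits go.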

From there, Google found me an existing full answer here at Super User.

There doesn't appear to be a way to clear that pesky overheat flag, once the bug has bitten. After finding the answer, still with the culprit alive in the failed state, we've tried at runtime to disable:

  • "Intel Dynamic Acceleration"
  • "EIST hardware coordination"
  • Turbo
  • EIST

The results seem to confirm the general wisdom - that the only way out is a power cycle.

There's also an Intel erratum, mentioning that this is somehow related to the C3 C-state - so I daresay preventing C-states deeper than C1 should work around the problem. Check out your BIOS SETUP. There's an MSR for C-state config, but on my system that MSR gets locked during the POST, so there's no way to disable C3+ at runtime in the hardware.

Even if the hardware/BIOS route is blocked, the OS can possibly take advice at boot, not to ask for deeper C-states. In Linux on an Intel CPU, this can be arranged using a well-known kernel command-line argument in the bootloader: intel_idle.max_cstate=1 . In Windows, apparently there are some ways too: there's a regedit curse, not very well documented but apparently somewhat popular, where you create a special reg value to fudge the CPU capabilities to make Windows avoid using C2/C3: reg add HKLM\System\CurrentControlSet\Control\Processor /v Capabilities /t REG_DWORD /d 0x0007e066 (sources: A, B ) and there's an alternative way via the active power profile (source - I won't repeat the whole procedure here, the keywords are perhaps powercfg.exe and IDLESTATEMAX 1).
