My system is Debian Testing, kernel 6.11.4-amd64, on a Lenovo W540. Since yesterday I have been running a demanding task (manipulating two large datasets in R) that was expected to take 26 hours to complete. At some point during the night I decided to move the computer to a cooler room and go to bed. The task was 30% complete at that point. When I checked this morning, instead of the 75-80% I expected, I found it at only 35%. I then realised that the CPU frequency had been scaled down to 798 MHz instead of 2.7 GHz, and that the CPU temperature was no higher than 40 °C. I am fairly sure that a restart would fix the issue (I hope) and make the CPUs run at their usual speed, but I don't want to waste 20 hours of data processing time, so all my attempts are aimed at avoiding a reboot.
I assumed that unplugging the laptop, which made it run on battery, triggered this power-saving behaviour. But plugging it in again in the other room did not restore full performance.
So I checked and saw that
root@debian:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
BIOS Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz
BIOS Model name: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz CPU @ 0.0GHz
BIOS CPU family: 12
CPU family: 6
Model: 60
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 3
CPU(s) scaling MHz: 22%
CPU max MHz: 3700.0000
CPU min MHz: 800.0000
BogoMIPS: 5387.17
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 1 MiB (4 instances)
L3: 6 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Unknown: No mitigations
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Mitigation; Microcode
Tsx async abort: Not affected
root@debian:~# cpupower frequency-info
analyzing CPU 3:
driver: intel_cpufreq
CPUs which run at the same hardware frequency: 3
CPUs which need to have their frequency coordinated by software: 3
maximum transition latency: 20.0 us
hardware limits: 800 MHz - 3.70 GHz
available cpufreq governors: performance schedutil
current policy: frequency should be within 800 MHz and 3.70 GHz.
The governor "schedutil" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 798 MHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: yes
So the CPU can run at 3.70 GHz but is scaled at 22% and set at 798 MHz.
I changed the governor to 'performance' to enable the full performance of the CPU:
root@debian:~# cpupower frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
But nothing changed. After looking at the cpupower man page, I tried setting the maximum frequency directly:
root@debian:~# cpupower frequency-set -f 3.70 GHz
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
Without any success:
root@debian:~# cpupower frequency-info
analyzing CPU 0:
driver: intel_cpufreq
CPUs which run at the same hardware frequency: 2
CPUs which need to have their frequency coordinated by software: 2
maximum transition latency: 20.0 us
hardware limits: 800 MHz - 3.70 GHz
available cpufreq governors: userspace performance schedutil
current policy: frequency should be within 800 MHz and 3.70 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 798 MHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: yes
At this point, it seemed odd to me that only 2 CPUs were reported as running at the same frequency, so I checked:
root@debian:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
root@debian:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
798101
root@debian:~# cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq
798096
root@debian:~# cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq
798092
root@debian:~# cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq
798101
root@debian:~# cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq
798099
root@debian:~# cat /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq
798104
root@debian:~# cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_cur_freq
798097
root@debian:~# cat /sys/devices/system/cpu/cpu7/cpufreq/scaling_cur_freq
798099
The governor has been set to 'performance' as expected, but the frequency has not changed. On top of that, all the CPUs are running at slightly different frequencies.
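The per-CPU check above can be condensed into a single loop. This is only a sketch: the function name show_cpufreq and its optional base-directory argument are made up for illustration, and it reads the same scaling_governor and scaling_cur_freq files shown above without changing anything.

```shell
#!/bin/sh
# Print "cpuN governor current_freq_kHz" for every CPU that exposes cpufreq.
# show_cpufreq is a hypothetical helper name; the optional argument only
# exists so the loop can be tried against a copy of the sysfs tree.
show_cpufreq() {
    base="${1:-/sys/devices/system/cpu}"
    for dir in "$base"/cpu[0-9]*/cpufreq; do
        # Skip entries where the glob matched nothing or files are missing.
        [ -r "$dir/scaling_cur_freq" ] || continue
        [ -r "$dir/scaling_governor" ] || continue
        cpu="${dir%/cpufreq}"   # strip the trailing /cpufreq
        cpu="${cpu##*/}"        # keep only the cpuN component
        printf '%s %s %s\n' "$cpu" \
            "$(cat "$dir/scaling_governor")" \
            "$(cat "$dir/scaling_cur_freq")"
    done
}

show_cpufreq
```

Running it repeatedly (for example under watch) would show whether the frequencies ever leave the 798 MHz floor while the R job is busy.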
If I try to write the frequency value directly:
root@debian:~# echo 3700000 | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
tee: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu6/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu7/cpufreq/scaling_cur_freq: Permission denied
3700000
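The write is refused because scaling_cur_freq is a read-only attribute. The writable per-policy limits in cpufreq sysfs are scaling_min_freq and scaling_max_freq, and when intel_pstate is the driver (intel_cpufreq is the name it reports in passive mode) there are also global knobs under /sys/devices/system/cpu/intel_pstate. The sketch below only reads these values; the function name show_limits is hypothetical, and whether any of these limits is what holds the frequency down here is an open question, not something established above.

```shell
#!/bin/sh
# Read-only inspection of the sysfs limits that can cap the CPU frequency.
# show_limits is a hypothetical helper name; the optional argument lets the
# function be pointed at a copy of the tree instead of the live sysfs.
show_limits() {
    base="${1:-/sys/devices/system/cpu}"
    for f in \
        cpu0/cpufreq/scaling_min_freq \
        cpu0/cpufreq/scaling_max_freq \
        cpu0/cpufreq/cpuinfo_max_freq \
        intel_pstate/status \
        intel_pstate/no_turbo \
        intel_pstate/max_perf_pct \
        intel_pstate/min_perf_pct
    do
        if [ -r "$base/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$base/$f")"
        else
            # Not every attribute exists on every kernel or driver mode.
            printf '%s: (not present)\n' "$f"
        fi
    done
}

show_limits
```

Since cpupower already reports a policy of 800 MHz to 3.70 GHz, scaling_max_freq is presumably not the limiting factor; if all of these show the full range, the cap may be imposed outside the kernel's cpufreq policy, for instance by firmware.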
My guess is that this value is controlled by the intel_pstate driver, but I couldn't find out how to manipulate it. I also found the intel-speed-select tool, but it reports:
Intel speed select drivers are not loaded on this system. Verify that kernel config includes CONFIG_INTEL_SPEED_SELECT_INTERFACE. If the config is included then this is not a supported platform.
I don't know how to check whether that driver is loaded, whether it can be loaded without a restart if it isn't, or even whether that's the right path to follow.
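One possible way to answer the first of those questions is sketched below. It rests on two assumptions: that the kernel build configuration is available at /boot/config-$(uname -r), as is usual on Debian, and that the Speed Select interface modules have names starting with isst_ (for example isst_if_common). The function name check_isst is made up for illustration.

```shell
#!/bin/sh
# Report whether the kernel was built with the Intel Speed Select interface
# and whether any isst_* module is currently loaded.
# Assumptions: config file location and the isst_ module name prefix.
check_isst() {
    config="${1:-/boot/config-$(uname -r)}"
    modules="${2:-/proc/modules}"
    if [ -r "$config" ]; then
        # Prints e.g. CONFIG_INTEL_SPEED_SELECT_INTERFACE=m when enabled.
        grep '^CONFIG_INTEL_SPEED_SELECT_INTERFACE=' "$config" \
            || echo "CONFIG_INTEL_SPEED_SELECT_INTERFACE is not set"
    else
        echo "kernel config not found: $config"
    fi
    # /proc/modules lists one loaded module per line, name first.
    if [ -r "$modules" ] && grep -q '^isst_' "$modules"; then
        echo "isst module loaded"
    else
        echo "no isst module loaded"
    fi
}

check_isst
```

If the option shows up as =m, the module could in principle be loaded with modprobe without a restart. As far as I know, though, Speed Select targets certain Xeon server processors, so the "not a supported platform" part of the message may simply apply to an i7-4800MQ.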
I saw some similar questions, but I tried all the suggested solutions here without success.
What I need is to re-enable performance mode without a restart. I assume it must be possible, since the opposite happened without a restart.
EDITED: Although it was 30 hours later than expected, the task the computer was running finished, so I was able to save the results and restart the system.
Everything is fine now, and this may have been a one-off issue. Anyway, I will try to reproduce it during the weekend to diagnose it better, in case there's a misconfiguration somewhere.