My system is Debian Testing, kernel 6.11.4-amd64, on a Lenovo W540. Since yesterday I have been running a demanding task (manipulating two large datasets in R) that was expected to take 26 hours. At some point during the night I moved the computer to a cooler room and went to bed; the task was 30% complete at the time. When I checked this morning, instead of the 75-80% I expected, I found it at only 35%. I then realised that the CPU frequency had been scaled down to 798 MHz instead of 2.7 GHz, and the CPU temperature was no higher than 40 °C. I am fairly sure a restart would fix the issue (I hope) and bring the CPUs back to their usual speed, but I don't want to throw away 20 hours of data processing, so all my attempts are aimed at avoiding a reboot.

I assumed that unplugging the laptop, which made it run on battery, triggered a power-saving feature. But plugging it in again in the other room didn't restore full performance.

So I checked and saw this:

root@debian:~# lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   8
  On-line CPU(s) list:    0-7
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         GenuineIntel
  Model name:             Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz
    BIOS Model name:      Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz  CPU @ 0.0GHz
    BIOS CPU family:      12
    CPU family:           6
    Model:                60
    Thread(s) per core:   2
    Core(s) per socket:   4
    Socket(s):            1
    Stepping:             3
    CPU(s) scaling MHz:   22%
    CPU max MHz:          3700.0000
    CPU min MHz:          800.0000
    BogoMIPS:             5387.17
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch
                          _perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 s
                          se4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fs
                          gsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization features:  
  Virtualization:         VT-x
Caches (sum of all):      
  L1d:                    128 KiB (4 instances)
  L1i:                    128 KiB (4 instances)
  L2:                     1 MiB (4 instances)
  L3:                     6 MiB (1 instance)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-7
Vulnerabilities:          
  Gather data sampling:   Not affected
  Itlb multihit:          KVM: Mitigation: VMX disabled
  L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                    Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Unknown: No mitigations
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Mitigation; Microcode
  Tsx async abort:        Not affected

root@debian:~# cpupower frequency-info
analyzing CPU 3:
  driver: intel_cpufreq
  CPUs which run at the same hardware frequency: 3
  CPUs which need to have their frequency coordinated by software: 3
  maximum transition latency: 20.0 us
  hardware limits: 800 MHz - 3.70 GHz
  available cpufreq governors: performance schedutil
  current policy: frequency should be within 800 MHz and 3.70 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 798 MHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes

So the CPU can run at 3.70 GHz but is scaled to 22% and sitting at 798 MHz.

I switched the governor to 'performance' to enable the full performance of the CPU:

root@debian:~# cpupower frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7

But nothing changed. After looking at the cpupower man page, I tried setting the max frequency directly:

root@debian:~# cpupower frequency-set -f 3.70 GHz
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7

Without any success:

root@debian:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_cpufreq
  CPUs which run at the same hardware frequency: 2
  CPUs which need to have their frequency coordinated by software: 2
  maximum transition latency: 20.0 us
  hardware limits: 800 MHz - 3.70 GHz
  available cpufreq governors: userspace performance schedutil
  current policy: frequency should be within 800 MHz and 3.70 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 798 MHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes

At this point it seemed odd to me that only 2 CPUs run at the same frequency, so I checked:

root@debian:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
performance
root@debian:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
798101
root@debian:~# cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq 
798096
root@debian:~# cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq 
798092
root@debian:~# cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq 
798101
root@debian:~# cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq 
798099
root@debian:~# cat /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq 
798104
root@debian:~# cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_cur_freq 
798097
root@debian:~# cat /sys/devices/system/cpu/cpu7/cpufreq/scaling_cur_freq 
798099

The governor has been set to 'performance' as expected, but the frequency has not changed. On top of that, all the CPUs are running at slightly different frequencies.
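The per-CPU checks above can be collapsed into one loop. This is only a sketch: the function name `show_cpu_freqs` and its optional root-directory argument are illustrative (the argument exists so the loop can be pointed at a test tree), while the sysfs file names are the ones already used above.

```shell
#!/bin/sh
# Print governor and current frequency (MHz) for every CPU that has cpufreq data.
# The optional first argument is a hypothetical override of the sysfs CPU root.
show_cpu_freqs() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs (or an unmatched glob) without a readable frequency file.
        [ -r "$dir/scaling_cur_freq" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        gov=$(cat "$dir/scaling_governor" 2>/dev/null)
        khz=$(cat "$dir/scaling_cur_freq")
        # scaling_cur_freq is reported in kHz; integer-divide to get MHz.
        printf '%s %s %d MHz\n' "$cpu" "${gov:-unknown}" "$((khz / 1000))"
    done
}

show_cpu_freqs
```

Wrapping the call in `watch -n 1` (after saving the function to a script) would show whether the frequency ever leaves the 798 MHz floor while the job runs.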

If I try to write the frequency value directly:

root@debian:~# echo 3700000 | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
tee: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu6/cpufreq/scaling_cur_freq: Permission denied
tee: /sys/devices/system/cpu/cpu7/cpufreq/scaling_cur_freq: Permission denied
3700000

My guess is that this value is controlled by the intel_pstate driver, but I couldn't find out how to manipulate it. I also found the intel-speed-select tool, but it reports:

Intel speed select drivers are not loaded on this system. Verify that kernel config includes CONFIG_INTEL_SPEED_SELECT_INTERFACE. If the config is included then this is not a supported platform.

I don't know how to check whether that driver is loaded, whether it can be loaded without a restart if it isn't, or even whether that's the right path to follow.
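Both open points can be inspected from a shell without rebooting. The `intel_cpufreq` driver that `cpupower` reports is `intel_pstate` running in passive mode, and its global knobs live under `/sys/devices/system/cpu/intel_pstate/`. Loaded modules are listed in `/proc/modules` (the file `lsmod` formats). The sketch below makes two assumptions: the function names and their optional path arguments are illustrative, and the `isst_` module prefix for the Speed Select interface is my understanding of the kernel's naming rather than something confirmed on this machine.

```shell
#!/bin/sh
# Dump intel_pstate's global knobs, if present.
# The optional argument is a hypothetical override of the sysfs directory.
show_pstate() {
    dir="${1:-/sys/devices/system/cpu/intel_pstate}"
    for knob in status no_turbo max_perf_pct min_perf_pct; do
        [ -r "$dir/$knob" ] && printf '%s=%s\n' "$knob" "$(cat "$dir/$knob")"
    done
    return 0
}

# Succeed if any loaded module name starts with the given prefix.
# The optional second argument is a hypothetical override of /proc/modules.
module_loaded() {
    prefix="$1"
    modules="${2:-/proc/modules}"
    grep -q "^$prefix" "$modules" 2>/dev/null
}

show_pstate
# Assumption: Speed Select interface modules are named isst_*.
if module_loaded isst_; then
    echo "an isst_* module is loaded"
else
    echo "no isst_* module is loaded"
fi
```

If `max_perf_pct` reads below 100 or `no_turbo` reads 1, writing `100` or `0` to the respective file as root would lift that particular limit; whether either was the cause here is unknown. As far as I know Speed Select targets certain Xeon server parts, which would match the tool's own "not a supported platform" hint for this i7-4800MQ.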

I saw some similar questions, but I tried all the suggested solutions here without success.

What I need is to re-enable performance mode without a restart. I assume it must be possible, since the opposite happened without a restart.

EDITED: Although it finished 30 hours later than expected, the task the computer was running completed, so I was able to save the results and restart the system.

Everything is fine now, and it may have been a one-off issue. In any case, I will try to reproduce it during the weekend to diagnose it better, just in case there is a misconfiguration somewhere.

Javi

1 Answer

Many of my Intel laptops over the years dropped to ~800 MHz when they thought they were running on battery, or when they "didn't trust the charger", that is, when they decided the charger could not deliver enough power to run the system stably at high frequencies. For example, a charger would get damaged and still deliver some power, but not up to spec, and the laptop would put all cores at 800 MHz.

Edit: here is a random person with a similar experience: https://superuser.com/a/349365/519577

Try replacing the charger?
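Before swapping hardware, it may be worth checking whether the kernel sees the adapter at all. A minimal sketch, assuming the adapter shows up under `/sys/class/power_supply` with type `Mains`; the function name and its optional root argument are illustrative.

```shell
#!/bin/sh
# Report the "online" flag of every mains (AC) power supply the kernel knows about.
# The optional argument is a hypothetical override of the sysfs class directory.
show_ac_status() {
    root="${1:-/sys/class/power_supply}"
    for dir in "$root"/*; do
        [ -r "$dir/type" ] || continue
        # Batteries report type "Battery"; only AC adapters report "Mains".
        [ "$(cat "$dir/type")" = "Mains" ] || continue
        printf '%s online=%s\n' "$(basename "$dir")" "$(cat "$dir/online")"
    done
    return 0
}

show_ac_status
```

An `online=1` result only means the adapter is detected. It says nothing about whether the firmware accepts its power rating, so a throttled machine with `online=1` would still be consistent with the "untrusted charger" explanation.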

hanshenrik