
I have been looking for ways to keep the CPU cool when running long jobs (40-70 minutes each) that use all available cores at the maximum available frequency, and I have a lot of these jobs to run.

First I came across this script, but I don't understand Bash well and couldn't figure out whether it restores the default values in /sys/devices/system/cpu/cpu0/cpufreq when I kill it, so I abandoned that approach.
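
For reference, the kind of save-and-restore the script would need looks roughly like this (a sketch only, assuming the same sysfs files and root privileges; a kill -9 would still bypass the cleanup):

    #!/bin/bash
    # Sketch: remember the current scaling_max_freq of every core and put it
    # back when the script exits (restore may run more than once; that is harmless).
    declare -A saved
    for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq; do
        saved[$f]=$(cat "$f")
    done

    restore() {
        for f in "${!saved[@]}"; do
            echo "${saved[$f]}" > "$f"
        done
    }
    trap restore EXIT INT TERM

    # ... whatever throttling logic the real script does would go here ...
    sleep infinity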

Then I came across this question, but my PC only shows performance and powersave in scaling_available_governors, and it is currently set to powersave.

  • Does it mean the PC is automatically throttling performance to keep things cool?
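
For reference, this is how I checked (the governor file exists per core; the available list is the same for all of them):

    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor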

Also in my /sys/devices/system/cpu/cpu0/cpufreq folder I found a file energy_performance_preference whose options are:

default, performance, balance_performance, balance_power, power

Mine is set to balance_performance.

  • What is this option and where can I find out what each one does?
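
For completeness, the current value and the accepted values can be read per core (the available_preferences file is also there on my machine, which I believe comes with the intel_pstate driver):

    cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_available_preferences
    cat /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference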

Yes, I know people say that modern CPUs handle throttling and heat on their own, but I feel about PCs the same way I feel about cars: sure, you can take the RPM to the redline, and most modern cars will prevent you from permanently damaging the engine, but once you make a habit of redlining frequently, the life expectancy of the engine surely goes down.

  • Are there any recommended, easy-to-use tools for Ubuntu that will let me set a max/min temperature and throttle performance the way the linked script attempted to do?
ITA

2 Answers


Yes, things are shifting more and more to hardware control.

I am also looking for more answers on this, but here is what I have found so far:

https://www.mankier.com/8/x86_energy_perf_policy ($ man x86_energy_perf_policy)

The EPP settings affect, for example, "how aggressively the hardware enters and exits CPU idle states (C-states) and Processor Performance States (P-states)".

It looks like "balance_performance" is the default setting for a responsive, well-performing system that still offers "potentially-significant energy saving".

I have tried the "power" EPP and it results in a less responsive system, due to the increased latency of changing P-states and of entering and leaving deeper C-states. While it may offer greater energy savings, it also seems to require a much more highly optimized system to take advantage of the aggressive power saving, as per the "race to sleep" idea (https://en.wikichip.org/wiki/race-to-sleep), e.g. so that the computer is not constantly entering and exiting C-states, since there is still significant overhead (both power and latency) in doing so, at least from what I can observe.
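
If you want to experiment, switching the preference is just a write to the same sysfs file (this assumes intel_pstate with hardware-managed P-states and needs root):

    # try the "power" preference on every core, e.g. for a long batch run
    for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/energy_performance_preference; do
        echo power | sudo tee "$f" > /dev/null
    done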

It looks like you can also change the maximum and minimum frequencies through the same cpufreq interface, as well as enable/disable turbo boost, which is where I would start, since it sounds like you are configuring a system to "run a marathon" rather than "sprint."
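
Roughly, that looks like the following (a sketch; the frequency value is just an example in kHz, and the no_turbo knob assumes the intel_pstate driver):

    # cap every core at 2.4 GHz and switch turbo boost off
    for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq; do
        echo 2400000 | sudo tee "$f" > /dev/null
    done
    echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo > /dev/null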

While the CPU regulates performance and temperature well enough to protect the cores, it is less clear to me whether a build-up of heat (even within the CPU's nominal tolerances) negatively affects the life of other, more heat-sensitive components. So if I were you, I would run some test jobs and monitor temperatures across memory, motherboard, disk drives, GPU (if relevant), etc. You will know pretty quickly whether or not the system is regulating itself within its limits. If not, you can simply decrease the maximum available frequency until you reach a thermal envelope that is sustainable for ALL your system components under their current cooling regimen.
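
For the monitoring part, nothing fancy is needed (the sensors command comes from the lm-sensors package; the raw hwmon files report millidegrees Celsius and their layout varies by machine):

    # live view of all detected sensors, refreshed every 5 seconds
    watch -n 5 sensors

    # or read the raw values directly
    grep . /sys/class/hwmon/hwmon*/temp*_input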

Also, for a GUI-based way to control some of these parameters, check out the GNOME extension https://extensions.gnome.org/extension/1082/cpufreq/. In my experience it is a bit of a system hog, but it lets you easily set min/max frequencies and disable turbo boost.

ethan

If you have many cores, running your jobs in a VM will give you control over how many cores are used simultaneously. Being able to limit the memory allocated to the VM may also help reduce temperature.

Clearly this is not a super-efficient use of resources, but as a simple temporary measure it is very effective.
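
As a rough illustration, with QEMU/KVM a guest limited to 4 cores and 8 GiB of RAM can be started like this (jobs.qcow2 is a placeholder for whatever disk image you use):

    # boot an existing guest image with only 4 virtual CPUs and 8 GiB of RAM
    qemu-system-x86_64 -enable-kvm -smp 4 -m 8192 \
        -drive file=jobs.qcow2,format=qcow2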