How can I give temperature monitoring absolute priority over stress testing?

Question

I am trying to stress test my home-built NAS for cooling problems.

Since neither stress, FIRESTARTER, nor mprime (prime95) watches the temperatures, I want to write a small script that kills all of them (i.e. whichever one I am currently running) if the temperature rises too high:

sudo renice -n -20 $$; \
maxitemp=0; \
while [ $maxitemp -le 40 ]; do
sleep 1
maxitemp=$(s-tui -j | jq '[.Temp|.[]|tonumber]|max|floor')  # floor: [ -le ] only compares integers
echo "$(date +%Y-%m-%d_%H:%M:%S) Maximal Temperature $maxitemp"
done; \
echo "$(date +%Y-%m-%d_%H:%M:%S) EMERGENCY KILL BECAUSE OF HIGH TEMPERATURE" | tee -a ~/stresstest.txt; \
killall stress; \
killall FIRESTARTER; \
killall mprime

However, when I boot my Ubuntu live CD, connect it to the internet, install s-tui, jq and mprime, and run the script, prime95 starts its workers and the computer stops responding (it is a laptop, since I am testing the test before really running it on my precious NAS). I cannot cancel prime95, the mouse no longer moves, and the optical drive goes crazy. I have to hold the power button until the machine switches off. The same happens even when I replace the script above with a simple

sudo renice -n -20 $$; \
sleep 30; 
killall mprime

Why is that so? How can I give my monitoring and safety nets absolute priority over the stressing?

Update

It turned out that prioritization was not the problem. mprime was using too much RAM, which pushed the disk cache out of RAM; that made the drive go crazy and the system unresponsive.

https://www.mersenneforum.org/showthread.php?t=25429

I'll leave this question here because I think the answer of powerload79 can be very helpful to others!

Gordan Bobić · Answer 1 · 2020-04-04T13:25:37.817

You cannot give it absolute priority, but you can do the following to massively maximise the relative priority of the temperature monitoring:

1) Make sure you run the stressful task at nice -n 19 to minimize its priority, in addition to setting the priority of the monitoring process to -20.
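As a minimal sketch of that first step, the snippet below wraps a command in nice -n 19 and checks the niceness the child actually ends up with. The helper name run_low_priority is hypothetical and only used for illustration; the harmless "nice" command stands in for the real stress tool.

```shell
# Launch any command at the lowest scheduling priority (niceness 19).
# run_low_priority is a hypothetical helper name, not part of any tool.
run_low_priority() {
    nice -n 19 "$@"
}

# Demonstration with a harmless command: "nice" without arguments
# prints the niceness it is currently running at.
low_niceness=$(run_low_priority nice)
echo "child niceness: $low_niceness"
```

In practice the stress tool would take the place of the inner command, e.g. run_low_priority mprime with whatever options you normally pass to it.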

2.1) Use cgroups to further decrease the priority of the stressful tasks:

# Create a user and group called idle:
adduser idle

# Create a corresponding cgroup
/bin/cgcreate -a idle:idle -t idle:idle -g cpu:idle
/bin/cgset -r cpu.shares=2 idle

# Run your stressful process as part of this cgroup
/bin/cgexec -g cpu:/idle /usr/bin/mprime

2.2) Use cgroups to further increase the priority of the monitoring tasks:

# Create a user and group called fast:
adduser fast

# Create a corresponding cgroup
/bin/cgcreate -a fast:fast -t fast:fast -g cpu:fast
/bin/cgset -r cpu.shares=262144 fast

# Run your monitoring process as part of this cgroup
/bin/cgexec -g cpu:/fast /usr/bin/my_monitoring_script

This will make a huge difference when it comes to preventing an intensive task from interfering with other workloads. Not only will your monitoring task struggle less to get CPU time, but all other regular processes will also be able to run largely unimpeded, while the stress test still gets all of the uncontested clock cycles.
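The values 2 and 262144 used above are the lowest and highest settings that cgroup v1 accepts for cpu.shares. On systems that run cgroup v2, the equivalent knob is cpu.weight, which ranges from 1 to 10000. The following is a rough sketch, assuming the linear shares-to-weight conversion that container runtimes such as runc have used; shares_to_weight is a hypothetical helper name and not part of libcgroup.

```shell
# Convert a cgroup v1 cpu.shares value (2..262144) into a cgroup v2
# cpu.weight value (1..10000) with a linear mapping.
# shares_to_weight is a hypothetical helper, not part of libcgroup.
shares_to_weight() {
    shares=$1
    echo $(( 1 + ((shares - 2) * 9999) / 262142 ))
}

# Translate the two values used for the "idle" and "fast" groups.
idle_weight=$(shares_to_weight 2)
fast_weight=$(shares_to_weight 262144)
echo "idle: cpu.weight=$idle_weight, fast: cpu.weight=$fast_weight"
```

On a cgroup v2 system these numbers would go into the group's cpu.weight setting (or systemd's CPUWeight= property) instead of cpu.shares.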

3) Use tuned-adm to set the profile to "latency-performance" if your distro ships with that tool, e.g. Fedora/CentOS/RHEL. You may have to build it yourself on Ubuntu/Debian.
