I am trying to stress test my home-built NAS for cooling problems.
Since neither stress, nor FIRESTARTER, nor mprime (prime95) monitors the
temperature, I want to write a small script that kills all of them (i.e.
whichever one I am currently running) if the temperature rises too high:
sudo renice -n -20 $$; \
maxitemp=0; \
while [ $maxitemp -le 40 ]; do
sleep 1
# floor the reading: [ -le ] only compares integers and fails on a value like 45.5
maxitemp=$(s-tui -j | jq "[.Temp|.[]|tonumber]|max|floor")
echo "$(date +%Y-%m-%d_%H:%M:%S) Maximal Temperature $maxitemp"
done; \
echo "$(date +%Y-%m-%d_%H:%M:%S) EMERGENCY KILL BECAUSE OF HIGH TEMPERATURE" | tee -a ~/stresstest.txt; \
killall stress; \
killall FIRESTARTER; \
killall mprime
However, if I boot my Ubuntu live CD, connect it to the internet, install s-tui, jq and mprime, and run the script, prime95 starts its workers and the computer (a laptop, since I am testing the test before really running it on my precious NAS) stops responding: I cannot cancel prime95, the mouse pointer no longer moves, and the optical drive goes crazy. I have to hold the power button until the machine switches off. The same thing happens even when I replace the script above with a simple
sudo renice -n -20 $$; \
sleep 30;
killall mprime
Why does this happen? How can I give my monitoring and safety net absolute priority over the stress test?
Update
It turned out that prioritization was not the problem. mprime was using too much RAM, which pushed the swap/disk cache out of RAM; that is what made the drive go crazy and the system unresponsive.
https://www.mersenneforum.org/showthread.php?t=25429
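Since the real culprit was memory rather than temperature, the watchdog could also keep an eye on available RAM. What follows is only a minimal sketch, assuming a Linux kernel that reports `MemAvailable` in `/proc/meminfo`. The function names and the 512 MiB threshold are hypothetical and were picked purely for illustration.

```shell
#!/bin/sh
# Sketch: read MemAvailable (in kB) from a meminfo-style file.
# The path argument defaults to /proc/meminfo; it is a parameter only
# so that the function can be tried against a sample file.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2; exit }' "${1:-/proc/meminfo}"
}

# Hypothetical threshold: less than 512 MiB available counts as "too low".
MIN_AVAILABLE_KB=524288

# Succeeds (exit status 0) when available memory is below the threshold.
# An empty reading (field missing or file unreadable) also counts as low,
# so that the check fails safe.
memory_is_low() {
    avail=$(mem_available_kb "$1")
    [ -z "$avail" ] || [ "$avail" -lt "$MIN_AVAILABLE_KB" ]
}
```

Inside the watchdog loop, a line such as `memory_is_low && break` would leave the loop and fall through to the `killall` commands. This can only help if the shortage is caught before the system starts thrashing, and it does not replace limiting how much memory mprime is allowed to use.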
I'll leave this question here because I think powerload79's answer can be very helpful to others!