I have a remote server running Apache with some websites.  Sometimes load average rises too much, and the web server is unresponsive.

I think it's caused by Apache, but I can't check because the SSH session is closed automatically as soon as I log in. The only fix I have is restarting the server (actually, I have to call the provider to restart it manually).

Once it's restarted, Cacti shows that the load average had been far too high (more than 100).

Can anyone explain any way to find and solve the problem?  Maybe I need a trigger or something like that to restart Apache when the load average rises; is that likely to be a useful approach?

isma

1 Answer

The first thing you need to do is monitor what is going on. Once you have more details, come back and update your question.

Use a small script that queries CPU and memory usage every few seconds and saves that information to a file. Perhaps something like this:

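#!/bin/sh
# Log the heaviest CPU and memory consumers every 30 seconds.
while true
do
    echo "-------$(date)--------"
    # printf handles \t the same way in every shell, unlike echo
    printf '\t\t%%MEM\t%%CPU\n'
    # Three most CPU-heavy processes (sorted on column 3, %CPU)
    ps ax -o comm,%mem,%cpu | sort -nk3 | tail -n 3
    echo "        ---"
    # Three most memory-heavy processes (sorted on column 2, %MEM)
    ps ax -o comm,%mem,%cpu | sort -nk2 | tail -n 3
    sleep 30
done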

The script prints the usage statistics for the three most CPU-heavy processes and then for the three most memory-heavy processes, each group in ascending order. It then waits for 30 seconds (you can change that by giving a different number to sleep) and does it all again. Its output looks like this on my system:

$ ./monitor.sh
-------Mon Feb  4 20:00:51 CET 2013--------
                %MEM %CPU
java             9.1  3.6
Xorg             3.3  4.9
firefox          8.1 12.2
        ---     
Xorg             3.3  4.9
firefox          8.1 12.2
java             9.1  3.6
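To make the sort-and-tail selection easier to follow, here is a minimal sketch that applies the same idea to canned data instead of live ps output. The function name top_by_column and the extra "idle" row are made up for illustration; the other three rows are copied from the output above.

```shell
#!/bin/sh
# top_by_column N COL  (hypothetical helper, not part of monitor.sh)
# Reads "name %mem %cpu" rows on stdin and prints the N rows with the
# largest numeric value in column COL, smallest of those first.
top_by_column() {
    # LC_ALL=C so that "." is always treated as the decimal point
    LC_ALL=C sort -n -k "$2" | tail -n "$1"
}

# Three rows from the sample output plus one invented low-usage row.
sample='java 9.1 3.6
Xorg 3.3 4.9
firefox 8.1 12.2
idle 0.1 0.0'

printf '%s\n' "$sample" | top_by_column 3 3   # heaviest by %CPU (column 3)
echo "---"
printf '%s\n' "$sample" | top_by_column 3 2   # heaviest by %MEM (column 2)
```

In both groups the invented idle row is dropped, and the remaining rows come out in the same order as in the sample output: sorted by %CPU first, then by %MEM.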

Save this script as monitor.sh, make it executable, and run it in the background while redirecting its output to a file:

chmod 744 monitor.sh
./monitor.sh > usage.log &

You can monitor the progress by running tail -f usage.log.

Let this run for a while and, the next time your server becomes unresponsive, check the log to see what was going on. Be careful though: the script prints 9 lines every 30 seconds, so if you let it run too long you will get a pretty big file. Remember to stop it when you have collected the necessary information.
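For a rough idea of how fast the log grows, the arithmetic can be sketched like this. The function name lines_per_day is made up for illustration, and the estimate counts lines only; the size in bytes depends on how long your process names are.

```shell
#!/bin/sh
# lines_per_day LINES_PER_SAMPLE INTERVAL_SECONDS  (hypothetical helper)
# Estimates how many lines the log gains in 24 hours (86400 seconds).
lines_per_day() {
    echo $(( $1 * 86400 / $2 ))
}

lines_per_day 9 30    # 9 lines every 30 seconds -> prints 25920
```

So at the default interval the file grows by roughly 26,000 lines a day. A longer sleep slows that down, at the cost of a coarser picture of what happened just before the server locked up.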

terdon