2

I have a Windows Server 2003 machine running, and I suspect its HDD might be getting old.

Lately I've been having connection problems on the server: at random times the Internet connection goes away. Sometimes it loses its Internet connection while I'm connected to it through Windows' Remote Desktop Connection.

When the machine is rebooted it always gets its Internet back. I've had to write an application that checks every 30 minutes whether it can find Google. If it can't, it restarts the network adapter, waits 10 minutes and re-enables the adapter. If it still can't find Google after that, it automatically reboots the machine, thus getting Internet access once again.
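The application's source is not shown here. As a rough sketch of the logic described above, assuming Python, one check cycle might look like the following. The names `can_reach` and `watchdog_step` and the callbacks passed in are hypothetical; how the adapter is actually restarted or the machine rebooted is left to the caller.

```python
import socket
import time


def can_reach(host="www.google.com", port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def watchdog_step(check, restart_adapter, reboot, log,
                  wait=time.sleep, grace_seconds=600):
    """Run one check cycle and return the action that was taken."""
    if check():
        return "ok"
    log("connection lost, restarting adapter")
    restart_adapter()
    wait(grace_seconds)  # 10 minutes by default, as described above
    if check():
        log("connection restored after adapter restart")
        return "adapter_restarted"
    log("still offline, rebooting")  # log entry is written before the reboot
    reboot()
    return "rebooted"
```

A scheduler or loop would call `watchdog_step(can_reach, ...)` every 30 minutes. Passing the actions in as callbacks keeps the decision logic testable without touching the real network adapter.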

Now, that's a pretty bad deal for a server. Could this possibly be because the HDD is starting to give up? Since my application is still able to reboot the machine, the OS as a whole does not freeze; it just loses its connection. And I know that it is my app that reboots it, since it makes a log entry before it does so.

Does anyone have any tips whatsoever on how I could investigate why the Internet connection suddenly disappears?

Is there a simple way of checking the HDD health in Windows Server 2003?

Red

2 Answers

3

Please back up first, as Jeff suggested.

First things first

Have you tried updating your BIOS and drivers, especially chipset/network/RAID?

These are easy things to do, and they might remedy this problem and more. :-)

Have you checked the event log?

If the event log does show the moment the connection dropped, you can skip the video recording I suggest later. Still, keep a ping to your gateway running so that a drop is detected immediately and not seconds later. Apart from that, the event log might even give you the cause and solve the problem altogether.

Have you tried generating I/O to see if that does cause a drop?

Try disk benchmarking software, especially tools that include stability tests, to see whether the load drops your connection. Try an error scan with HD Tune, for example; it should hit any bad areas on your disk...
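If you just want to generate some disk load by hand while the ping is running, a minimal sketch along these lines could do it, assuming a Python interpreter is available on the machine. The function name `write_and_verify` and its parameters are made up for this example, and it is not a replacement for a proper surface scan: it only touches the free space the file lands on, and the read-back may be served from the OS cache rather than from the disk.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_mb=64, chunk_mb=1):
    """Write random data to a temp file, read it back and compare SHA-256 digests."""
    chunk_size = chunk_mb * 1024 * 1024
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            for _ in range(size_mb // chunk_mb):
                chunk = os.urandom(chunk_size)
                handle.write(chunk)
                written.update(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # push the data out of Python and OS buffers
        read_back = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                read_back.update(chunk)
    finally:
        os.remove(path)  # always clean up the temporary file
    return written.hexdigest() == read_back.hexdigest()
```

Calling `write_and_verify("D:\\", size_mb=1024)` in a loop (the drive letter is only an example) keeps the disk busy; a `False` result or an exception would be worth noting alongside the time of any connection drop.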

Introduction to troubleshooting

First of all, you need to be sure that the server is the issue and not the router dropping your lease. Also, when the problem is so frequent that you can monitor it live, you probably want to see what happens at that moment. You can do this either by monitoring live or by logging it and troubleshooting afterwards.

If the problem is not that frequent, your best bet is to set up video recording so you can look back and find the moment it occurred.

As for the logging tools, logging for a long time (30 minutes) will fill up either the disk or memory. Logging with sufficient detail might therefore not be feasible if you are going to let it run unattended for a long time...

Monitoring

Ping? Pong! Ping? Pong?

start ping -t GATEWAY && ping -t DOMAIN

Replace GATEWAY with your router's IP address and DOMAIN with an Internet domain, e.g. google.com.

This continuously pings both addresses, the gateway in a new window and the domain in the current one. You will immediately see when your connection drops, which lets you look at what exactly is happening at that moment.
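If nobody is watching the console when the drop happens, the same idea can be turned into a small logger that only writes a timestamped line when a host changes between up and down. This is a sketch, assuming Python is available; `is_reply`, `transition_line` and `monitor` are hypothetical names. It treats a ping as successful only if the output contains `TTL=`, because the exit code of the Windows `ping` command can be 0 even when the reply is an error such as "Destination host unreachable".

```python
import datetime
import subprocess
import time


def is_reply(ping_output):
    """A real echo reply line contains 'TTL='; timeouts and errors do not."""
    return "TTL=" in ping_output.upper()


def ping_once(host, timeout_ms=1000):
    """Send one echo request using the Windows ping command."""
    result = subprocess.run(
        ["ping", "-n", "1", "-w", str(timeout_ms), host],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return is_reply(result.stdout)


def transition_line(previous, current, host, stamp):
    """Return a log line if the state changed, otherwise None."""
    if current == previous:
        return None
    return "{} {} {}".format(stamp, host, "UP" if current else "DOWN")


def monitor(hosts, log_path="ping_log.txt", interval=1.0):
    """Ping each host forever and append a line whenever its state changes."""
    state = {host: None for host in hosts}
    while True:
        for host in hosts:
            up = ping_once(host)
            stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            line = transition_line(state[host], up, host, stamp)
            state[host] = up
            if line is not None:
                with open(log_path, "a") as log:
                    log.write(line + "\n")
        time.sleep(interval)
```

Running `monitor(["GATEWAY", "DOMAIN"])` with the same placeholders as above gives a short log whose DOWN lines can be matched against the event log. If only the domain goes down while the gateway stays up, the problem is more likely beyond the server.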

What are my processes doing?

You can use Process Explorer to:

  • Check which processes are active at that moment; include as many useful detail columns as you can.
  • Check whether a driver is misbehaving, which shows up as 100% CPU spikes in the Interrupts, DPC or System process.
  • Check what I/O is being done; it shows both disk and network.
  • Check for memory and handle leaks.
  • Note that it does not log anything, apart from the usage graphs.

What I/O is happening?

You can use Process Monitor to:

  • Log process, thread and I/O (disk + network) activity.
  • Filter out things that are unlikely to be the cause to reduce the amount of data; make sure to enable Drop Filtered Events.
  • Look at its tables and graphs, although these might not help much as your issue occurs at one specific moment.

What else is happening?

XPerf, from the Windows Performance Toolkit in the Windows SDK, can do this too.

Please note that the command below uses a circular buffer in memory to prevent memory from filling up. If you want it to fill your disk instead, you can replace the profile in the command with perf!GeneralProfiles.InSequentialFile. A word of caution: you might not want to run this unattended with this amount of detail, as your disk will fill very fast, and configuring it for less detail might leave the cause out...

Troubleshooting can be done like this:

  1. Download the setup from Windows Performance Analysis Tools for your Windows version.
  2. Install the software on your system.
  3. Open a command prompt as administrator, and paste the following command:

    xperf -start perf!GeneralProfiles.InBuffer && timeout -1 && xperf -stop perf!GeneralProfiles.InBuffer myTrace.etl
    
  4. Press ENTER once to start the command. Now you will have to wait until the connection drops.
  5. Right after your connection drops, go to the console and press ENTER again.
  6. After some time, a log file named myTrace.etl will be produced.
  7. Run the following command to show the file:

    xperf myTrace.etl
    

If you want to upload it so we can look into it:

  1. Compress myTrace.etl to a zip file.
  2. Put this compressed version of the file somewhere online (perhaps 2shared).
  3. Share the link here, and I will attempt to find and show you the cause of your problem.
1

It is not likely, but it certainly is possible. I've seen HDD damage cause some very strange issues.

Before you do anything else, get a good backup!

The first thing to do is to run chkdsk:
go to Start -> Run -> type cmd -> OK -> type chkdsk -> Enter.
This checks the current drive, normally your system drive, in read-only mode. If you have time to spare offline, you can run chkdsk /r, which will do a thorough disk check at reboot. This can take a long time, and your server will be offline while it runs.

If you still suspect damage, you may want to check whether your disk has S.M.A.R.T. checks available. If it does, you may be able to analyze it with a tool like PassMark.
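For a quick first look without extra tools, Windows exposes a coarse drive status through WMI, which can be read with `wmic diskdrive get Model,Status /value`. The sketch below assumes that `wmic` is present on the server and that Python is available to parse its output; `parse_wmic_values`, `drive_statuses` and `suspicious` are hypothetical names. A status of "Pred Fail" means the drive predicts its own failure, but "OK" is not proof that the disk is healthy.

```python
import subprocess


def parse_wmic_values(text):
    """Parse `wmic ... /value` output (Key=Value lines) into one dict per drive."""
    drives, current = [], {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # skip blank lines and anything that is not Key=Value
        if key in current:
            # A repeated key marks the start of the next drive's record.
            drives.append(current)
            current = {}
        current[key] = value
    if current:
        drives.append(current)
    return drives


def drive_statuses():
    """Ask WMI for the model and status of every physical drive."""
    output = subprocess.run(
        ["wmic", "diskdrive", "get", "Model,Status", "/value"],
        stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    return parse_wmic_values(output)


def suspicious(drives):
    """Return the drives whose reported status is anything other than OK."""
    return [drive for drive in drives if drive.get("Status") != "OK"]
```

`suspicious(drive_statuses())` returning a non-empty list is a good reason to follow up with a dedicated S.M.A.R.T. tool and to make sure the backup is current.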

Jeff F.