
Processes on Mac OSX getting 'stuck' and strange CPU usage describes the same phenomenon, but for writing. My question concerns reading.

OS X version 10.8.5

I have about 30 million small files on my internal 500GB hard drive, which I want to back up to several external hard drives. Because of the number of files, I decided to compress them, move the compressed archives around, and decompress them once I'm done. Compression ran at 100% CPU usage for the first 80% or so of the files, then the CPU usage steadily dropped, now to about 3-7%, with roughly 1,000 reads (about 1 MB) per second. Running top shows the process as stuck most of the time.
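For reference, the workflow looks roughly like this (paths and names here are illustrative; the demo runs on a scratch directory so the snippet is self-contained):

```shell
# Pack a tree of small files into a single compressed archive, so the
# copy to the external drive is one large sequential stream instead of
# millions of tiny files. /tmp/demo stands in for the real data tree.
mkdir -p /tmp/demo && echo "sample" > /tmp/demo/file1.txt
tar -czf /tmp/demo.tar.gz -C /tmp/demo .
tar -tzf /tmp/demo.tar.gz        # list the archive's contents to verify it
```

On the destination, `tar -xzf demo.tar.gz -C <target dir>` unpacks it again.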

The remaining 20% of the files were written when the (spinning) hard drive was almost full (down to 2GB free space when I noticed it); I suspect those files are interspersed throughout the disk, and that stuck means the process is waiting to read them from the disk. I have since deleted some files, so the disk now has ~50GB free space; however, the performance hasn't improved.

Running the compression under renice -20 or sudo makes no difference; also, I've read there is no way to defragment a Mac. I suspect I need to delete the affected files and write them again when the disk is less full, in order to get back to the previous performance.
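For completeness, this is the kind of renice invocation I mean (the demo uses a positive nice value on a throwaway sleep so it runs without root; negative values like -20 require sudo):

```shell
# Raise (or here, lower) a running process's CPU priority. Negative nice
# values need root, so this demo uses a positive value on a throwaway
# process; for the compressor it would be: sudo renice -n -20 -p <pid>
sleep 30 &
pid=$!
renice -n 10 -p "$pid"
ps -o nice= -p "$pid"         # confirm the new nice value
kill "$pid"
```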

Is there a way to know where the files are located, so I can verify that this is in fact the source of the problem? Also, knowing where the files are located can help me figure out where the files started getting interspersed, to know which files to delete and re-write -- and also to verify that the new files are getting written sequentially.
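One check I can already do is confirm that the process really is blocked on disk rather than on CPU (a sketch; it watches a throwaway sleep so it is self-contained -- substitute the compressor's PID in practice):

```shell
# Sample a process's scheduler state once per second. top's "stuck" shows
# up in ps as state U (uninterruptible disk wait) on OS X, D on Linux;
# a process that is almost always in that state is seek-bound, not
# CPU-bound.
sleep 5 &
pid=$!
for i in 1 2 3; do
    ps -o state= -p "$pid"
    sleep 1
done
kill "$pid" 2>/dev/null
```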


UPDATE: I am definitely leaning toward the fragmentation explanation. As the compression went on, its CPU usage dropped further, to 1-2%. On the other hand, I tried re-compressing one of the earlier folders, and it completed at the original speed. If no better suggestions come up, I plan to simply move the files to another HDD before compressing them, and then decompress them back onto my internal HDD.


UPDATE 2: I noticed an anti-virus script checking the millions of files on the attached USB drive, running in the background at nice 20. I terminated the script, and the compression's CPU utilization went back up to the original level. Apparently, even though the compression was running at nice -20, it was still getting less priority than the anti-virus script when it came to system calls. What's really surprising is that the anti-virus script was portable, running entirely off the USB drive, not even touching the internal HDD -- I am surprised it would have such an effect. In fact, the situation is not unique to the anti-virus script: anything that does I/O on a lot of files, such as decompressing the archives on the USB drive, causes the same thing to happen.

After stopping the anti-virus and monitoring the performance, I decided to restart it to see what happens. The first file it scans is rather large; even while it was scanning that one file, the compression's CPU usage dropped immediately to 0.2%. Over time, the anti-virus's real memory usage reached 330MB, whereas the compression's real memory usage dropped by about 750MB. Other than that, no differences are obvious in the top output.

So now, it is a replicable phenomenon; if someone offers an explanation, I can test their idea and accept it as an answer.

Alex
