4

Using Linux, I am copying 60 GB of data with rsync, split into very small files (1 MB each). I thought the copy would be limited by the maximum write speed of the destination hard drive, but my whole system gets very slow (e.g. unlocking takes about 5 minutes). The source is an external hard drive and so is the destination, both plugged in through USB 3.0.

Monitoring my system with 'htop' shows that the CPU is barely used and the memory is mostly available. I will try copying files with 'cp' but I doubt there would be any difference. What is causing this performance issue? Why is copying between two external hard drives causing performance issues on my internal system?

I don't think this is fixable but I would like to understand.

2 Answers

4

There is a significant procedural difference between copying one 60 GB file to a drive and copying 60,000 1 MB files to a drive.

Every file transferred carries its own overhead: a 'handshake' to set the transfer up and a verification step, generally once the file has completed, so that the complete file is verified. This adds time to the transfer of each file.

If you're transferring ONE file, it happens once. With 60,000 files... well, 60,000 times.

Additionally, if you're running over USB 2, that pipeline is quite likely the culprit: USB 2 is only 480 Mbit/s, which is rather slow, and because your drive will go through the copy/handshake/verify cycle 60,000+ times, it's going to take a very long time.
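To put rough numbers on that, here is a minimal back-of-the-envelope sketch. The 60 GB total, 1 MB file size and 480 Mbit/s figure come from the thread; the per-file overhead of 10 ms is purely a made-up illustrative value, not a measurement, and `transfer_estimate` is a hypothetical helper name.

```python
def transfer_estimate(total_bytes, file_bytes, link_bits_per_s, per_file_overhead_s):
    """Return (number of files, raw link time in s, summed per-file overhead in s).

    raw link time is a best case: it assumes the link runs at its full
    signalling rate, which real USB transfers do not reach.
    """
    n_files = total_bytes // file_bytes
    raw_s = total_bytes * 8 / link_bits_per_s
    overhead_s = n_files * per_file_overhead_s
    return n_files, raw_s, overhead_s


# 60 GB in 1 MB files over USB 2 (480 Mbit/s).
# ASSUMPTION: 10 ms of overhead per file, chosen only for illustration.
n_files, raw_s, overhead_s = transfer_estimate(
    total_bytes=60 * 10**9,
    file_bytes=10**6,
    link_bits_per_s=480 * 10**6,
    per_file_overhead_s=0.010,
)
print(f"{n_files} files, {raw_s / 60:.1f} min on the wire, "
      f"{overhead_s / 60:.1f} min of per-file overhead")
```

Even in this idealised case the wire time alone is about 1,000 seconds, and under the assumed overhead the per-file cost adds roughly 60% on top. A single 60 GB file would pay that overhead once instead of 60,000 times.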

If you want to back up data like this, a better approach is to tar and gzip (or otherwise compress) the files into fewer, larger files and then copy those over. However, don't expect to save time overall if you plan to 'unzip' them on the other end!
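As a rough illustration of the bundling idea, here is a small sketch using Python's standard `tarfile` module. The function name `bundle_directory` and the paths in the usage note are placeholders of mine, not anything from the question.

```python
import tarfile
from pathlib import Path


def bundle_directory(source_dir, archive_path, compress=False):
    """Pack everything under source_dir into a single tar archive.

    compress=False writes a plain .tar (no CPU spent on compression);
    compress=True writes a gzip-compressed archive instead.
    """
    mode = "w:gz" if compress else "w"
    source = Path(source_dir)
    with tarfile.open(str(archive_path), mode) as tar:
        # arcname keeps paths inside the archive relative to the folder name
        tar.add(str(source), arcname=source.name)
    return Path(archive_path)
```

Calling something like `bundle_directory("/path/to/small_files", "/path/to/backup.tar")` would leave one large file to copy instead of thousands of small ones. Whether that is a net win depends on whether you need the files unpacked again at the destination.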

The real difference between internal and external is that your internal drive sits on a 'pipeline' that is far larger and faster than a USB 2 connection: SATA III, for instance, is rated at 6 Gbit/s against USB 2's 480 Mbit/s. It is a huge difference.

This makes a tremendous difference when duplicating numerous files, as in your description: an internal drive can copy and verify hundreds of files at a time, whereas an external USB 2 port will manage only a couple at a time.

A simple analogy is filling a gallon pail with water. Your external USB 2 port is the equivalent of a drinking straw: it'll take a while, and you'll have to stop and take breaths along the way. Your internal drive is the equivalent of a garden hose: it'll be done in only a few seconds.

IF your system is copying FROM the internal drive to the external one, quite likely it won't leave the internal drive 'free' for other activity, effectively 'locking up' the system and leading you to think it has frozen during this time.

3

This happens because of the limitations of hard disk drives. You can have a great processor, fast RAM, an amazing motherboard, etc., but all the data they process and load lives on the hard drive. When you copy lots of small files, the drive has to write additional information for each file: the file type, its beginning and end locations, and other metadata. When you write a single large file (.zip, .rar, .7z, .gz), it doesn't have to write all that per-file data separately, because the archive format keeps track of those things itself.

You're using up your drive's read/write capacity. It's writing so much, in such small increments, that it has no capacity left to read the data needed to unlock your OS or to open programs.

I'm not familiar with disk usage analyzers on Linux, but if you could find one and post your findings, that would be a great help.
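In case it helps with gathering those numbers, here is a minimal sketch that works from the kernel's own counters in `/proc/diskstats` rather than a dedicated analyzer. The helper names `parse_diskstats`, `utilisation` and `sample` are my own, and the sketch assumes the usual field layout of that file (sector counts in 512-byte units, busy time in milliseconds).

```python
import time


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written, ms_spent_doing_io)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or unexpectedly short lines
        stats[parts[2]] = (int(parts[5]), int(parts[9]), int(parts[12]))
    return stats


def utilisation(before, after, interval_s):
    """Per-device busy percentage and MB/s between two snapshots."""
    report = {}
    for name, (r1, w1, t1) in after.items():
        if name not in before:
            continue
        r0, w0, t0 = before[name]
        report[name] = {
            "busy_pct": 100.0 * (t1 - t0) / (interval_s * 1000.0),
            "read_mb_s": (r1 - r0) * 512 / 1e6 / interval_s,
            "write_mb_s": (w1 - w0) * 512 / 1e6 / interval_s,
        }
    return report


def sample(path="/proc/diskstats", interval_s=1.0):
    """Take two snapshots interval_s apart and return the utilisation report."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return utilisation(before, after, interval_s)
```

Running `sample()` while the copy is in progress and posting the entries for the internal and external drives would show which device is actually saturated (a `busy_pct` close to 100).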