
In recent weeks I have had several cases of data corruption that occurred while copying files from one disk to another. The question is: what can be the cause, and how do I pinpoint it?

Some clues:

  • problems (9 cases) occurred on two different machines (one an AMD 5050e with ECC RAM, the other a netbook), both running Win7 x64 SP1 with no crash or other apparent problem;
  • problems occurred while copying large amounts of data (about 3 TB in total) from one disk to another;
  • copying was done with the standard GUI (Windows Explorer), which reported no error;
  • original file and copy have the same size and modification date;
  • data corruption was detected using MD5 hashes (md5sum and/or Microsoft's FCIV): the hash of the copy was wrong (the MD5 of both original and copy is repeatable);
  • fc /B repeatably reports the differences, which have always been on contiguous blocks with exact 4 KiB boundaries (10 cases: one file was hit twice);
  • blocks in error are of varying size, from 4 KiB to 52 KiB, at seemingly random locations, in large files (typically several GB);
  • corrupted blocks show no apparent relation to the original; in about half the cases, the corrupted data was all zeros;
  • all disks involved are NTFS, and given a clean bill of health by chkdsk /f (no bad block, no error reported);
  • the two affected destination disks are USB (the HDs happen to be from the same manufacturer, but I can't say whether this is significant):
    • one is a 2.5" 2 TB disk housed in a self-powered USB 3 (SuperSpeed, used at Hi-Speed) enclosure bearing the HD manufacturer's brand;
    • one is a 3.5" 1.5 TB disk in a Linux-based multimedia enclosure (PCH A-200) with a USB 2 (Hi-Speed) slave port;
  • in more than half the cases the corruption was detected about an hour after the copy, with no disconnection or reboot in between; in most or all of the others, the destination disks had been properly ejected;
  • I have no reason to suspect the various source disks (mostly SATA, some SSD).
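To repeat the MD5 check and the fc /B comparison in one pass, here is a minimal Python 3 sketch (the script and its function names are hypothetical, not a tool mentioned above). It hashes both files the way md5sum does, then walks them in 4 KiB blocks and lists each contiguous run of differing blocks with its offset, its length, and whether the copy's data in that run is all zeros. It assumes both files have the same size, as reported above.

```python
import hashlib
import sys

BLOCK = 4096  # comparison granularity; 4 KiB is assumed to match the fc /B observations

def md5_of(path, chunk=1 << 20):
    """Hex MD5 of a whole file, read in 1 MiB chunks (same digest md5sum prints)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(chunk), b""):
            h.update(piece)
    return h.hexdigest()

def differing_runs(src, dst, block=BLOCK):
    """Return (offset, length, all_zero) for each contiguous run of differing blocks.

    Assumes both files have the same size; all_zero is True when every byte
    of the run in the copy (dst) is 0x00.
    """
    runs = []
    start = None  # offset where the current run of bad blocks began, if any
    zero = True
    offset = 0
    with open(src, "rb") as a, open(dst, "rb") as b:
        while True:
            x, y = a.read(block), b.read(block)
            if not x and not y:
                break
            if x != y:
                if start is None:
                    start, zero = offset, True
                zero = zero and not any(y)  # any() over bytes is False only if all are 0
            elif start is not None:
                runs.append((start, offset - start, zero))
                start = None
            offset += max(len(x), len(y))
    if start is not None:
        runs.append((start, offset - start, zero))
    return runs

def report(src, dst):
    print("MD5 source:", md5_of(src))
    print("MD5 copy:  ", md5_of(dst))
    for off, length, zero in differing_runs(src, dst):
        print(f"offset {off:#x}, {length // 1024} KiB, all-zero={zero}")

if __name__ == "__main__" and len(sys.argv) == 3:
    report(sys.argv[1], sys.argv[2])
```

It would be run as `python diffblocks.py <original> <copy>`. Because runs are reported at the comparison granularity, calling `differing_runs(src, dst, block=512)` directly is a way to check whether the damage really starts and ends on 4 KiB boundaries rather than just appearing to at 4 KiB resolution.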

Addition: I'm more concerned with finding the root cause and pinpointing the culprit(s) than with working around the issue.

I reason that all the technologies involved are supposed to have a very low rate of undetected errors compared to reported ones (and I have had no error reports). Therefore:

  • if the error trigger was the magnetic media (a hypothesis that matches the observed 4 KiB alignment very well, since I believe that is the internal physical sector size of the disks), it must coincide with a disastrous bug somewhere that prevents the error from being reported, as one would be (I know from experience) at least for a read error on a SATA disk of my (different) favorite brand;
  • if the error trigger was poor electrical contact in the USB cabling, undetected by the CRC (as suggested by an answer), then, given that the USB 2 maximum data packet size is 1 KiB according to this source (not 4 KiB like the alignment of all my errors), there must be some additional bug in the handling of errors (or a gaping hole in the USB specs or in how they handle hard disks).
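The alignment argument in these two points can be checked mechanically. Below is a minimal Python sketch (the function name is hypothetical) that returns the largest power of two dividing every boundary of a set of damaged runs, given as (offset, length) pairs in bytes. It assumes the runs were located at a fine granularity such as 512 bytes; otherwise the answer only reflects the comparison block size. Granularities worth comparing against include 512 bytes (the logical sector size, and also the maximum bulk packet size at USB 2.0 high speed), 1 KiB (the packet size cited above), and 4 KiB (the physical sector size of Advanced Format disks, but also the default NTFS cluster size on volumes of this size and the x64 memory page size).

```python
def common_alignment(runs, cap=1 << 20):
    """Largest power of two, at most cap, that divides every run boundary.

    runs: iterable of (offset, length) pairs in bytes. Returns cap if empty.
    """
    align = cap
    for offset, length in runs:
        for boundary in (offset, offset + length):
            while boundary % align:  # halve until the boundary is a multiple
                align //= 2
    return align

# Hypothetical runs for illustration, not data from the question:
print(common_alignment([(8192, 8192), (28672, 4096)]))  # 4096
print(common_alignment([(1536, 512)]))  # 512
```

A result of 4096 over many runs would be consistent with sector-, cluster- or page-level damage, but by itself it cannot tell these apart.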
fgrieu

2 Answers


I have exactly the same problem, and could identify that it is related to USB 3. I got the problem on two different disks when using USB 3. When using the eSATA connection (both disks have USB 3 + eSATA), I have no problem. I am using Windows 7. I got the problem with two different antivirus products (McAfee and Essentials). Now I avoid using the USB 3 port on my laptop. Because the two disks come from the same manufacturer, I have the same USB 3 cable and could not test with another cable, but I would be surprised if the USB cables were bad.
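To compare the USB 3 and eSATA paths reproducibly, one option is to write a test file of self-identifying 4 KiB blocks through each connection and verify it later. A minimal Python sketch, using only the standard library (the functions and block layout are hypothetical, not part of any existing tool): each block starts with a 16-byte header holding a seed and the block's own index, followed by SHA-256-derived filler, so a bad block can be classed as all zeros, as a valid block that landed at the wrong place, or as unrecognized data.

```python
import hashlib
import struct

BLOCK = 4096  # assumed test granularity, matching the 4 KiB alignment in the question

def make_block(seed, index):
    """Self-identifying block: 16-byte (seed, index) header + hash-derived filler."""
    header = struct.pack("<QQ", seed, index)
    filler = hashlib.sha256(header).digest() * (BLOCK // 32)  # 32 * 128 = 4096 bytes
    return header + filler[: BLOCK - len(header)]

def write_pattern(path, seed, nblocks):
    with open(path, "wb") as f:
        for i in range(nblocks):
            f.write(make_block(seed, i))

def diagnose(got):
    """Classify a block that does not match what was written."""
    if not any(got):
        return "all zero"
    if len(got) == BLOCK:
        s, j = struct.unpack("<QQ", got[:16])
        if got == make_block(s, j):  # intact block, but from somewhere else
            return f"copy of block {j} (seed {s})"
    return "unrecognized data"

def verify_pattern(path, seed):
    """Return (block_index, diagnosis) for every block that is not as written."""
    problems = []
    with open(path, "rb") as f:
        index = 0
        while True:
            got = f.read(BLOCK)
            if not got:
                break
            if got != make_block(seed, index):
                problems.append((index, diagnose(got)))
            index += 1
    return problems
```

For example, `write_pattern(r"E:\pattern.bin", 1, 1 << 20)` writes 4 GiB (E: is a placeholder drive letter). The drive should be ejected and reconnected before calling `verify_pattern`, otherwise Windows may serve the reads from its file cache instead of the disk. A "copy of block j" diagnosis would point to misdirected writes rather than electrical noise.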

LeJav

differences, that always have been on contiguous blocks with exactly 4 kiB boundary… blocks in error are of varying size, from 4 kiB to 52 kiB, at seemingly random location, in large files

Because drives write in chunks, connection errors (as opposed to drive errors) will usually produce exactly the sort of block-sized errors you are seeing.

I was going to ask whether the drive is a flash disk (I have had the misfortune of experiencing silent, undetected corruption with those), but then I saw this:

the two affected destination disks are USB

This is another source of corruption I have unfortunately experienced. The problem is that a USB drive is connected through a cable, so a poor electrical connection can lead to corruption. Whenever you have this sort of problem, the first thing to do is to clean the pins on the connectors of the drive, the cable, and the USB port. You can first try breathing on them with some moist air from your lungs, because the humidity helps conductivity. If that seems to have any effect, you can then brush the pins with a toothbrush or something similar (I usually use an emery board to scrub them lightly).

Synetech