Last year I got a new external HDD (drive 1) which failed without warning in less than 2 months. I got it replaced with a different but also new external drive (drive 2), and now four months later it has also started to fail. I have a secondary external drive (drive 3), this one several years old, which had been humming along fine... until now. It just started to pop paging operation errors.
I don't plug and unplug these drives at all and rarely move them; even once a month would be an overestimate. I never touch them while they're spinning. They lie flat and undisturbed in their corner.
All three drives were connected to the same motherboard through USB. I don't overclock anything, and the computer is plugged into a basic fused power strip, which is plugged into the wall socket. I also have two SSDs connected to the motherboard the old way, with SATA data plus power from the PSU. These seem OK.
There was a power issue in my region in the beginning of the year, but it's been sorted out and the first drive died before that. The PSU is also old, a Corsair CX430. I'm not experiencing any other classic PSU failure signs, only dying HDDs.
I've been reading about suspicious drive failures and most people point to faulty PSUs, but it's always about internal HDDs, not portable ones. Wouldn't I also see other issues if it were the PSU?
Maybe I'm just unlucky and got two bad drives in a row and the third is just nearing the end of its life, but if it's not bad luck I need to figure out what's happening, because it's been making work a nightmare. I need a good plan of action to diagnose the issue, but I'm a bit lost. I don't know which tests I could perform on the PSU to rule it out, nor what sort of testing I could do on the mobo, assuming it can indeed kill external drives.
Edited to add extra information
I'll be identifying the drives as Drive 1: new, first to fail, no longer with me; Drive 2: new, failing; Drive 3: old, possibly failing.
All drives are or were connected to USB 3.0 ports on my mobo. Drives 1 and 2 were connected to the same port, but I tried all the others after the issues started. I haven't closely inspected the ports yet. Drive 3 is connected to a different hub or cluster of mobo ports. They're far apart, and my uninformed guess is that these are on different circuits.

I have peripherals plugged into the remaining USBs. Didn't notice any anomalies.
CrystalDisk readings with SMART data below. They're both consumer grade portable HDDs with 2.5" insides.
Drive 2 (new external hard drive, currently failing)
Chkdsk /r detected no bad blocks. The drive can be read and written to, and shows no errors in the Event Viewer if left alone, but during normal usage, when writing larger (100MB+) files, it starts to cause these event warnings:
ID 51, warning: "An error was detected on device (DISK) during a paging operation." (preceded the first write failure it ever had, sure sign it's about to fail to write again)
ID 153, warning: "The IO operation at logical block address 0x------ was retried" (started after the first failure, the logical block changes, sometimes it's 0x0)
At this point if you insist on writing, they'll be followed by:
ID 140, warning: "The system failed to flush data to the transaction log. (...) Failure status: {Drive Not Ready}" (on the first failure)
ID 154, error: "The IO operation at logical block address 0x------ failed due hardware error"
ID 137, error: "The default transaction resource manager on volume D: encountered a non-retryable error and could not start. The data contains the error code."
ID 140, warning: "The system failed to flush data to the transaction log. (...) Failure status: The request failed due to a fatal device hardware error."
The first two warnings aren't noticeable during usage, but once the errors happen the drive freezes in the OS until I unplug it and plug it in again. The most damning sign is that it clicks when it fails to write. It's not clicking non-stop, but one click is one too many, right?
Drive 3 (old external drive, currently stuttering)
I had to run chkdsk twice because the scan got stuck on the first try. It also reports 0 bad sectors.
It also logs silent warnings. Neither is noticeable during operation, and no stutters happen:
ID 153, warning: "The IO operation at logical block address 0x------ was retried" (Has been going on for a while, the logical block changes, sometimes it's 0x0)
ID 51, warning: "An error was detected on device (DISK) during a paging operation." (started yesterday, also predicts imminent fail)
Since yesterday, software writing large files (1GB+) to it randomly halts and shows errors. At that point the drive disconnects and reconnects itself, and is accessible afterwards. Event Viewer shows the following errors at these moments:
ID 50, warning: "{Delayed Write Failed} Windows was unable to save all the data for the file D:(something)" (changes, sometimes it's just the volume root)
ID 140, warning: "Failure status: A device which does not exist was specified. (...) Failure status: A device which does not exist was specified"
I have yet to hear a click from it.
Drive 1 (first new external drive to fail, no longer with me)
It also showed paging errors (51) which went unnoticed for a couple of days, followed by these when the drive failed to be written to:
ID 7, warning: "The device, (DISK) has a bad block."
ID 154, error: "The IO operation at logical block address 0x------ failed due hardware error"
It was sudden and fatal: the drive froze and never became accessible after that. I tried to at least recover some files, but it wouldn't even appear in DISKPART or Linux. It also presented the constant click of death after that event.
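To make the pattern across the three drives easier to compare, here is a minimal Python sketch that tallies a list of event IDs into "precursor" warnings and "failure" events. The grouping is only taken from the descriptions above (51 and 153 show up before a failure; 7, 50, 137, 140 and 154 show up when a write fails); it is not an official classification, and the function name is made up for illustration.

```python
from collections import Counter

# Grouping based on the descriptions in this post, not on any official
# Microsoft classification of these event IDs.
PRECURSOR_IDS = {51, 153}              # logged silently before a write failure
FAILURE_IDS = {7, 50, 137, 140, 154}   # logged when a write actually fails


def tally_stages(event_ids):
    """Count how many events fall into each stage (hypothetical helper)."""
    counts = Counter()
    for event_id in event_ids:
        if event_id in PRECURSOR_IDS:
            counts["precursor"] += 1
        elif event_id in FAILURE_IDS:
            counts["failure"] += 1
        else:
            counts["other"] += 1
    return counts
```

The event IDs would have to be copied or exported from Event Viewer by hand; the sketch does not read the Windows event log itself.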
Edit 2: It only happens under higher motherboard temperatures
I ran the suggested tests, transferring 5GB files between disks under a variety of conditions, ranging from the same PC and OS to a different PC and OS. When I couldn't reproduce the issues, not only under different conditions but also on the original PC and OS, I realized one factor had changed between the days the disks started to act up and the day I ran the tests: the weather.
The temperature dropped by over 10ºC as a heatwave gave way to unusually mild weather. Today it's a bit warmer and I could reliably reproduce the issue on the same PC and OS. I also had some help from an erratic fan (details at the end).
Three temperatures climbed when disk 2 showed the usual I/O error due to hardware failure and froze:
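For anyone wanting to repeat this kind of large-write test in a controlled way, a rough Python sketch follows. It is not the exact procedure used here (the tests above were plain file transfers); it writes a file of a chosen size in chunks, forces each chunk to the device with fsync, and reports how far it got and any OS error. The function name, chunk size and target path are assumptions for illustration.

```python
import os
import time


def write_test(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes of random data to path in chunks.

    Returns (bytes_written, elapsed_seconds, error), where error is the
    OSError raised by the OS, or None if the whole file was written.
    """
    chunk = os.urandom(chunk_bytes)
    written = 0
    error = None
    start = time.monotonic()
    try:
        with open(path, "wb") as f:
            while written < total_bytes:
                n = min(chunk_bytes, total_bytes - written)
                f.write(chunk[:n])
                f.flush()
                os.fsync(f.fileno())  # push the data to the device, not just the OS cache
                written += n
    except OSError as exc:
        error = exc
    return written, time.monotonic() - start, error
```

A call such as `write_test(r"D:\stress.bin", 5 * 1024**3)` would mimic a 5GB write to the external drive (the path is a placeholder). Running it while logging the motherboard temperatures in a monitoring tool makes it easier to note the readings at the moment a write fails.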
The M.2 SSD containing the OS went to 79ºC. It's on the underside of the board, directly below the PCH.
The PCH reported 59ºC+.
A "Temperature 5" sensor reported 69ºC+. I'm guessing this is the VRM, no other mystery sensor gets so hot.
Bringing the SSD temperature down didn't stop the errors, but bringing the PCH/temp 5 temperatures down did. At these values disk 2 was back to working fine:
I knew neither the CPU nor the GPU were exceedingly hot, but I didn't pay attention to the mobo and certainly not to the SSD temperature. According to what I've been reading these mobo temperature readings aren't that hot, but thanks to the fan these are hotter than they'd usually get in my system (~50/60ºC).
I'm working on figuring out the exact temperature cutoff; so far the lowest readings at which the disk stopped responding are 59ºC on the PCH and 56ºC on temp 5. Among other things the PCH manages the USB data and power, doesn't it?
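Narrowing down the cutoff amounts to bracketing it between the hottest readings at which the disk still worked and the coolest readings at which it failed. A small Python sketch of that bookkeeping, assuming each test run is noted by hand as a (PCH ºC, temp 5 ºC, failed) tuple; the function name and the sample readings in the comment are made up for illustration and are not measurements from this system:

```python
def failure_bounds(samples):
    """Bracket the failure threshold from manually recorded test runs.

    samples: iterable of (pch_c, temp5_c, failed) tuples.
    Returns the lowest temperatures seen in a failing run and the highest
    temperatures seen in a working run (None where no such run exists).
    """
    samples = list(samples)
    failing = [(p, t) for p, t, failed in samples if failed]
    working = [(p, t) for p, t, failed in samples if not failed]
    return {
        "lowest_failing_pch": min((p for p, _ in failing), default=None),
        "lowest_failing_temp5": min((t for _, t in failing), default=None),
        "highest_working_pch": max((p for p, _ in working), default=None),
        "highest_working_temp5": max((t for _, t in working), default=None),
    }


# Illustrative, made-up readings only:
# failure_bounds([(52, 50, False), (57, 55, False), (60, 58, True)])
```

The real cutoff lies somewhere between the "highest working" and "lowest failing" values, so each additional run at an intermediate temperature tightens the bracket.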
The fan issue
My case is a small form factor one with a 120mm front fan. I replaced the original with a watercooler heatsink/fan that doubles as the exhaust for the computer. It's plugged into the CPU_FAN header and fulfills its double duty well. It's as old as everything else in this build, and I adjusted the stock curve to a slightly more aggressive one.
When running the tests I realized something was wrong with the fan: it was stuck at the minimum speed when it shouldn't have been. It ignored the curve set by the Windows mobo utility, and attempts to make it spin at a fixed 100% made it randomly spin at 100% for a couple of seconds without sustaining the speed as expected. It did respect the UEFI settings, though, and started responding to the utility again after I changed settings through the UEFI.
Odd, but I don't think it's the root cause of the issue; it just exacerbated it by letting temperatures climb quicker and higher. I'm concerned such temperatures degraded some component in the long run, because in hindsight my computer has been too quiet since the beginning of the year, and we've gone through 4 or 5 heatwaves by now.



