What does “Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK” in dmesg mean?

Question

When I power off an external USB 3.0 SSD (a hard-drive enclosure containing a 2.5-inch Samsung SSD, attached via a USB 3.0 cable to a USB-A 2.0 or 3.0 port on the computer; its partitions have already been unmounted) via

$ sudo udisksctl power-off -b /dev/sdg

I get the messages

[ 8618.812659] sd 8:0:0:0: [sdg] Synchronizing SCSI cache
[ 8619.120991] sd 8:0:0:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 8619.295465] usb 1-8: USB disconnect, device number 13

in the sudo dmesg output. Both read and write caches were enabled according to prior dmesg output. To me, “failed” doesn't sound good in general. In the best case, it's a manifestation of a bogus program architecture or bogus programming (a read/write cache flush is attempted when it's no longer necessary or possible, or even a misleading printf issued), and in the worst case, it results in potential data loss or corruption (a write cache contains data to be flushed, but this data never makes it to the drive). What happens under the hood? Is this something I have to worry about?
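For reference, the "Result:" part of the message reports the host byte of the SCSI result word (bits 16 to 23, which the kernel extracts with its host_byte() macro). DID_ERROR is host-byte code 0x07, described in the kernel headers as an "internal error". As I understand it, this means the host-side driver (here the USB storage stack) could not complete the command, as opposed to the drive itself returning an error status; driverbyte=DRIVER_OK just means no additional driver-level flag was set. The minimal Python sketch below parses such a line and looks up the code. The table is only a partial transcription of the kernel's DID_* constants, and the helper names (explain_failure, host_byte) are hypothetical.

```python
import re

# Partial transcription of the SCSI host-byte codes (DID_* constants) from
# the Linux kernel headers; only the first ten codes are listed here.
HOST_BYTE_CODES = {
    "DID_OK": (0x00, "no error"),
    "DID_NO_CONNECT": (0x01, "couldn't connect before timeout period"),
    "DID_BUS_BUSY": (0x02, "bus stayed busy through timeout period"),
    "DID_TIME_OUT": (0x03, "timed out for other reason"),
    "DID_BAD_TARGET": (0x04, "bad target"),
    "DID_ABORT": (0x05, "told to abort for some other reason"),
    "DID_PARITY": (0x06, "parity error"),
    "DID_ERROR": (0x07, "internal error"),
    "DID_RESET": (0x08, "reset by somebody"),
    "DID_BAD_INTR": (0x09, "got an interrupt we weren't expecting"),
}

# Matches lines such as:
# [ 8619.120991] sd 8:0:0:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
FAILED_CMD_RE = re.compile(
    r"\[(?P<disk>sd[a-z]+)\] (?P<command>.+?) failed: "
    r"Result: hostbyte=(?P<hostbyte>\S+) driverbyte=(?P<driverbyte>\S+)"
)


def explain_failure(line):
    """Return a dict describing a failed-command dmesg line, or None."""
    m = FAILED_CMD_RE.search(line)
    if m is None:
        return None
    code, meaning = HOST_BYTE_CODES.get(m["hostbyte"], (None, "unknown"))
    return {
        "disk": m["disk"],
        "command": m["command"],
        "hostbyte": m["hostbyte"],
        "hostbyte_code": code,
        "hostbyte_meaning": meaning,
        "driverbyte": m["driverbyte"],
    }


def host_byte(result):
    """Extract the host byte (bits 16..23) from a raw SCSI result word."""
    return (result >> 16) & 0xFF
```

Feeding it the second log line from the question yields the disk sdg, the command "Synchronize Cache(10)" and host-byte code 0x07 ("internal error").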

The above messages don't appear when I use the USB pen drive https://www.amazon.de/dp/B07RT8KG8N instead of the SSD; this is consistent with the pen drive having no caches enabled.
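One way to check which disks currently have their write cache enabled, without digging through older dmesg output, is the cache_type attribute that the sd driver exposes under /sys/class/scsi_disk/<H:C:T:L>/. A minimal Python sketch, assuming that sysfs layout and the cache_type strings used by recent kernels ("write through", "none", "write back", "write back, no read (daft)"); write_cache_status is a hypothetical helper name:

```python
from pathlib import Path

# cache_type values that mean the drive's volatile write cache (WCE) is on.
WRITE_BACK_TYPES = {"write back", "write back, no read (daft)"}


def write_cache_status(sysfs_root="/sys"):
    """Map each SCSI disk (H:C:T:L) to True if its write cache is enabled."""
    status = {}
    base = Path(sysfs_root) / "class" / "scsi_disk"
    if not base.is_dir():
        return status  # no SCSI disks (or not Linux): nothing to report
    for disk in sorted(base.iterdir()):
        cache_file = disk / "cache_type"
        if cache_file.is_file():
            value = cache_file.read_text().strip()
            status[disk.name] = value in WRITE_BACK_TYPES
    return status
```

Calling write_cache_status() with both devices attached should show the SSD's entry (8:0:0:0 in the log above) as True if dmesg reported its write cache as enabled, and the pen drive's entry as False.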

Tom Yan · Answer 1 · 2022-07-23T09:21:36.203

When the disk goes away, sd_remove() calls sd_shutdown(): https://github.com/torvalds/linux/blob/v5.18/drivers/scsi/sd.c#L3530

And sd_shutdown() in turn calls sd_sync_cache().
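To make the call chain concrete, here is a toy Python model of the removal path. It is NOT kernel code: the method names only mirror sd.c to keep the flow readable. The guard `wce and media_present` is my reading of sd_shutdown() in recent kernels and should be checked against the linked source; the `attached` flag, and the way the failure line is produced, are assumptions standing in for the USB transport having already been torn down when the command is sent.

```python
from dataclasses import dataclass, field


# Hypothetical, heavily simplified model of sd_remove() -> sd_shutdown()
# -> sd_sync_cache(); not the real kernel implementation.
@dataclass
class ScsiDiskModel:
    name: str
    wce: bool                  # write cache enabled, as reported by the drive
    media_present: bool = True
    attached: bool = True      # assumption: False once the USB link is gone
    log: list = field(default_factory=list)

    def sd_sync_cache(self):
        # Sent without any check for dirty data in the cache; the model just
        # fails if the transport is gone (mimicking hostbyte=DID_ERROR).
        if not self.attached:
            self.log.append(f"[{self.name}] Synchronize Cache(10) failed: "
                            "Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK")
            return False
        return True

    def sd_shutdown(self):
        # Only disks that report an enabled write cache get a flush.
        if self.wce and self.media_present:
            self.log.append(f"[{self.name}] Synchronizing SCSI cache")
            self.sd_sync_cache()

    def sd_remove(self):
        # Reached both on detach and when sd_mod is unloaded.
        self.sd_shutdown()
```

With `wce=True, attached=False` the model logs both lines from the question; with `wce=False` (like the pen drive) it logs nothing; and with `attached=True` (the module-unload case) the flush succeeds. Forcing `wce=False` is the model's equivalent of the sdkp->WCE idea discussed in this answer.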

My guess is that some existing structure makes it inconvenient to have sd_shutdown() behave differently when called from sd_remove(), and since it doesn't pose any actual problem anyway, no one cares enough to make it "better" (not to mention that such a "pointless" improvement could introduce a regression if done carelessly).

But I could be wrong (as in, there could be a good reason to call sd_sync_cache() anyway). Either way, this kind of question belongs on the linux-scsi mailing list. Quite apart from whether it is in scope for this site, the devs there can give you a much better answer on the why. Here you tend to get irrelevant answers (or worse, "your disk might be dying" / "check SMART" FUD).

By the way, I think the real concern here is: does unmounting / sync (as opposed to sg_sync from sg3_utils) guarantee that the on-device writeback cache is flushed (rather than just the dirty pages in memory)? I don't think Linux could bear such a flaw, but I do sometimes run sg_sync after unmounting (because I'm paranoid). (EDIT: See https://github.com/torvalds/linux/blob/v5.18/drivers/scsi/sd.c#L1249)
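As far as I know, calling fsync() on the whole-disk device node makes the kernel write back that device's dirty pages and then send a flush request, which sd turns into SYNCHRONIZE CACHE when the drive's write cache is enabled (presumably the code the EDIT link points at). A minimal Python sketch, assuming a Linux host and root access for a node such as /dev/sdg; flush_to_stable_storage is a hypothetical helper name:

```python
import os


def flush_to_stable_storage(path):
    """fsync() a file or block device node.

    For a block device node (e.g. /dev/sdg, needs root) this writes back the
    device's dirty pages and then asks the block layer to flush the drive's
    volatile cache. For a regular file it flushes that file; whether a device
    cache flush follows depends on the filesystem and its barrier settings.
    """
    fd = os.open(path, os.O_RDONLY)  # Linux allows fsync on a read-only fd
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
```

Unlike sg_sync, which sends SYNCHRONIZE CACHE directly, this goes through the block layer, so as far as I know no flush reaches the drive if the kernel believes its write cache is disabled.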

In case it's not obvious enough, whether you see the message does not depend on whether the cache is empty. AFAIK there is not even a way in SCSI or ATA to check that. Also, udisksctl is NOT relevant here; you'll see the message even if you just disconnect the drive.

For the record, the fact that sd_shutdown() is called in sd_remove() (and that sd_sync_cache() is called in sd_shutdown()) dates back to the earliest git commit of the upstream kernel. Back then, sd_shutdown() did nothing more than sd_sync_cache(). And according to the comment for sd_remove(), at least one reason to call sd_sync_cache() in sd_remove() seems to be that sd_remove() is called not only when (after) a disk is detached, but also when the SCSI disk module is unloaded, in which case SYNCHRONIZE_CACHE will NOT fail. But from my POV, whether SYNCHRONIZE_CACHE could ever really be necessary when sd_mod is unloaded is still an open question, as I'm not sure unloading is even possible while a filesystem on a SCSI disk is mounted.

But again, AFAICT, if we wanted to "suppress" SYNCHRONIZE_CACHE in the sd_remove() case, we'd need to, e.g., set sdkp->WCE to 0 there, which might be, semantically speaking, sort of dirty.

It's also hard to say whether the other things the current sd_shutdown() does are ever necessary in the sd_remove() case either. (I mean, the reason it is called there might be purely historical.) But, once again, I doubt any devs would bother to risk a regression when everything has been working fine and harmlessly.

pbies · Answer 2 · 2022-07-19T00:10:56.967

The manual page for udisksctl (man udisksctl) says:

power-off
           Arranges for the drive to be safely removed and powered off. On the OS side this includes ensuring that no process is
           using the drive, then requesting that in-flight buffers and caches are committed to stable storage. The exact steps
           for powering off the drive depends on the drive itself and the interconnect used. For drives connected through USB,
           the effect is that the USB device will be deconfigured followed by disabling the upstream hub port it is connected
           to.

           Note that as some physical devices contain multiple drives (for example 4-in-1 flash card reader USB devices)
           powering off one drive may affect other drives. As such there are not a lot of guarantees associated with performing
           this action. Usually the effect is that the drive disappears as if it was unplugged.

So it seems that udisksctl should perform a safe drive removal:

write data from caches
unmount the drive
turn it off

The error does not tell you much, but if you want to be sure that everything is OK, run sync first and wait until it has written everything to the drive. If sync returns to the prompt without an error message, you can then issue your power-off command and, after that, remove the drive.
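The sequence described above (sync, unmount, power off) can be scripted. A minimal Python sketch that only wraps the sync and udisksctl commands already mentioned in this thread; safe_power_off is a hypothetical name, and the partition list is an assumption you must supply yourself (the function does not discover it):

```python
import subprocess


def safe_power_off(disk, partitions, dry_run=False):
    """Flush, unmount and power off a USB disk via sync and udisksctl.

    `disk` is the whole-disk node (e.g. "/dev/sdg"); `partitions` lists its
    mounted partitions (e.g. ["/dev/sdg1"]). With dry_run=True the commands
    are returned without being executed.
    """
    commands = [["sync"]]
    commands += [["udisksctl", "unmount", "-b", part] for part in partitions]
    commands.append(["udisksctl", "power-off", "-b", disk])
    if not dry_run:
        for cmd in commands:
            # check=True raises on the first failing step, so the drive is
            # never powered off after a failed sync or unmount.
            subprocess.run(cmd, check=True)
    return commands
```

Depending on your polkit setup, the udisksctl steps may need to run via sudo, as in the question.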

If the caches are not written to the drive, data loss is possible: the ends of files, or the file table (the MFT for NTFS, the inode table for ext4), might not be updated as they should be.

If you suspect data loss, you can check by comparing the files recently written to that drive with the original copies (the source, even right after the copy process).
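A minimal Python sketch of such a comparison, hashing every file under the source directory and its counterpart on the drive (find_mismatches and sha256_of are hypothetical helper names). One caveat: reads made right after a copy may be served from the page cache rather than the drive, so for a meaningful check, unmount and re-attach the drive first.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(source_dir, copy_dir):
    """List relative paths that are missing from, or differ in, copy_dir."""
    source_dir, copy_dir = Path(source_dir), Path(copy_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue  # directories are implied by the files inside them
        rel = src.relative_to(source_dir)
        dst = copy_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            mismatches.append(str(rel))
    return mismatches
```

An empty list means every source file has a byte-identical copy on the drive.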

EDIT:

From the error message it seems that your disk is dying. Make a full backup and check SMART info for that drive.
