4

I have all of my files on an external SSD and am also backing up the entire drive to an online service.

What will happen when my SSD starts degrading eventually? Will that result in corrupted files, so then the backup service will see that the file “changed” and upload the new corrupted copy as well? Then my online backup copies will be corrupt and I’ll lose the original completely?

If it matters, it’s MacOS and the service is Backblaze.

backups
  • 41

2 Answers

4

What will happen when my SSD starts degrading eventually?

Usually the drive will fail outright or show clear signs of impending failure. Data on the drive should not slowly degrade over time, unnoticed, in the run-up to failure.

Corruption should not happen ..

Data is protected by error-correcting codes (ECC). Data degradation, or 'grown errors', is detected by the ECC: where possible the data is silently corrected, and where correction is not possible the drive returns a read error. In other words, you should get either correct data or some media error. So the failure is not silent, and a backup tool would detect a problem.

It is true that NAND-based drives such as SSDs slowly 'bleed data', but that can typically be detected and corrected using error-correcting codes and NAND READ-RETRY (where the drive experiments with reference voltage levels). The 'bleed rate' can be influenced by wear and age of the NAND. But again, failure to retrieve the data should result in a media error of some kind. So it is not silent, and it is detectable by backup software.

But it could .. But it's not old age


Then the only thing you'd have to worry about is silent data corruption, but this is not necessarily associated with a drive's age or degradation due to age.

Silent corruption, or what people tend to call bit flips or bit rot, is typically caused by either corrupt data being written along with an ECC code that matches the corrupt data, or data that is corrupted after it was read and verified by ECC. Unlike 'grown' errors, these go unnoticed, and such errors would end up being backed up.

This type of error is typically caused by, for example, bad memory or, according to Intel, cosmic rays.

Unlike conventional CMR hard drives, SSDs are far more dynamic. From the outside they appear static, because the drive can map each LBA address to any physical address. To counter the effects of retention errors and various types of disturb errors, and for the purpose of static wear-leveling, drives actually shuffle data around. It is during this shuffling that Intel suggests we run a high risk of introducing silent errors.
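To make the difference between a 'grown' error and a silent one concrete, here is a minimal sketch. It is not how any real SSD firmware works: `zlib.crc32` merely stands in for the drive's ECC, and `store`, `read` and `flip_bit` are hypothetical helpers invented for this illustration. Corruption that happens after the check code was computed is caught on read; corruption that happens before it was computed gets a matching code and reads back without complaint.

```python
import zlib

def store(payload):
    """Stand-in for a drive write: keep the data plus a check code computed from it."""
    return payload, zlib.crc32(payload)

def read(stored):
    """Stand-in for a drive read: verify the check code before returning data."""
    payload, checksum = stored
    if zlib.crc32(payload) != checksum:
        raise IOError("media error: check code does not match data")
    return payload

def flip_bit(data, index, mask):
    """Return a copy of data with the masked bit(s) of one byte inverted."""
    out = bytearray(data)
    out[index] ^= mask
    return bytes(out)

original = b"Canon EOS"

# Case 1: the data degrades AFTER the check code was stored ('grown' error).
payload, checksum = store(original)
try:
    read((flip_bit(payload, 3, 0x04), checksum))
    detected = False
except IOError:
    detected = True

# Case 2: the data is corrupted BEFORE the check code is computed
# (e.g. in bad RAM), so the code matches the corrupt data.
silent = read(store(flip_bit(original, 3, 0x04)))

print(detected)  # the grown error is reported
print(silent)    # the silent error reads back "successfully"
```

In the second case nothing downstream, including a backup tool, has any signal that the data is wrong.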

This is an example of silent corruption. I was asked to repair photos but could not, so I asked for the original drive itself rather than the corrupt files. The drive did not report any errors, but we see obvious bit corruption that was, by the way, repeated throughout the entire file and across multiple files:

enter image description here

Any backup tool would happily back up such a file.

Now, in this case we're not dealing with some cosmic-ray-induced corruption, because the corruption is too systematic. The exact same issue was observable in multiple files. If we look at the obvious and easy-to-spot corruption, we observe what appears to be a 'stuck bit':

Canon - Cankn
01101111 - o changed into
01101011 - k

And ..

EOS - EKS
01001111 - O changed into
01001011 - K

Each time, the 6th bit (counting from the left) flips from 1 to 0, or better said, the bit sticks at 0. If I had to guess, we're observing silent corruption introduced at some hardware level; it may even be limited to one specific NAND chip.
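The comparison above can be reproduced with a few lines of Python. This is only a sketch of the manual analysis: `flipped_bits` is a hypothetical helper name, and it assumes both strings are plain ASCII and of equal length. XOR-ing the good and bad byte leaves a 1 exactly where they differ.

```python
def flipped_bits(good, bad):
    """Compare two equal-length ASCII strings byte by byte.

    Returns (position, good_bits, bad_bits, xor_mask) for every byte that differs.
    """
    results = []
    pairs = zip(good.encode("ascii"), bad.encode("ascii"))
    for position, (g, b) in enumerate(pairs):
        diff = g ^ b  # bits set here are the ones that changed
        if diff:
            results.append((position, f"{g:08b}", f"{b:08b}", f"{diff:08b}"))
    return results

canon = flipped_bits("Canon", "Cankn")
eos = flipped_bits("EOS", "EKS")

for row in canon + eos:
    print(row)
```

Both strings give the same XOR mask, `00000100`, which is the 6th bit from the left. The same bit position going from 1 to 0 in unrelated places is what points to a stuck bit rather than a random flip.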

2

SSD degradation should not normally lead to corrupt files. Corruption occurs when writing, not reading, so it is easy enough for the firmware to catch this, mark the actual block bad and remap it.

There are a few other things to mention:

  1. A good backup system should version backups. I've not used Backblaze, but it does look like Backblaze supports at least some versioning - https://www.backblaze.com/computer-backup/docs/version-history

  2. Bit rot can hit SSDs - but it can also hit the content before it is written to disk [i.e. in memory]. Bit rot is luckily very rare, but if it does happen on disk, the bad copy would not be backed up unless you opened it to edit it, as there is nothing to indicate the bits have flipped (i.e. the creation date and other permissions and metadata remain unchanged). ZFS periodically scrubs disks - i.e. checks against RAID for bit rot. If it's any comfort, in the many years I've had a ZFS NAS, it has never reported any bit rot.

  3. You may want to use S.M.A.R.T monitoring/checking to keep an eye on the state of the drive.

  4. You may be technically right about "permanent retention being impossible" - everything eventually dies - however, you are generally talking about months to years without powering on, and even then the effect will be gradual.

  5. Generally when an SSD goes, it goes suddenly and catastrophically.
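If you are not on ZFS, the idea behind a scrub in point 2 can be approximated by hand with a checksum manifest. The following is a rough sketch, not a feature of Backblaze or macOS: `sha256_of`, `build_manifest` and `suspect_files` are hypothetical names, and how you store the manifest between runs is left out. It flags files whose content hash changed while size and modification time did not, which is the signature of a silent change rather than an edit.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its hash, size and mtime."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            manifest[os.path.relpath(path, root)] = {
                "sha256": sha256_of(path),
                "size": info.st_size,
                "mtime_ns": info.st_mtime_ns,
            }
    return manifest

def suspect_files(old, new):
    """Files whose content changed although size and mtime stayed the same."""
    return sorted(
        rel for rel, before in old.items()
        if rel in new
        and new[rel]["sha256"] != before["sha256"]
        and new[rel]["size"] == before["size"]
        and new[rel]["mtime_ns"] == before["mtime_ns"]
    )
```

A typical use would be to call `build_manifest` on the drive, keep the result somewhere else (for example as JSON), and compare it with a fresh manifest later. A flagged file only tells you that something changed silently; you would still need a versioned backup to get the good copy back.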

davidgo
  • 73,366