
I've been a musician and composer for over 15 years, and I have to manage a growing body of work in .wav format. I would like to find the best way to archive it while still being able to access it easily. Space is not a problem: it will take a few TB. The difficulty is that I want to make sure that my files won't change one iota in, say, 100 years (until I die). But I fear that my wav files, which are raw binary streams, are sensitive to bit flipping, and that a file could still be read without any way of knowing whether it has been altered.

What I'm considering is buying a Synology NAS and installing ECC RAM in it, which is possible on some models. Do you think this is useful, or am I completely wrong about how data is handled at the disk level?

I obviously have several copies of the backups, following the golden rule of three backups including one remote.

My questions are:

  • Does ECC memory provide an advantage for HDD storage that is left powered on for the very long term?

  • Does the bit-by-bit comparison of two files give 100% certainty that two files are identical?

Mokubai
  • 95,412

2 Answers


ECC RAM detects and corrects single-bit errors in RAM. It is mainly relevant for systems with long uptimes where data is held in RAM for long periods, or for situations where an occasional single-bit error could cause catastrophic failures, incorrect calculations, or other undesirable effects.

It does not protect the data on disk in any way. It only protects data while it is in RAM: after it has been read from disk, or before it is written to disk.

A normal NAS is likely to write data to disk soon after receiving it, and since data is read in order to send it straight over the network, it is unlikely to be held in RAM for long either. Even a NAS that does a lot of reading or writing (such as a media server) is not likely to keep a large amount of data in RAM for long durations.

For long-term storage on disk, ECC RAM is irrelevant: it only protects the data in RAM.

If you must ensure that data on disk is protected and valid, then you may want to look at some kind of forward error correction stored along with the files, such as Parchive, which stores a configurable amount of extra data that can be used to detect and repair errors that may occur. This will use up space beyond that of the files themselves.
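As a rough illustration of the "detect" half of this, here is a minimal Python sketch that records a SHA-256 digest for every .wav file and later reports which files no longer match. It is not Parchive and cannot repair anything; it only tells you which copy to restore from a backup. The function names (`sha256_of`, `build_manifest`, `changed_files`) and the choice of SHA-256 are my own assumptions for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each *.wav file under root (as a relative path) to its digest."""
    # Assumption: files use a lowercase .wav extension.
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*.wav"))
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose digest no longer matches."""
    bad = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(name)
    return bad
```

You would build the manifest once when a work is archived, store it alongside each backup copy, and re-run `changed_files` periodically against every copy.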

Mokubai
  • 95,412

Rather than concentrating on RAM, it makes more sense to pick a NAS that supports a checksumming file system like ZFS and that performs, or can be configured to perform, periodic 'scrubbing'.

That said, the added value of ECC RAM is that it prevents data corruption while the data 'sits' in memory.

FWIW one of my jobs is digital photo repair like this https://youtu.be/woTf_-kJnUs.


Now, it's common for people to assume their storage media is the underlying cause of this type of corruption. However, data on a hard drive, SSD or memory card is protected by ECC. This implies that if bits flip after they were written to the drive, this would be detected. The drive will either correct the error using ECC or, if this isn't possible due to too many bit errors, report a read error.

IOW, data corruption as demonstrated in the video is almost by definition not the result of bit flips while the data 'sits' on the drive; it must have happened at some other point. RAM is a possibility, though a bad connection is more likely in my experience.

This so-called 'silent corruption', in my experience, almost never happens while data sits at rest on a storage device. It almost always happens while it's being transferred from one place to another.
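If transfers are where things go wrong, one cheap habit is to verify every copy right after making it. A minimal Python sketch, assuming a plain file-to-file copy (`copy_and_verify` is a name I made up for the example):

```python
import filecmp
import shutil
from pathlib import Path


def copy_and_verify(source: Path, destination: Path) -> bool:
    """Copy source to destination, then compare the two byte by byte."""
    shutil.copyfile(source, destination)
    # shallow=False forces a content comparison instead of trusting
    # the file size and modification time.
    return filecmp.cmp(source, destination, shallow=False)
```

One caveat: right after a copy, the read-back may be served from the operating system's cache rather than from the destination disk, so a check like this is more convincing when repeated later, or from another machine.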

Now, this can happen inside a storage device itself, like an SSD. As SSDs often scrub data or move data around for wear-leveling purposes, data is briefly placed in volatile memory after being read and before being written somewhere else. Intel engineers, for example, consider cosmic rays a rare but realistic cause of bit flips during these processes: real enough to test the effect on SSDs and real enough to make the firmware aware of the issue (see: https://youtu.be/fqzv2YXMFRs).

... So cosmic rays, sure you've heard about it ... and that it can cause bit flips. You probably heard less about it in the context of SSDs, and what I'm hoping to talk about in the next 30-40 minutes is how much this actually matters to SSDs, and in fact it's probably the dominant source of silent errors in SSDs, or at least one of the dominant sources of silent errors, so what I wanted to do is talk about ...