
With the space-to-cost ratio of hard drives leading to capacities that present an increasing challenge to parity Raid systems (URE probability, multi-day rebuilds stressing drives and risking secondary failure), is there a case to be made that they also present an opportunity/recommendation for multi-disk (>2) Raid 1 configurations for use at home and in small businesses? That is, Raid1 'wastes' space compared to a parity scheme, but with cheap multi-terabyte drives, is space not arguably the commodity we can afford to waste, compared to the memory, bandwidth, CPU clock, downtime etc. required for striping, rebuilds etc.?
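To put a rough number on the URE concern, here is a minimal Python sketch. It assumes an unrecoverable read error rate of 1 per 10^14 bits, a figure often quoted on consumer drive datasheets and not taken from this thread, and it treats bit errors as independent, which real drives do not strictly obey. The function name `p_read_error` is hypothetical.

```python
import math

def p_read_error(bytes_read, ure_per_bit=1e-14):
    """Probability of hitting at least one URE while reading `bytes_read` bytes.

    Assumes independent bit errors at a fixed rate (a simplification).
    Uses log1p/expm1 so the tiny per-bit rate is not lost to rounding.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# Reading a whole drive end to end, as a parity rebuild must do.
for terabytes in (1, 4, 12):
    probability = p_read_error(terabytes * 10**12)
    print(f"{terabytes:>2} TB read: {probability:.1%} chance of at least one URE")
```

Under these assumptions the chance grows quickly with the amount of data that must be read without error, which is the scaling problem the question describes for large parity arrays.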

If I understand the technology correctly, a Raid 1 (>2) configuration:

  • ..cannot fail a rebuild, inasmuch as the files remain accessible so long as one drive is accessible.
  • ..provides greater security relative to the drive count. It's basically an inversion of the ratio right? Raid5 on three disks provides 66% usable capacity and can handle 33% (1 drive) failure. Raid1 provides only 33% capacity but 66% failure (2 drives).
  • ..has an easier and more robust recovery mechanism. In dire conditions, the Raid need not be rebuilt to access the data. Drives can be migrated between different systems.
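The capacity versus fault-tolerance trade-off in the second point can be written out as a small Python sketch. It assumes equally sized disks, and the function names `raid1_profile` and `raid5_profile` are illustrative only.

```python
def raid1_profile(n_disks):
    """n-way mirror: one disk's worth of usable space, survives n-1 failures."""
    if n_disks < 2:
        raise ValueError("a mirror needs at least 2 disks")
    return {"usable_fraction": 1 / n_disks, "tolerated_failures": n_disks - 1}

def raid5_profile(n_disks):
    """Single parity: n-1 disks of usable space, survives exactly 1 failure."""
    if n_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return {"usable_fraction": (n_disks - 1) / n_disks, "tolerated_failures": 1}

for n in (3, 4, 5):
    mirror, parity = raid1_profile(n), raid5_profile(n)
    print(
        f"{n} disks | RAID1: {mirror['usable_fraction']:.0%} usable, "
        f"{mirror['tolerated_failures']} failures | "
        f"RAID5: {parity['usable_fraction']:.0%} usable, "
        f"{parity['tolerated_failures']} failure"
    )
```

With three disks the two layouts are mirror images of each other, as the question suggests. With more disks the mirror keeps gaining fault tolerance while RAID5 stays at one tolerated failure.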

That last point feels akin to the unRaid philosophy where files are saved across drives, and not striped, and the second point suggests to me something more attractive to a small business; prioritizing data security over data capacity. What's the point of being able to store more data if it's at greater risk?

Questions:

  1. Would a strategy of 'manually-mirroring' the content, as opposed to Raid1 copying-at-time-of-writing, avoid potential problems with non-ECC memory corruption? Since the job of writing to each disk would be a distinct process, the same memory fault that occurs in one write would theoretically not occur in another?

  2. Is there a Raid1 software/hardware designed to work with more than two drives, and can therefore handle traditionally un-raid1-like behaviours such as bit-rot, URE etc? With a traditional disk mirror setup if disk A reports a 0 and B reports a 1, you have no idea who flipped, but in a three or more disk solution could you not 'correct' the odd man out disk? A=0, B=1, C=0 thus B must have flipped?

  3. At the same time, would such a Raid1 solution support parallel reads from more than two disks to provide accelerated read speeds? This is especially important since I imagine there are many businesses who read the same data many more times than they edit it.

  4. Raid is not a backup solution. So if you were to follow good practice and have 2-3 storage locations, is it intended that only one of them is protected from drive failure via a raid scheme, or would you expect a second offsite location to also have multidisk redundancy? Is it the case that even a single drive, stored in a different location, is thought to be protected from device failure because it's in an 'effective' mirror relationship with the drives in the other locations? If not, then does Raid1 not offer some redundancy for the lowest minimum drive count (2)?
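The 'odd man out' idea from question 2 (A=0, B=1, C=0, so B must have flipped) can be sketched as a byte-wise majority vote in Python. This only illustrates the reasoning and does not describe how any particular RAID implementation behaves. The function `majority_vote` is hypothetical, and it assumes corruption hits the copies independently.

```python
from collections import Counter

def majority_vote(replicas):
    """Compare equal-length byte strings and return (repaired, suspects).

    `suspects` maps a replica index to the offsets where it was outvoted.
    Raises ValueError when no value wins a strict majority at some offset.
    """
    if len(replicas) < 3:
        raise ValueError("need at least 3 replicas to outvote a bad copy")
    length = len(replicas[0])
    if any(len(replica) != length for replica in replicas):
        raise ValueError("replicas must be the same length")

    repaired = bytearray(length)
    suspects = {}
    for offset in range(length):
        values = [replica[offset] for replica in replicas]
        winner, votes = Counter(values).most_common(1)[0]
        if votes * 2 <= len(replicas):
            raise ValueError(f"no strict majority at offset {offset}")
        repaired[offset] = winner
        for index, value in enumerate(values):
            if value != winner:
                suspects.setdefault(index, []).append(offset)
    return bytes(repaired), suspects

# Disk B (index 1) disagrees with A and C on the first byte.
disk_a, disk_b, disk_c = bytes([0, 7]), bytes([1, 7]), bytes([0, 7])
fixed, suspects = majority_vote([disk_a, disk_b, disk_c])
print(fixed, suspects)
```

With only two copies the vote cannot be decided, which is the two-disk ambiguity the question describes.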

Update:

Re point 1: as some have pointed out, this does rely on a data source free of errors. That is, however, the case for pretty much any storage/backup strategy. We may use sophisticated systems for data integrity in our NAS/SAN solutions, but the data we store in them is typically generated by workstations and devices without such measures. Most production PCs are built for either cost or speed. It is highly unlikely, especially in a small business, that the computer you do your CAD, finances or PowerPoint on uses ZFS-formatted drives and ECC memory.

In my specific example, one of the things I will be looking to store is the output from a camera. I have to assume the photos and videos it saves on to the SD card are 'correct', and there's not much I can do if the corruption occurs at that point in the data chain.

Suggestion:

If this strategy is not supported at the RAID level, are there any software packages that can be used to manually perform some of the activities I'm talking about? Writing copies via queued, discrete rsync tasks? Could a bash script perform a periodic bit-rot scrub? Just checksum all copies of a particular file across all disks, then overwrite the copy on a disk with the wrong checksum using the copy from a correct disk?
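The scrub described above could look roughly like the following Python sketch (Python rather than bash, purely for illustration). It assumes each disk is mounted as a directory holding the same relative file paths. The names `scrub_file`, `diskA`, `diskB`, `diskC` and `photo.raw` are hypothetical, and a copy that is missing entirely would raise `FileNotFoundError` rather than be repaired.

```python
import hashlib
import shutil
import tempfile
from collections import Counter
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def scrub_file(relative_path, mirror_roots, repair=False):
    """Return the copies whose checksum lost the vote; optionally overwrite them."""
    copies = [Path(root) / relative_path for root in mirror_roots]
    digests = {copy: sha256_of(copy) for copy in copies}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise RuntimeError(f"no majority checksum for {relative_path}")
    good = next(copy for copy, value in digests.items() if value == winner)
    bad = [copy for copy, value in digests.items() if value != winner]
    if repair:
        for copy in bad:
            shutil.copy2(good, copy)  # overwrite the outvoted copy
    return bad

# Small self-contained demonstration using temporary directories as "disks".
with tempfile.TemporaryDirectory() as tmp:
    roots = [Path(tmp) / name for name in ("diskA", "diskB", "diskC")]
    for root in roots:
        root.mkdir()
        (root / "photo.raw").write_bytes(b"original image data")
    (roots[1] / "photo.raw").write_bytes(b"original imagf data")  # simulated rot
    damaged = scrub_file("photo.raw", roots, repair=True)
    remaining = scrub_file("photo.raw", roots)
    print("repaired:", [copy.parent.name for copy in damaged])
    print("still damaged:", remaining)
```

The vote only works with three or more copies. It also cannot tell bit rot from a legitimate edit made to a single copy, so in practice all writes would have to go through the same mirroring step.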

John S.

2 Answers


To answer the questions:

  1. Manually mirroring the content would bypass some potential problems related to non-ECC memory, but could introduce others. It also assumes the source data has not been corrupted by an earlier failure.

  2. This depends on the hardware / software. I'd imagine most RAID1 implementations would not pick up bitrot, but would handle URE. A given piece of data is generally read from only 1 of the drives, not all of them - which allows for faster reads overall, but means the copies are not compared against each other on every read.

  3. If the RAID1 implementation treats all 3 disks as active mirrors, then yes, parallel reads would be faster with more disks. With mdadm (i.e. Linux software RAID) this depends on how the array is created: a third drive can be configured either as a hot spare or as a third active mirror (--raid-devices=3).

  4. Whether you RAID your offsite location is a question of robustness and reliability. It would be a best practice, but not an absolute necessity. In my mind hard disks are consumable parts - having RAID allows for replacement of failures without having rebuilds AND increases durability. This would really be a cost-benefit discussion.

davidgo

Let's start with the assumptions:

If I understand the technology correctly, a Raid 1 (>2) configuration:

..cannot fail a rebuild, in so much as long as one drive is accessible so are the files.

This is true, but it does not protect from lightning strikes, theft, flooding etc. Thus while you reduce one risk you still need off-site backups.

..provides greater security relative to the drive count. It's basically an inversion of the ratio right? RAID5 on three disks provides 66% usable capacity and can handle 33% (1 drive) failure. Raid1 provides only 33% capacity but 66% failure (2 drives).

That assumes that drive failures are independent. On SAS this might be the case (unless a drive fails spectacularly and also damages other drives), but it is not the case for PATA or SATA. Usually a hung disk means all drives on that controller will hang. You will still have your data, but you will also have downtime.

..has an easier and more robust recovery mechanism. In dire conditions, the RAID need not be rebuilt to access the data.

Most of the time the rebuild is not a problem. Nobody rebuilds RAID arrays. With terabyte disks it is faster to replace the disk, recreate the array and restore your data from backup.

From a business perspective the main goal of RAID is to keep the system running until 17:00, then make sure that your daily backup works, followed by a new disk, a fresh RAID array and a restore from backup.

Drives can be migrated between different systems

This depends a lot on the RAID implementation. It may work. It may not.


Would a strategy of 'manually-mirroring' the content, as opposed to RAID1 copying-at-time-of-writing, avoid potential problems with non-ECC memory corruption? Since the job of writing to each disk would be a distinct process the same memory fault that occurs in one write would not theoretically occur in another ?

If you are lucky. But what would stop you from successfully and flawlessly copying a corrupted file?
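For illustration, 'manual mirroring' with a verification step could be sketched in Python as below. The function `mirror_file` and the file and directory names are hypothetical. The checksum is taken from the source as it exists at copy time, so a file that was already corrupted would be copied and verified flawlessly, which is exactly the limitation raised here. A read-back straight after writing may also be served from the operating system's cache rather than from the disk itself.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def mirror_file(source, destination_dirs):
    """Copy `source` into each directory as a separate step and verify each copy."""
    source = Path(source)
    expected = sha256_of(source)  # only as trustworthy as the source itself
    written = []
    for directory in destination_dirs:
        target = Path(directory) / source.name
        shutil.copy2(source, target)
        if sha256_of(target) != expected:
            raise OSError(f"verification failed for {target}")
        written.append(target)
    return expected, written

# Demonstration with temporary directories standing in for three disks.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    source = base / "IMG_0001.jpg"
    source.write_bytes(b"pretend this is a photo")
    targets = [base / "disk1", base / "disk2", base / "disk3"]
    for target_dir in targets:
        target_dir.mkdir()
    digest, copies = mirror_file(source, targets)
    verified_count = len(copies)
    print(f"{verified_count} verified copies, sha256 {digest[:12]}...")
```

Storing the returned digest alongside the copies would let a later scrub check each copy against a known value instead of relying on a vote between them.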

Is there a RAID1 software/hardware designed to work with more than two drives, and can therefore handle traditionally un-raid1-like behaviours such as bit-rot, URE etc? With a traditional disk mirror setup if disk A reports a 0 and B reports a 1, you have no idea who flipped, but in a three or more disk solution could you not 'correct' the odd man out disk? A=0, B=1, C=0 thus B must have flipped?

There is this 'RAID1-flavour' called RAID5....

Seriously though, RAID3,4,5, and RAID6 come to mind. No need to force a mirror into something else.

At the same time, would such a RAID1 solution support parallel reads from more than two disks to provide accelerated read speeds? This is especially important since I imagine there are many businesses who read the same data many more times than they edit it.

RAID 5 would. And RAID 6, and ...

RAID is not a backup solution. So if you were to follow good practice and have between 2-3 storage locations, is it intended that only one of them is protected from drive failure via a RAID scheme or would you expect a second offsite location to also have multidisk redundancy?

I would want an off-line backup: one that is not accessible via the Internet and is kept in a physically different location. A second backup which is reachable and updated daily is also nice, but mostly in case of fire and similar.

Monthly backups to really safe off-line storage and daily backups to tape/disk/cloud would be a nice addition to that.

Is it the case that even a single drive, stored in a different location is thought to be protected from device failure because it's in an 'effective' mirror relationship with the drives in the other locations?

RAID is not backup. A part of a RAID is not a proper backup. Not even a disk from a multidisk mirror. Depending on your RAID implementation it might work or it might not. A proper backup always works. Even after updating software (e.g. RAID drivers).

If not, then does RAID1 not offer some redundancy for the lowest minimum drive count (2)?

It offers redundancy against an unreadable sector or a broken disk.

And that is all that is typically needed. Enough redundancy to keep things up and running until you can do emergency maintenance.

Hennes