
I have a Synology NAS with 12 bays. Initially, we decided to allocate all 12 disks for a single RAID-6 volume, but now we would like to shrink the volume to use only 10 disks and assign two HDDs as spares.

The Volume Manager Wizard can easily expand the volume by adding hard disks, but I have found no way to shrink the volume by removing hard disks. How can I do that without having to reinitialize the whole system?

gparyani

2 Answers


For this I am going to assume there are 12 disks in the array, and that each is 1TB in size.

That means there is 10TB of usable storage. The sizes here are only for the example: as long as you are not using more than about 6 disks' worth (6TB) of storage, it doesn't matter how big the disks actually are.
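
For reference, the RAID-6 arithmetic behind those numbers (two disks' worth of capacity always goes to parity):

usable capacity = (number of disks - 2) × disk size
12 × 1TB disks: (12 - 2) × 1TB = 10TB usable
10 × 1TB disks: (10 - 2) × 1TB =  8TB usable

So after shrinking to 10 disks the data has to fit in 8TB, and staying at or below ~6TB in use leaves comfortable headroom.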

Oblig disclaimer: None of this may be supported by Synology, so I would check with them whether this approach can cause problems, back up beforehand, and shut down any Synology services first. Synology uses standard md RAID arrays as far as I know, and they are accessible if the disks are moved to a standard server that supports md - so there should be no problems.
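
Before starting, it is also worth capturing the current layout so you have something to compare against later (a minimal sketch - the device and volume names follow the example used below and may differ on your unit):

cat /proc/mdstat                 # RAID level and member disks
mdadm --detail /dev/md0          # array size, level and device count
pvs && vgs && lvs                # the LVM stack sitting on top of the array
df -h /volume1                   # how much data actually has to fit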

Overview

The sequence goes like this:

  1. Reduce the filesystem size
  2. Reduce the logical volume size
  3. Reduce the array size
  4. Resize the file system back
  5. Convert the spare disks into hot spares

File system

Find the main volume using df -h; it should look something like this:

Filesystem                Size      Used Available Use% Mounted on
/dev/vg1/volume_1         10T       5T   5T         50% /volume1

Use this command to shrink the filesystem to the minimum size it needs and no more:

umount /dev/vg1/volume_1
e2fsck -f /dev/vg1/volume_1     # resize2fs refuses to shrink without a fresh check
resize2fs -M /dev/vg1/volume_1

Now check:

mount /dev/vg1/volume_1 /volume1
df -h

Filesystem                Size      Used Available Use% Mounted on
/dev/vg1/volume_1         5T       5T    0T        100% /volume1

Volume

To reduce the logical volume size, use lvreduce (make the new size a bit bigger than the shrunken filesystem, just in case):

umount /dev/vg1/volume_1
lvreduce -L 5.2T /dev/vg1/volume_1

Now that the logical volume has been reduced, use pvresize to reduce the physical volume size:

pvresize --setphysicalvolumesize 5.3T /dev/md0

If the resize fails, see this other question for moving the portions of data that were allocated at the end of the physical volume towards the beginning.
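
In short, that means finding which physical extents sit beyond the new size and asking LVM to relocate them within the same PV. A rough sketch - the extent numbers below are placeholders you would take from the segments listing, not values from this example:

pvs -v --segments /dev/md0
# say a segment of volume_1 is reported starting at physical extent 1300000;
# move it into free space earlier on the PV (range is illustrative):
pvmove --alloc anywhere /dev/md0:1300000-1399999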

Now we have a 5.3T physical volume on a 10T array, so we can safely reduce the array size by 2T.

Array

Find out the md device:

 pvdisplay -C
 PV         VG      Fmt  Attr PSize   PFree
 /dev/md0   vg1     lvm2 a--  5.3t    0.1t

The first step is to tell mdadm to reduce the number of disks (with --grow); it will refuse, telling us to shrink the array size first:

mdadm --grow -n10 /dev/md0
mdadm: this change will reduce the size of the array.
       use --grow --array-size first to truncate array.
       e.g. mdadm --grow /dev/md0 --array-size 9683819520

This is saying that in order to fit the current array onto 10 disks, we need to reduce the array size.

 mdadm --grow /dev/md0 --array-size 9683819520

Now that it is smaller, we can reduce the number of disks (the backup file lets the reshape recover if it is interrupted):

 mdadm --grow -n10 /dev/md0 --backup-file /root/mdadm.md0.backup

This will take a long time, and can be monitored with:

 cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
md4 : active raid6 sda4[0] sdb4[1] sdc4[2] sdd4[3] sde4[4] sdf4[5] sdg4[6] sdh4[7] sdi4[8] sdj4[9] 
      [>....................]  reshape =  1.8% (9186496/484190976)
                              finish=821.3min speed=9638K/sec [UUUUUUUUUU__]

But we don't need to wait.

Resize the PV, LV and filesystem to maximum:

pvresize /dev/md0
lvextend -l +100%FREE /dev/vg1/volume_1
e2fsck -f /dev/vg1/volume_1
resize2fs /dev/vg1/volume_1

Set spare disks as spares

Nothing to do here - any disks in the array beyond the active count are automatically treated as spares. Once the reshaping is complete, check the status:

cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
md4 : active raid6 sda4[0] sdb4[1] sdc4[2] sdd4[3] sde4[4] sdf4[5] sdg4[6] sdh4[7] sdi4[S] sdj4[S] 
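
If you want a more explicit confirmation than /proc/mdstat, mdadm can report the spares directly (same example device as above):

mdadm --detail /dev/md0 | grep -iE 'raid devices|spare'
# expect something like "Raid Devices : 10" and "Spare Devices : 2",
# with the two leftover drives listed in a "spare" state
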
Paul

Expanding on @Paul's excellent answer - it's 10 years later now, and while the basic principle still works, some things have changed. Here are the issues I ran into and how I dealt with them.


First, I was unable to unmount /volume1 because some processes on the system had open file handles. In my case those were synologand and postgres, which I was able to stop with:

systemctl stop synologand
systemctl stop pgsql

If that doesn't do the job for you, you'll have to build lsof from source, which can be quite a pain, but any statically-linked x86_64 ELF binary should run fine on DSM. Once you have that, you can run the following command to find processes with open file handles:

./lsof | fgrep /volume1/
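
If getting a usable lsof binary onto the box is more trouble than it's worth, a rough substitute can be cobbled together from /proc alone (a sketch that only checks open file descriptors, not memory maps or working directories):

for p in /proc/[0-9]*; do
    if ls -l "$p"/fd 2>/dev/null | grep -q '/volume1/'; then
        echo "${p#/proc/}: $(cat "$p"/comm)"
    fi
done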

Next, the device mounted at /volume1/ for me was /dev/mapper/cachedev_0, which is somehow backed by /dev/mapper/vg1-volume_1, but the two are not interchangeable: the resize2fs has to be done on the cachedev, while the lvreduce (or lvm lvreduce nowadays) has to be done on the vg-volume.
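
To see how the two devices are stacked on your own unit, dmsetup can print the relationship (assuming it is available, which it should be since DSM is built on device-mapper; the names are the ones from my setup and may differ on yours):

dmsetup ls --tree          # shows cachedev_0 sitting on top of vg1-volume_1
dmsetup table cachedev_0   # the underlying mapping for the cache device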

Some important notes here:

  1. Running lvm lvreduce on vg1-volume_1 did not sync the reduced size to cachedev_0, so once that step was done, I rebooted my NAS to try and prevent any potential data corruption through accesses of cachedev_0. The reboot did sync the size (but obviously I had to do the unmount step again).
  2. I strongly suggest getting a copy of screen somewhere and running resize2fs inside of that (see the sketch after this list), because it may take a really long time (>24h for me), and if your SSH session drops at any point, the process will be killed and your file system may be left in a damaged state.
  3. I further suggest you run resize2fs with -p so it tells you the current progress, because otherwise there is no way to know whether it's going to take another minute or another week.
  4. If you can't get screen, or started the command already and are worried about it getting killed, you can detach it from your SSH session by pressing Ctrl-Z and running bg followed by disown -h. At that point you have no way to get its output or exit code anymore, but at least it won't get killed if your SSH session drops. You can check whether the process still exists with ps.
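
A minimal sketch combining notes 2 and 3 - the device name is the cachedev from my setup, so adjust it to whatever is actually mounted at /volume1 on your box:

screen -S resize                          # named session that survives SSH drops
resize2fs -p -M /dev/mapper/cachedev_0    # -M shrinks to minimum, -p prints progress
# detach with Ctrl-A d, reattach later with: screen -r resize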

At some point after Paul's post was written but before 2020, Synology applied a patch to their fork of mdadm that explicitly refuses to reduce the number of drives in a pool that isn't RAID-0 or RAID-1. I don't know why they did this, since the kernel driver supports it just fine, and I was able to shrink a RAID-6 pool from 8 to 5 drives by removing that patch - but that's a whole other issue, which I described in this post.

Siguza