I need your valuable help on this:
I have a NAS that supports a RAID with up to four disks. For a long time I used it with only two disks, sda and sdd, in RAID 1; both are WD30EFRX. I have now bought two more WD30EFRX (refurbished), and my idea was to add them to get a RAID 5 array. These are the steps I took:
Didn't do a backup (because I'm stupid...)
Unmounted everything:
$ sudo umount /srv/dev-disk-by-uuid-d1430a9e-6461-481b-9765-86e18e517cfc
$ sudo umount -f /dev/md0
Stopped the array:
$ sudo mdadm --stop /dev/md0
Re-created the array as a RAID 5 using only the existing disks:
$ sudo mdadm --create /dev/md0 -a yes -l 5 -n 2 /dev/sda /dev/sdd
I made a mistake here and used the whole disks instead of the /dev/sd[ad]1 partitions. mdadm warned me that /dev/sdd had a partition that would be overwritten... I pressed 'Y' to continue... :-( It took a long time to complete, without any errors.
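One thing that might explain a missing superblock later on: the old RAID 1 lived inside the partitions, while the new array starts on the bare disks, so the filesystem may sit at a different offset inside /dev/md0 rather than being gone. Below is a rough sketch of that arithmetic, assuming the data blocks themselves stayed in place on the disks. The function name and the example numbers are hypothetical and were not measured on my disks.

```shell
#!/bin/sh
# Sector (512 bytes) inside the new array where the old filesystem would
# begin, IF the data itself was left in place on the disks (assumption).
fs_start_in_array() {
    # $1 = start sector of the old partition (hypothetical input)
    # $2 = Data Offset of the old RAID 1 member, in sectors (hypothetical)
    # $3 = Data Offset of the new whole-disk member at creation, in sectors
    echo $(( $1 + $2 - $3 ))
}

# Hypothetical numbers only, NOT read from the real disks:
fs_start_in_array 2048 262144 262144
```

The current Data Offset is shown by `mdadm --examine /dev/sda`, though a reshape may change it after creation; the old partition start and old offset would have to come from earlier logs or a saved partition table. A positive result could then be probed without writing anything, for example through a read-only loop device set up with `losetup --read-only --offset`.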
Then I added the two new disks /dev/sdb and /dev/sdc to the array:
$ sudo mdadm --add /dev/md0 /dev/sdb
$ sudo mdadm --add /dev/md0 /dev/sdc
And grew the array to use all four disks:
$ sudo mdadm --grow /dev/md0 --raid-disk=4
During this process a reshape ran, as /proc/mdstat showed:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdc[4] sdb[3] sdd[2] sda[0]
2930134016 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[==================>..] reshape = 90.1% (2640502272/2930134016) finish=64.3min speed=75044K/sec
bitmap: 0/22 pages [0KB], 65536KB chunk
$ sudo mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Mar 11 16:10:02 2022
Raid Level : raid5
Array Size : 2930134016 (2794.39 GiB 3000.46 GB)
Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Mar 12 20:20:14 2022
State : clean, reshaping
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Reshape Status : 97% complete
Delta Devices : 2, (2->4)
Name : helios4:0 (local to host helios4)
UUID : 8e1ac1a8:8eabc3de:c01c8976:0be5bf6c
Events : 12037
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
2 8 48 1 active sync /dev/sdd
4 8 32 2 active sync /dev/sdc
3 8 16 3 active sync /dev/sdb
When this very long process had completed without errors, I ran e2fsck:
$ sudo e2fsck /dev/md0
And... it gave this info:
e2fsck 1.46.2 (28-Feb-2021)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/md0
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
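The two block numbers suggested by e2fsck are only the first candidates. As far as I understand, ext4 with the sparse_super feature keeps backup superblocks in group 1 and in groups that are powers of 3, 5 and 7. Here is a minimal sketch that lists further candidates, assuming 4 KiB blocks and 32768 blocks per group (the usual mke2fs defaults for a filesystem this size; I have not confirmed the real parameters). The function names are made up for illustration.

```shell
#!/bin/sh
# Succeeds if $1 is an exact power of $2 (1 counts as base^0).
is_power_of() {
    v=$1
    while [ "$v" -gt 1 ] && [ $(( v % $2 )) -eq 0 ]; do
        v=$(( v / $2 ))
    done
    [ "$v" -eq 1 ]
}

# Print the first block of every group below $1 that would hold a
# backup superblock under the sparse_super rule.
backup_superblocks() {
    blocks_per_group=32768   # assumption: 4 KiB blocks, mke2fs defaults
    g=1
    while [ "$g" -lt "$1" ]; do
        if is_power_of "$g" 3 || is_power_of "$g" 5 || is_power_of "$g" 7; then
            echo $(( g * blocks_per_group ))
        fi
        g=$(( g + 1 ))
    done
}

# Candidates within the first 30 block groups.
backup_superblocks 30
```

Each value could be tried without modifying anything, e.g. `e2fsck -n -b 98304 -B 4096 /dev/md0`, where `-n` opens the device read-only. If the filesystem now starts at a shifted offset inside the array, none of these will match on /dev/md0 directly.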
At this point I realized that I had made some mistakes during this process. I googled the problem and, judging from this post, I think the order of the disks in the array is somehow 'reversed': https://forum.qnap.com/viewtopic.php?t=125534
So the partition is 'gone', and when I try to assemble the array now, I get this output:
$ sudo mdadm --assemble --scan -v
mdadm: /dev/sdd is identified as a member of /dev/md/0, slot 1.
mdadm: /dev/sdb is identified as a member of /dev/md/0, slot 3.
mdadm: /dev/sdc is identified as a member of /dev/md/0, slot 2.
mdadm: /dev/sda is identified as a member of /dev/md/0, slot 0.
mdadm: added /dev/sdd to /dev/md/0 as 1
mdadm: added /dev/sdc to /dev/md/0 as 2
mdadm: added /dev/sdb to /dev/md/0 as 3
mdadm: added /dev/sda to /dev/md/0 as 0
mdadm: /dev/md/0 has been started with 4 drives.
$ dmesg
[143605.261894] md/raid:md0: device sda operational as raid disk 0
[143605.261909] md/raid:md0: device sdb operational as raid disk 3
[143605.261919] md/raid:md0: device sdc operational as raid disk 2
[143605.261927] md/raid:md0: device sdd operational as raid disk 1
[143605.267400] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[143605.792653] md0: detected capacity change from 0 to 17580804096
$ cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active (auto-read-only) raid5 sda[0] sdb[3] sdc[4] sdd[2]
8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
bitmap: 0/22 pages [0KB], 65536KB chunk
$ sudo mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Mar 11 16:10:02 2022
Raid Level : raid5
Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Mar 12 21:24:59 2022
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Name : helios4:0 (local to host helios4)
UUID : 8e1ac1a8:8eabc3de:c01c8976:0be5bf6c
Events : 12124
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
2 8 48 1 active sync /dev/sdd
4 8 32 2 active sync /dev/sdc
3 8 16 3 active sync /dev/sdb
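As a sanity check on the sizes reported above: RAID 5 keeps one member's worth of space for parity, so the usable capacity should be (raid devices - 1) × Used Dev Size. A minimal sketch of that arithmetic follows. The function names are made up for illustration, and reading the dmesg capacity value as 512-byte sectors is an assumption based on the numbers matching.

```shell
#!/bin/sh
# Usable RAID 5 capacity in KiB: one member's worth of space holds parity.
raid5_kib() {
    # $1 = number of raid devices, $2 = "Used Dev Size" in KiB
    echo $(( ($1 - 1) * $2 ))
}

# Convert KiB to 512-byte sectors (assumption: dmesg reports sectors here).
kib_to_sectors() {
    echo $(( $1 * 2 ))
}

raid5_kib 2 2930134016      # size while the array had 2 members
raid5_kib 4 2930134016      # size after growing to 4 members
kib_to_sectors 8790402048   # compare with the dmesg capacity line
```

The three results match the Array Size during the reshape, the Array Size after it, and the dmesg capacity figure, so the array geometry itself looks self-consistent.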
The array assembles, but no filesystem superblock is found on it.
At this stage, I ran photorec to try to recover my valuable data (mainly family photos):
$ sudo photorec /log /d ~/k/RAID_REC/ /dev/md0
I recovered a lot of them, but others are corrupted. While photorec scans sector by sector, its sector counter increases over time but then 'resets' to a lower value (hence my suspicion that the disks are scrambled in the array), and it recovers some files again (some are identical to ones already recovered).
So, my questions are: Is there a chance to redo the array correctly without losing the information inside? Is it possible to recover the 'lost' partition that existed on the RAID 1, so that I can make a proper backup? Or is the only option to get the disk order in the array right, so that photorec can recover the files correctly?
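To make the 'scrambled disks' suspicion concrete, here is a minimal sketch of how the left-symmetric layout ('algorithm 2' in the output above) spreads consecutive 512 KiB chunks over the array slots. The formulas reflect my understanding of the Linux md driver, and the function names are made up for illustration.

```shell
#!/bin/sh
# Slot holding parity for a given stripe in the left-symmetric layout.
parity_slot() {
    # $1 = stripe number, $2 = number of raid devices
    echo $(( ($2 - 1) - ($1 % $2) ))
}

# Slot holding logical data chunk $1 in an array of $2 devices.
data_slot() {
    chunk=$1; n=$2
    stripe=$(( chunk / (n - 1) ))   # each stripe carries n-1 data chunks
    idx=$(( chunk % (n - 1) ))      # position of the chunk in its stripe
    pd=$(parity_slot "$stripe" "$n")
    echo $(( (pd + 1 + idx) % n ))  # data starts right after the parity slot
}

# Print where the first 12 data chunks land on 4 devices.
c=0
while [ "$c" -lt 12 ]; do
    echo "chunk $c -> slot $(data_slot "$c" 4)"
    c=$(( c + 1 ))
done
```

If the slots were assembled in a different order than the one used when the data was written, consecutive chunks would be read from the wrong disks, and any file larger than one chunk would come out corrupted. With only two devices the same formulas put data and parity on alternating disks, and since the parity of a single chunk is a copy of that chunk, a two-disk RAID 5 behaves like a mirror.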
I appreciate your help. Thanks a ton!
Best,
Jorge