
I have an HP MicroServer at home for personal use (hosting music, pictures, and videos that can be accessed from any device in the house), running Ubuntu Server with a software RAID. Issues started last week when, I believe, it lost power during a shutdown, and now it is failing to boot.

A friend who is no longer around set this up for me years ago, so at the moment I am attempting to muddle through and sort it out on my own. There are four 2TB hard disks running as RAID 5 (or 6, I can't remember), giving me 6TB of usable storage across the disks.

md/raid:md0: device sda3 operational as raid disk 0
md/raid:md0: device sdd1 operational as raid disk 3
md/raid:md0: device sdc1 operational as raid disk 2
md/raid:md0: allocated 0kB
md/raid:md0: cannot start dirty degraded array.
md/raid:md0: failed to run raid set.
md: pers->run() failed
mdadm: failed to start array /dev/md/0: Input/output error
mdadm: CREATE user root not found
mdadm: CREATE group disk not found
mdadm: /dev/md/0 is already in use.
Could not start RAID arrays in degraded mode. Gave up waiting for root device. Common problems:
- Boot args (cat /proc/cmdline)
    - Check rootdelay= (did the system wait long enough?)
    - Check root= (did the system wait for the right device?)
- Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/disk/by-uuid/1eb18515-c1e0-4f77-92ec-0e22d94e4803 does not exist. Dropping to a shell!
BusyBox v1.21.1 (Ubuntu 1:1.21.0-1ubuntu1.4) built-in shell (ash)
Enter 'help' for a list of built-in commands.
(initramfs)

I found some other posts such as these:

https://www.reddit.com/r/techsupport/comments/mt52ru/help_fixing_cannot_start_dirty_degraded_array/?rdt=32975

https://forums.debian.net//viewtopic.php?f=10&t=142536

https://bbs.archlinux.org/viewtopic.php?id=193302

https://ubuntuforums.org/showthread.php?t=854528

Restart degraded RAID array after crash

And attempted some of the suggestions. These are the details of the disks:

State : active, degraded, Not Started
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Number   Major   Minor   RaidDevice State
   0       8       3        0      active sync   /dev/sda3
   1       0       0        1      removed
   2       8      33        2      active sync   /dev/sdc1
   3       8      49        3      active sync   /dev/sdd1

I'm assuming that the 'removed' disk is the one that's reserved in case a disk breaks and will take over in that situation. Or is that disk potentially the cause of the problem? I have removed each disk in turn and ensured they are all connected properly.

When trying this command:

echo "clean" > /sys/block/md0/md/array_state

It returns this error:

/bin/sh: can't create /sys/block/md0/md/array_state: Permission denied

Trying to force reassemble the array results in:

mdadm --assemble --force /dev/md0 /dev/sda3 /dev/sdc1 /dev/sdd1
mdadm: CREATE user root not found
mdadm: CREATE group disk not found
mdadm: /dev/sda3 is busy - skipping
mdadm: /dev/sdc1 is busy - skipping
mdadm: /dev/sdd1 is busy - skipping

Though I'm wondering: should my assemble command also include /dev/sdb1?

One of the other forum posts mentions that they are busy because they are already part of an array ( https://bbs.archlinux.org/viewtopic.php?pid=1500593#p1500593 )

Is it safe for me to run mdadm --stop /dev/md0 and then try to rebuild with the assemble command, and should I, I presume, also add the disk that is currently showing as removed?
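For reference, based on the Arch thread above, the sequence I think I would be running is something like this (device names taken from my --detail output; I haven't run any of it yet, so please correct me if it's wrong):

```shell
# Stop the partially-assembled array so its member disks are no longer "busy"
mdadm --stop /dev/md0

# Force-assemble from the three disks mdadm can still see; --force should
# clear the "dirty" flag, and --run starts the array even though it is
# degraded. /dev/sdb1 is omitted because the kernel no longer detects it.
mdadm --assemble --force --run /dev/md0 /dev/sda3 /dev/sdc1 /dev/sdd1

# Check the result
cat /proc/mdstat
```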

I don't want to go any further without some extra guidance, as I don't really understand what will happen. I've never dealt with RAID on my own, and I don't want to mess it up any more; I'm hoping to recover the data that's on it.

Any help and advice is greatly appreciated.

1 Answer


Your RAID array does not have a spare disk reserved in case something goes wrong. Instead, all four disks store data (and parity). Your --detail output shows a four-disk array with a left-symmetric layout and 6TB usable, which is RAID 5: parity is distributed across the disks so that if any one of them fails, all your data can be rebuilt from the other three.

In your case, the "removed" disk (presumably /dev/sdb, which is why it no longer appears anywhere) has failed so badly that your OS can't detect it. The array is also marked dirty because of the unclean shutdown, and mdadm refuses to auto-start an array that is both dirty and degraded: with parity possibly inconsistent and no redundancy left, a second failure right now would mean data loss. It is working as designed.

First, stop trying to start up your RAID array or read/write data from the other three disks, since this could lead to data loss. Second, purchase a replacement disk at least as large as your dead one (it does not need to be the same model, only the same size or bigger). Finally, swap it in for the dead disk and use mdadm to add the new disk to the array -- I do not know the specifics of this process, but it's something mdadm is designed to do, and perhaps others can chime in.
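A rough sketch of that last step, assuming the replacement disk appears as /dev/sdb (verify the device name with lsblk before writing anything; the old member was a whole-disk partition, judging by sdb1):

```shell
# Copy the partition table from a surviving member so the new
# partition matches the old layout (sdd also used a single partition)
sfdisk -d /dev/sdd | sfdisk /dev/sdb

# Add the new partition to the array; mdadm starts rebuilding parity
# onto it automatically once it is added
mdadm --manage /dev/md0 --add /dev/sdb1

# Watch the rebuild progress (expect several hours for 2TB)
watch cat /proc/mdstat
```

Until the rebuild finishes, the array is still degraded, so avoid heavy use of it during that window.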

theta0