I’m currently using a software RAID-1 array on Linux, built on top of an HDD and an SSD. I have a strong feeling that the SSD is failing.

I’d like to check how badly the SSD is behaving. I ran a check of the array with echo check > /sys/block/md1/md/sync_action and, when it had finished, looked at the contents of /sys/block/md1/md/mismatch_cnt. I ran the check 3 times in a row and got 3 different results: 256, 128 and 384. What puzzles me is that the second run gave a lower result than the first one. Was a mismatch fixed?

Is there a way I can get more detail about the mismatches that are detected? It might be interesting to check whether the mismatching blocks change between runs or are always the same. I’d also like to have a look at the contents of the mismatching blocks, to see if I can tell which one is correct. (For example, if the SSD has zeroed some blocks it could not reread.)

Moreover, I see there is an option to repair an MD array. But I’m somewhat suspicious: how can the kernel guess which one of the mismatching blocks is correct?

1 Answer

Well… I read the source code of the process_checks function in the drivers/md/raid1.c file from Linux 4.9.88 and, if I read it correctly:

  1. There is no way to make the check or repair operations verbose about where mismatches are found.
  2. If a read failure is encountered during a check or repair operation, the failing block will be rewritten.
  3. If a mismatch is encountered during a repair operation, it will be fixed by copying the block from the “primary” (first non-failing) component to the other one(s).
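
Since the kernel will not say where the mismatches are (point 1), one possible workaround is to compare the two components yourself. The following is only a minimal sketch in Python, not something taken from the kernel or from mdadm: the function names are made up, and the 64 KiB chunk size is an arbitrary choice. It assumes that both components keep the array data at the same offset, that the array is stopped or read-only while you compare (otherwise in-flight writes will show up as differences), and that you skip or ignore the region holding the md superblock, which is expected to differ between components.

```python
import os

CHUNK = 64 * 1024  # comparison granularity in bytes (an arbitrary assumption)


def is_zeroed(block):
    """True if the block is non-empty and contains only zero bytes."""
    return len(block) > 0 and not any(block)


def find_mismatches(path_a, path_b, chunk=CHUNK, start=0, length=None):
    """Yield (byte_offset, block_a, block_b) for every chunk that differs.

    path_a and path_b are the two components (or image files of them).
    start and length restrict the comparison to a byte range, for
    example to skip a metadata area at the beginning.
    """
    end = None if length is None else start + length
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        a.seek(start)
        b.seek(start)
        offset = start
        while end is None or offset < end:
            want = chunk if end is None else min(chunk, end - offset)
            block_a = a.read(want)
            block_b = b.read(want)
            if not block_a and not block_b:
                break  # both inputs are exhausted
            if block_a != block_b:
                yield offset, block_a, block_b
            offset += max(len(block_a), len(block_b))


def report(path_a, path_b, **kwargs):
    """Print one line per mismatching chunk, flagging all-zero blocks."""
    for offset, block_a, block_b in find_mismatches(path_a, path_b, **kwargs):
        note_a = "zeroed" if is_zeroed(block_a) else "data"
        note_b = "zeroed" if is_zeroed(block_b) else "data"
        print(f"offset {offset}: {os.path.basename(path_a)}={note_a} "
              f"{os.path.basename(path_b)}={note_b}")
```

Calling report() on the two component devices (as root) after each check, and keeping the output, would show whether the mismatching offsets move around or stay the same, and whether one side has been zeroed.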

Hence, the kernel does not guess which of the mismatching blocks is correct; it just takes the first one as correct. (As I read it, this holds even if there are 3 components and the 2nd and 3rd have the same contents.)
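
To make that rule concrete, here is a toy model in Python of the behaviour as I understand it. It is not kernel code: repair_block is a hypothetical name, a block is represented as a bytes object, and None stands for a component whose read failed.

```python
def repair_block(copies):
    """Toy model of a RAID-1 repair of one block, as described above.

    copies holds one entry per component: bytes for a successful read,
    None for a read failure. Returns (repaired_copies, mismatch_found).
    """
    # The "primary" is simply the first component that could be read.
    primary = next((i for i, c in enumerate(copies) if c is not None), None)
    if primary is None:
        raise OSError("no readable copy of this block")
    good = copies[primary]

    # A mismatch is any readable copy that differs from the primary.
    mismatch = any(c is not None and c != good for c in copies)

    # Every component ends up with the primary's contents:
    # failed reads are rewritten and mismatching copies are overwritten.
    return [good] * len(copies), mismatch
```

With three components holding A, B and B, this model returns A for all three: there is no majority vote, so the two agreeing copies are overwritten by the first one.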