I’m currently using a software RAID-1 array on Linux, built on top of an HDD and an SSD. I strongly suspect that the SSD is failing.
I’d like to check how badly the SSD is misbehaving. I ran a check of the array with echo check > /sys/block/md1/md/sync_action and, once it had finished, read /sys/block/md1/md/mismatch_cnt. I did this three times in a row and got three different results: 256, 128 and 384. What puzzles me is that the second run gave a lower count than the first one. Was a mismatch fixed in between?
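The check procedure above can be scripted so that several passes run back to back and each count is recorded. This is a minimal Python sketch, not a tested tool. It assumes root access, the array directory /sys/block/md1/md from the question, and that sync_action reads back as idle once a pass has finished. The helper names read_mismatch_cnt and run_check are hypothetical.

```python
import time
from pathlib import Path


def read_mismatch_cnt(md_dir: Path) -> int:
    """Return the value of <md_dir>/mismatch_cnt as an integer."""
    return int((md_dir / "mismatch_cnt").read_text().strip())


def run_check(md_dir: Path, poll_interval: float = 5.0,
              timeout: float = 24 * 3600) -> int:
    """Start a 'check' pass, wait until sync_action reads 'idle' again,
    then return mismatch_cnt. Needs root on a real system."""
    sync_action = md_dir / "sync_action"
    # Same effect as: echo check > /sys/block/md1/md/sync_action
    sync_action.write_text("check\n")
    deadline = time.monotonic() + timeout
    while True:
        # Sleep first so the pass has a chance to start before polling.
        time.sleep(poll_interval)
        if sync_action.read_text().strip() == "idle":
            return read_mismatch_cnt(md_dir)
        if time.monotonic() > deadline:
            raise TimeoutError(f"check on {md_dir} still running after {timeout}s")
```

For example, calling run_check(Path("/sys/block/md1/md")) three times in a loop would reproduce the three runs described above and return each count in turn.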
Is there a way to get more detail about the mismatches that are detected? It would be interesting to check whether the mismatching blocks change between runs or are always the same ones. I’d also like to look at the contents of the mismatching blocks, to see whether I can tell which copy is correct (for example, if the SSD has zeroed some blocks it could not reread).
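A rough way to see which blocks differ, and whether one side is all zeros, is to read both members directly and compare them chunk by chunk. The sketch below assumes that you know each member’s data offset in bytes (for 1.x metadata, mdadm --examine reports a Data Offset in sectors), that the array is stopped or otherwise idle so in-flight writes do not show up as differences, and a 4096-byte comparison chunk. The function name diff_chunks is hypothetical.

```python
from typing import BinaryIO, Iterator, Tuple


def diff_chunks(a: BinaryIO, b: BinaryIO, offset_a: int, offset_b: int,
                length: int, chunk: int = 4096) -> Iterator[Tuple[int, bool, bool]]:
    """Compare `length` bytes of two mirror members, starting at each
    member's data offset. Yield (relative_offset, a_is_zero, b_is_zero)
    for every chunk whose contents differ."""
    a.seek(offset_a)
    b.seek(offset_b)
    zero = bytes(chunk)
    pos = 0
    while pos < length:
        n = min(chunk, length - pos)
        da = a.read(n)
        db = b.read(n)
        if da != db:
            # Report where the chunk is and whether either copy is all zeros.
            yield pos, da == zero[:len(da)], db == zero[:len(db)]
        if len(da) < n or len(db) < n:
            break  # one member ended earlier than expected
        pos += n
```

Used on the two member partitions opened with open(path, "rb"), it yields the offset (relative to the start of the data area) of each differing chunk and whether each copy is all zeros. Saving the offsets from two runs and comparing them would show whether the mismatches move.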
I also see there is an option to repair an MD array, but I’m somewhat suspicious of it: with only two copies, how can the kernel tell which of the mismatching blocks is the correct one?