Drive changed mapping during e2fsck process

Question

I'm unsure if "mapping" is the right term.

Short:

I ran the following command:

$ sudo e2fsck -b 32768 /dev/sde1
e2fsck 1.46.3 (27-Jul-2021)
/dev/sde1 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Error reading block 77071131 (Input/output error) while getting next inode from scan. Ignore error<y>? yes
Force rewrite<y>? yes

I noticed that it had been sitting there for a while, and then happened to see that the drive in question was no longer /dev/sde but had changed to /dev/sdh. I've let the process run overnight and it is still going, but now I'm wondering whether it will ever finish given the change. Should I just cancel this process?

What led to this:

I noticed one morning that one of my drives (ext4, 8 TB, shucked Western Digital) was not mounted, although it had been working fine the night before. The nightly run of python3 /opt/snapraid-runner/snapraid-runner.py had taken quite a long time and ended in errors. When attempting to mount, I received an error ... can't read superblock (udisks-error-quark, 0). I ran the following:

$ sudo fsck /dev/sde1
fsck from util-linux 2.36.1
e2fsck 1.46.3 (27-Jul-2021)
/dev/sde1: recovering journal
/dev/sde1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error reading block 87556096 (Input/output error) while reading inode and block bitmaps.  Ignore error<y>? yes
Force rewrite<y>? yes
Block bitmap differences:  +(87556096--87560223)
Fix<y>? yes
/dev/sde1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sde1: 4438/244191232 files (0.8% non-contiguous), 1232265894/1953506385 blocks

It still wouldn't mount, so I found and followed an article about fixing this (https://www.linuxbabe.com/desktop-linux/fix-cant-read-superblock-error) which led to me running the command above: $ sudo e2fsck -b 32768 /dev/sde1

score 1 · Answer 1 · answered Feb 24 '22 at 17:15

Device mappings don't just change. If a new sdh suddenly appeared, that means the disk was detected anew (e.g. because it completely stopped responding and the kernel had to reset the SATA port) – but it also likely means that programs using the old device node will not be making any more progress.

When that happens (i.e. if the kernel thinks the disk was disconnected), the old mapping usually becomes "dead" and would've already been removed if it weren't for the fsck process still holding it open (which is why the newly detected disk became 'sdh'). Processes look up the mapping only at 'open' time; the resulting file handle is associated with a specific device, not with a specific name.
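The "handle, not name" behaviour can be seen with ordinary files. This is only an analogy: the file names sde and sdh below are stand-ins inside a temporary directory, not real device nodes, and a failed disk differs in that the device behind the handle is gone rather than merely renamed.

```shell
# Analogy with regular files, not real block devices.
dir=$(mktemp -d)
echo "still reachable" > "$dir/sde"

exec 3< "$dir/sde"        # open once: fd 3 now refers to the file itself
mv "$dir/sde" "$dir/sdh"  # the name changes underneath the open handle

read -r line <&3          # the read still goes to the same file
echo "$line"

exec 3<&-                 # close the handle
rm -r "$dir"
```

The read succeeds even though nothing is called sde any more, which is the same reason fsck keeps using its original handle after the disk reappears under a new name.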

Check your dmesg for disk-related errors. It's possible the "write" command issued by fsck never actually finished. (A full fsck of an ext4 filesystem should take minutes, not days.) At this point, trying to perform any filesystem fixes in-place is likely to be a bad idea and could make things worse – the disk is already showing its inability to hold data.
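One way to narrow the kernel log down to disk-related lines is a small filter like the sketch below. The function name disk_errors is made up for this example, and the pattern is just a guess at common SATA/SCSI message shapes, so treat an empty result as "nothing matched", not as proof the disk is fine.

```shell
# Hypothetical helper: keep only kernel log lines that look disk-related.
# The pattern is an assumption, not an exhaustive list of kernel messages.
disk_errors() {
    grep -iE 'i/o error|ata[0-9]+(\.[0-9]+)?:|\[sd[a-z]+\]'
}

# Typical use (reading the kernel log usually needs root):
#   sudo dmesg -T | disk_errors | tail -n 50
```

Lines about a SATA link being reset, followed by the disk showing up under a new sdX name, would fit the explanation above.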

At minimum you should use ddrescue to clone it as-is to another working disk and only then try to repair the ext4 structure. (It would be best to have two clones, one as a backup in case filesystem repairs go wrong and you have to restart...)
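A rough sketch of such a clone with GNU ddrescue, in two passes. The function name clone_disk and the device paths in the usage comment are placeholders; double-check source and destination before running anything, since the destination is overwritten. It has to be run as root.

```shell
# Hypothetical wrapper around GNU ddrescue; run as root.
# $1 = failing source, $2 = destination disk, $3 = map file (progress log)
clone_disk() {
    src=$1; dst=$2; map=$3
    # Pass 1: copy everything that reads easily, skip the slow scraping phase.
    ddrescue -f -n "$src" "$dst" "$map" || return 1
    # Pass 2: go back for the bad areas with direct I/O and up to 3 retries.
    ddrescue -f -d -r3 "$src" "$dst" "$map"
}

# Example call (placeholder devices; /dev/disk/by-id/... names are safer
# than sdX names, which can change as they did here):
#   clone_disk /dev/sdh /dev/sdX rescue.map
```

The map file lets an interrupted run resume where it stopped, which matters if the disk drops off the bus again partway through. Filesystem repairs would then be run against the clone, not the original.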
