
I have a question concerning uncorrectable errors on a BTRFS file system. Specifically, I recently ran a BTRFS scrub after experiencing a problem with one of my RAM sticks, and it seems to have discovered 4 uncorrectable errors. This is the output:

scrub status for <UUID>
    scrub started at Thu Dec 25 15:19:22 2014 and was aborted after 89882 seconds
    total bytes scrubbed: 1.87TiB with 4 errors
    error details: csum=4
    corrected errors: 0, uncorrectable errors: 4, unverified errors: 0

Luckily, I have everything backed up in a tertiary backup, so I am not particularly concerned about losing the files. (I'm well aware of the issues associated with BTRFS's experimental status, I have multiple backups to protect my data, and I am determined to continue using it, so please, no "Solution: don't use BTRFS" posts.)

What I would like to know is how to determine which files are associated with the uncorrectable errors. I want to find them, delete them, and replace them with their backed-up copies.

If anyone has information on how to do this, I would love to hear from you.

Thank you in advance.

RedHack

7 Answers


I have found the following method useful.

Run btrfs scrub on the volume.

You will be presented with some number of csum errors, as you've shown above.
In your example, the error details line reads csum=4. Use that number as the line count for tail in the following command:

dmesg | grep "checksum error at" | tail -4 | cut -d\  -f24- | sed 's/.$//'

It is handy to redirect the output to a file (e.g. > csums.txt).
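If you would rather not copy the count by hand, the sketch below reads it from the scrub status output instead. It assumes the status text contains a field like csum=4. The question's output shows "error details: csum=4", but other btrfs-progs versions may word that line differently. The function name csum_count is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N from the first "csum=N" found in
# scrub status text read on stdin, or 0 if there is no such field.
csum_count() {
  awk 'match($0, /csum=[0-9]+/) {
         print substr($0, RSTART + 5, RLENGTH - 5)  # skip the 5 chars of "csum="
         found = 1
         exit
       }
       END { if (!found) print 0 }'
}

# Example use (/mnt is a placeholder mount point):
#   n=$(sudo btrfs scrub status /mnt | csum_count)
#   sudo dmesg | grep "checksum error at" | tail -n "$n" > csums.txt
```

With a clean scrub the count is 0, and tail -n 0 prints nothing, so csums.txt simply ends up empty.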

I've tried a number of the suggested inode search approaches, and they've all met with limited, if any, success.

Mark

Yes, mapping from an inode or block number back to a filename can be difficult. If you are really interested, you can try something like the following and see which files fail to copy. After all, if a file is bad, reading it should throw an error during the copy. I have used this type of technique before.

 find /mount-point -type f -exec cp {} /dev/null \;

 where /mount-point is the root mount point of the affected filesystem.
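A variant of the same idea prints only the files that fail to read, instead of a stream of cp messages. It relies on btrfs returning an I/O error when a block fails its checksum and no good copy exists, so cat exits non-zero for such files. The function name list_unreadable is hypothetical. Any other read failure, such as a permission error, is reported the same way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every regular file under $1 that cannot be read in full.
list_unreadable() {
  find "$1" -type f -exec sh -c '
    for f do
      # A failed read (e.g. EIO from a bad checksum) makes cat exit non-zero.
      cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
    done
  ' sh {} +
}

# Example use (/mount-point is a placeholder, as above):
#   list_unreadable /mount-point > unreadable.txt
```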
mdpc

dmesg will give you details about the files involved in the uncorrectable checksum errors. The messages typically look like this: "BTRFS: checksum error at logical [...] on dev [...], sector [...], root [...], inode [...], offset [...], length [...], links [...] (path: [...])"; the last piece of information is the absolute path to the file that's corrupted.
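The same message also carries the root and inode numbers. These can be passed to btrfs inspect-internal inode-resolve, which takes an inode number and a path inside the subvolume that inode belongs to. The sketch below assumes the messages follow the layout above, with the path in parentheses at the very end of the line. csum_fields is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from kernel log lines on stdin, print
# "<root> <inode> <path>" for each checksum error message.
# Assumes the "..., root N, inode N, ... (path: P)" layout quoted above.
csum_fields() {
  sed -n 's/.*checksum error at .*root \([0-9][0-9]*\), inode \([0-9][0-9]*\),.*(path: \(.*\))$/\1 \2 \3/p'
}

# Example use (/mnt/subvol is a placeholder for where that root's subvolume is mounted):
#   sudo dmesg | csum_fields | sort -u
#   sudo btrfs inspect-internal inode-resolve <inode> /mnt/subvol
```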

arrrr

I came here looking for the "Uncorrectable error" from BTRFS too. The grep above didn't work for me; I had to use this instead:

$ dmesg | sed -n -r 's#.*BTRFS.*i/o error.*path: (.*)\)#\1#p' | sort -u
somepath/somefile.txt

Note how the path is relative to the start of the subvolume, with no indication of which subvolume it's in. Luckily, this wasn't a problem for me.
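The subvolume can often be recovered if your messages include a root field, as the checksum-error layout quoted in another answer does. That root number is the ID of the subvolume the inode lives in, and ID 5 is the top-level subvolume. The sketch below looks an ID up in the output of btrfs subvolume list. It assumes that command's usual "ID ... path <name>" line layout, and subvol_path_for_root is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a subvolume ID as $1 and `btrfs subvolume list`
# output on stdin, print that subvolume's path. ID 5 is the top-level
# subvolume, which does not appear in the list.
subvol_path_for_root() {
  awk -v id="$1" '
    BEGIN { if (id == 5) { print "(top-level subvolume)"; exit } }
    $1 == "ID" && $2 == id && match($0, / path /) {
      print substr($0, RSTART + 6)  # everything after the first " path "
      exit
    }'
}

# Example use (/mnt is a placeholder mount point of the filesystem):
#   sudo btrfs subvolume list /mnt | subvol_path_for_root 257
```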


I also found this thread trying to figure out what to do next after finding BTRFS checksum errors.

The dmesg-based answers didn't work for me. My scrub took a long time, and by the time it finished, even the oldest messages shown by sudo dmesg were more recent than the errors.

One solution would have been to use dmesg's -W/--follow-new flag, for example by starting the command below before the scrub:

$ sudo dmesg --follow-new | grep --line-buffered "checksum error at" >> checksum_errors.txt

And then do some post-processing on its output.
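For that post-processing step, the sketch below counts the logged errors per file. It assumes each captured line ends with "(path: ...)", as in the message format quoted in another answer. errors_per_file is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read captured kernel log lines on stdin and print
# "<count> <path>" for every file named in a trailing "(path: ...)".
errors_per_file() {
  awk 'match($0, /\(path: .*\)$/) {
         # Drop the 7-char "(path: " prefix and the closing ")".
         n[substr($0, RSTART + 7, RLENGTH - 8)]++
       }
       END { for (p in n) print n[p], p }' | sort -k2
}

# Example use:
#   errors_per_file < checksum_errors.txt
```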

However, I found that searching with journalctl's -k/--dmesg and -g/--grep flags was sufficient. It went back far enough to find all the errors I was experiencing. I printed only the offset and the filename (ignoring the trailing ")") with the command below; in my case I had multiple errors in this one file.

$ sudo journalctl --dmesg --grep 'checksum error' | awk '{ print $25, $31 }' | sort -u

From there, I ran sha256sum on the bad file, confirmed it resulted in an error, and confirmed I had a backup of this file.
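Those last two checks can be scripted for a list of affected paths. The sketch below assumes the paths are relative, as the log messages give them. You supply the directory they are relative to and the root of a backup with the same layout; check_against_backup and both directory arguments are hypothetical. Because sha256sum has to read the whole file, a file with a bad block makes it fail.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each relative path on stdin, report whether it can be
# read in full under $1 (the live subvolume's mount point) and whether a copy
# exists under $2 (a backup with the same directory layout).
check_against_backup() {
  local live_root=$1 backup_root=$2 f status backup
  while IFS= read -r f; do
    if sha256sum -- "$live_root/$f" > /dev/null 2>&1; then
      status=ok
    else
      status=UNREADABLE   # read error, or the file is missing
    fi
    if [ -e "$backup_root/$f" ]; then backup=yes; else backup=no; fi
    printf '%s\tbackup=%s\t%s\n' "$status" "$backup" "$f"
  done
}

# Example use (paths are placeholders; errors.txt holds one relative path per line):
#   check_against_backup /mnt/data /backup/data < errors.txt
```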

n8henrie

Run this after the scrub:

sudo dmesg | grep -e "BTRFS warning.*path:" | sed -e 's/^.*path: //' -e 's/)$//' | sort -u

It will give you a nice list of all files affected.

damian101

Can be simplified a little with:

sudo dmesg | awk -F": " '/BTRFS warning.*path:/ { sub(/\)$/, "", $NF); print $NF }' | sort -u


Also, as an aside, I made the output of btrfs scrub status easier to read with the following:

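# For each mounted btrfs filesystem (device, mount point), print the mount point,
# its df size/used/available columns, and then its scrub status.
while read -r f1 f2; do
  echo -en "\n$f2\t"
  df -h "$f2" | awk 'NR==2 {printf "%s\t%s\t%s\n",$2,$3,$4}'
  sudo btrfs scrub status "$f1"
done < <(mount | awk '$5 == "btrfs" {print $1 " " $3}')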

This is especially handy when you have several btrfs filesystems.