
This is a weird request, but I had a hard drive that I initially ran badblocks on and then stopped partway through, so part of the drive was covered in 0xAA and another part in 0x55. I then put an NTFS filesystem on it, leaving the empty regions filled with this garbage, and then files written to it overwrote those regions.

Later the drive died, with many chunks of data missing throughout the entire drive.

It's now a raw image of an NTFS partition stored on a btrfs filesystem. I could probably just delete it, but first I want to make sure there aren't any important files on it that I can recover.

The drive image is taking up a lot more space than necessary, because all of those 0xAA and 0x55 bytes can't be stored as "holes". Likewise, the NTFS recovery program DMDE lists a lot of "files" that contain nothing but 0xAA and 0x55.

Is there some way to go through and find any blocks/chunks/chains that are entirely 0xAA or 0x55 and blank them to 0x00 so they take up zero space on the btrfs volume? They aren't zero, but they don't contain any information either.

endolith

2 Answers


I realized I could just write my own Python program to do this:

filename = 'NTFS_3TB.img'
chunk_size = 512  # one sector; verified against the image in a hex editor

with open(filename, 'r+b') as f:
    while True:
        chunk = f.read(chunk_size)
        if chunk == b'':  # end of file
            break
        if chunk == b'\x55' * chunk_size:
            # Rewind to the start of this chunk and zero it out
            start = f.tell() - chunk_size
            print(f'5: {start}')
            f.seek(start)
            f.write(b'\x00' * chunk_size)
        elif chunk == b'\xaa' * chunk_size:
            start = f.tell() - chunk_size
            print(f'A: {start}')
            f.seek(start)
            f.write(b'\x00' * chunk_size)

I looked through the file with a hex editor to confirm that the chunk size was correct, stepped through a few iterations while watching the chunks being changed in the hex editor, etc., to make sure it wasn't wiping the wrong ones.
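If you want a sanity check before doing any destructive writes, a read-only dry run like the following (my own sketch, not part of the script above) just counts the matching chunks without modifying the image:

filename = 'NTFS_3TB.img'
chunk_size = 512

counts = {b'\x55': 0, b'\xaa': 0}
with open(filename, 'rb') as f:  # opened read-only: nothing is written
    while True:
        chunk = f.read(chunk_size)
        if chunk == b'':
            break
        for byte, pattern in ((b'\x55', b'\x55' * chunk_size),
                              (b'\xaa', b'\xaa' * chunk_size)):
            if chunk == pattern:
                counts[byte] += 1

for byte, n in counts.items():
    print(f'{byte.hex()}: {n} chunks ({n * chunk_size} bytes)')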

More efficient version:

filename = 'NTFS_3TB.img'
chunk_size = 512
all_5s = b'\x55' * chunk_size
all_As = b'\xaa' * chunk_size
all_0s = b'\x00' * chunk_size

start = 236039143424  # From last run; resume from here
try:
    with open(filename, 'r+b') as f:
        f.seek(start)
        while True:
            chunk = f.read(chunk_size)
            if chunk == b'':  # end of file
                break
            start = f.tell() - chunk_size  # position of the chunk just read
            if chunk == all_5s or chunk == all_As:
                f.seek(start)
                f.write(all_0s)
finally:
    # If interrupted, this prints a position to resume from next time
    print(f'Position: {start}')
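One caveat: on btrfs, writing zeros doesn't by itself free any space; the zeroed ranges still occupy extents until they're turned into holes, e.g. with fallocate --dig-holes from util-linux or by copying with cp --sparse=always. If you'd rather punch the holes directly from the script, something like this ctypes sketch should work, assuming Linux with glibc (treat it as untested):

import ctypes
import os

# Constants from linux/falloc.h
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

libc = ctypes.CDLL('libc.so.6', use_errno=True)
libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                           ctypes.c_longlong, ctypes.c_longlong]

def punch_hole(f, offset, length):
    """Deallocate a byte range of an open file, leaving a hole."""
    ret = libc.fallocate(f.fileno(),
                         FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         offset, length)
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

# e.g. instead of f.write(all_0s) in the loop above:
# punch_hole(f, start, chunk_size)

Note that holes smaller than the filesystem block size (typically 4 KiB) may not actually release space, so punching per 512-byte chunk can be less effective than zeroing everything first and then running fallocate --dig-holes over the whole image.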
endolith

I don't think a tool exists to do this safely.

If you had a healthy mounted filesystem, fstrim would free all unused blocks.

If you use something like tr to blindly translate 0xAA and 0x55 values, it will operate on individual bytes and will likely corrupt valid data. Additionally, tr was originally designed for ASCII text and may behave badly on binary files.

Even if you only translated whole blocks containing only 0xAA and 0x55 values, you might accidentally clear valid data or metadata blocks.

Probably what you want is something that checks the free blocks in the filesystem to see if they consist of a single value, and then trims (or zeroes) each such block.
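As a rough illustration of that idea (a sketch only: free_clusters() is a hypothetical helper that would have to parse the NTFS $Bitmap to enumerate unallocated clusters, which no standard Python module does for you):

cluster_size = 4096  # assumed NTFS cluster size; check your volume

def free_clusters(image_path):
    # Hypothetical: would yield byte offsets of unallocated clusters
    # by parsing the NTFS $Bitmap. Left unimplemented here.
    raise NotImplementedError('parse the NTFS $Bitmap here')

with open('NTFS_3TB.img', 'r+b') as f:
    for offset in free_clusters('NTFS_3TB.img'):
        f.seek(offset)
        data = f.read(cluster_size)
        # Only blank clusters that are pure fill, as a safety check
        if data in (b'\x55' * cluster_size, b'\xaa' * cluster_size):
            f.seek(offset)
            f.write(b'\x00' * cluster_size)

Restricting the blanking to clusters the filesystem itself marks as free avoids the risk of clearing valid data blocks that happen to look like fill.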

My approach to this would be to:

  1. mount the filesystem read-only (if possible) and copy off everything you can
  2. use a file scavenger to get everything else
  3. use checksums and binary compares to remove duplicates from step 2 that are also present in step 1 (see the sketch below)
  4. scan the results of step 2 and remove obvious junk

Note that step 1 might get a lot of corrupted files containing zeroed out bad blocks.
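For step 3, a minimal sketch of the checksum-based deduplication (my own illustration, assuming the two copies landed in directories named mounted/ and scavenged/; with SHA-256 a follow-up byte-for-byte compare is rarely needed in practice):

import hashlib
from pathlib import Path

def file_hash(path):
    """SHA-256 of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(1 << 20), b''):
            h.update(block)
    return h.hexdigest()

# Hash everything recovered from the mounted filesystem (step 1)
known = {file_hash(p) for p in Path('mounted').rglob('*') if p.is_file()}

# Delete scavenged files (step 2) whose contents match a step-1 file
for p in Path('scavenged').rglob('*'):
    if p.is_file() and file_hash(p) in known:
        print(f'duplicate: {p}')
        p.unlink()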

user10489