
What is the best way to MD5-checksum 3200 files in 167 directories using md5sum, where all hashes are compared against a .md5 file already present in each directory holding the files to be checksummed? I would also need to create a log of which files fail checksum verification, so that I can attempt to fix the issue.

Edit: If possible, doing this in a way that takes advantage of 8 CPU cores (2× quad-core) would be an asset, as I will be chewing through 1.1 TB of files.

For the most part the directory structure is:

Root ->
        Subdir1 ->
                 File1
                 File2
                 File3
                 hashes.md5
        Subdir2 ->
                 File1
                 File2
                 File3
                 hashes.md5

Although some subdirectories may have further subdirectories to traverse, like so:

Root ->
        Subdir1 ->
                 File1
                 File2
                 File3
                 hashes.md5
        Subdir2 ->
                 Sub-Subdir1 ->
                               File1
                               File2
                               File3
                               hashes.md5
                 File1
                 File2
                 File3
                 hashes.md5

What is the best way to traverse all directories starting from a root directory, searching for *.md5 files and then verifying the contents of each folder against the hashes stored on file?

Also, if it matters, the hashes are stored in this format inside the *.md5 files:

5a243a798037cbc7b458326a1e8ff263 *File1
1c3a6609e413bb32512e263f821b2dc4 *File2
49615cf8bf8f23680305e964f6d53f85 *File3
6eb73fa3065fbc220ac9569a98b84c79 *File4
d4f103bf06902e4dbeb67b6975ae08b8 *File5
26b5053e374d1d7262c528eca6426a3a *File6
f6ff252801fbeac6274e00b36a2b9725 *File7
22812abfa9a47131ee8e548747c0903b *File8
b19cd459aaaf07a0c69cda7931827338 *File9

File names may also have spaces, such as "*File - some other details.ext".
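For reference, this "hash *filename" layout is exactly what md5sum's own --check mode reads, spaces in names included, so a single directory can be verified directly. A minimal sketch (it builds a throwaway example directory; in practice you would just cd into one of the real subdirectories):

```shell
#!/bin/sh
# Sketch: verify one directory against its hashes.md5.
# Build a disposable example directory so the snippet is self-contained.
dir=$(mktemp -d)
cd "$dir"
printf 'demo data' > 'File - some other details.ext'   # name with spaces
md5sum 'File - some other details.ext' > hashes.md5    # emits "hash  filename"
# --quiet: print nothing for files that verify, only failures.
md5sum --quiet --check hashes.md5 && echo "all files OK"
```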

2 Answers


You could use find, for example:

find . -name hashes.md5 -execdir md5sum --quiet --check hashes.md5 \; > logfile

This will search for files named "hashes.md5", then change to the directory of each found file and run md5sum there. The --quiet option tells md5sum not to print anything for files that are OK, only for those that fail.

Edit: I don't know whether find will take advantage of 8 CPUs; probably not. You could launch several instances of find, though, each working on a different set of subfolders.
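Another way to get the parallelism, assuming GNU findutils is available, is to feed the found hashes.md5 paths to xargs with -P, which caps the number of concurrent md5sum checks. A sketch (it builds a tiny demo tree with one deliberately corrupted file; in practice you would cd into your real root and run only the find | xargs pipeline):

```shell
#!/bin/sh
# Sketch: run up to 8 md5sum checks in parallel, one per hashes.md5 found.
# Build a small demo tree first, with one corrupted file to show logging.
root=$(mktemp -d)
mkdir -p "$root/Subdir1" "$root/Subdir2/Sub-Subdir1"
for d in "$root/Subdir1" "$root/Subdir2/Sub-Subdir1"; do
    printf 'data' > "$d/File1"
    ( cd "$d" && md5sum File1 > hashes.md5 )
done
printf 'corrupted' > "$root/Subdir2/Sub-Subdir1/File1"   # force one failure

cd "$root"
# -print0 / -0 keep paths with spaces intact; -P 8 is the core count.
find . -name hashes.md5 -print0 |
  xargs -0 -n 1 -P 8 sh -c 'cd "$(dirname "$1")" && md5sum --quiet --check hashes.md5' sh \
  > logfile 2>&1 || true
cat logfile    # only failures appear, e.g. "File1: FAILED"
```

Note that with -P the failure lines from different directories can interleave in logfile, but since --quiet suppresses all the "OK" lines, each line that does appear identifies a problem file.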

tastytea

You can use the program hashdeep (https://github.com/jessek/hashdeep), which supports all the features you desire and more:

  • additional hashes (SHA-1, SHA-256, Tiger, Whirlpool)

  • more than one hash per file

  • three matching modes: audit mode (all hashes must match, with no new or missing files), positive matching, and negative matching

  • multi-threading
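Note that hashdeep uses its own manifest format rather than the existing hashes.md5 files, so you would first record the tree and then audit against that recording. A sketch, assuming hashdeep is installed (flags per its man page: -c selects algorithms, -r recurses, -a enables audit mode, -k names the known-hashes manifest, repeated -v raises audit verbosity):

```shell
#!/bin/sh
# Exit quietly if hashdeep is not installed on this machine.
command -v hashdeep >/dev/null || { echo "hashdeep not installed"; exit 0; }
# Build a tiny demo tree so the snippet is self-contained.
root=$(mktemp -d)
mkdir -p "$root/Subdir1"
printf 'data' > "$root/Subdir1/File1"
# 1. Record MD5 hashes for the whole tree into one manifest.
hashdeep -c md5 -r "$root" > known.txt
# 2. Audit the tree against the manifest; a nonzero exit means a mismatch,
#    and -vv lists each file that failed.
hashdeep -c md5 -a -vv -k known.txt -r "$root" && echo "audit passed"
```

hashdeep is also multi-threaded by default, which addresses the 8-core requirement without any extra scripting.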