I'd like to know how to create a list with md5sums from files in the current directory - files that are over a specified size. I can do one or the other, but I don't know how to combine the two.

Glorfindel
  • 4,158
Bug J.
  • 63

1 Answer

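You can use find to list the files you are interested in and pass its output on to md5sum.

xargs is needed in between: it turns the file names arriving on standard input into arguments for md5sum, so you don't have to write a shell loop.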

To write it up in a command:

find . -maxdepth 1 -size +30M -type f -print0 | xargs -0 md5sum

  • . says "start listing items from the current directory"
  • -maxdepth 1 limits the listing to this directory (do not descend into subdirectories)
  • -size +30M lists only files larger than 30 MiB (the M suffix means units of 1,048,576 bytes; k and G suffixes are also available, see man find for the details)
  • -type f restricts the listing to regular files, because md5sum cannot compute a checksum for a directory
  • -print0 makes find terminate each filename with a null byte. We use this because newline-separated output breaks as soon as a filename contains a newline.
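
To see why the null byte matters, here is a small sketch that counts how many records find emits for a single file whose name contains a newline. The scratch directory and the file name are made up for the demonstration, and it assumes a find that supports -print0 (GNU and BSD find do).

```shell
#!/bin/sh
# Demo only: a scratch directory holding one file with a newline in its name.
demo_dir=$(mktemp -d)
touch "$demo_dir/$(printf 'odd\nname')"

# Newline-separated output: the single file is spread over two lines.
newline_records=$(find "$demo_dir" -type f -print | wc -l | tr -d ' ')

# Null-separated output: one terminator per file, so exactly one record.
null_records=$(find "$demo_dir" -type f -print0 | tr -cd '\000' | wc -c | tr -d ' ')

printf 'records with -print:  %s\n' "$newline_records"
printf 'records with -print0: %s\n' "$null_records"
rm -rf "$demo_dir"
```

A consumer that splits on newlines would see two bogus names here, while one that splits on null bytes sees the real one.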

xargs reads the filenames from standard input (thanks to the -0 flag it treats the null byte as the record separator) and passes them as arguments to md5sum.
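
If you need this with different thresholds, the pipeline could be wrapped in a small function. This is only a sketch: the name md5_large_files and the +30M default are my own choices, and the -r flag assumes GNU xargs (it prevents md5sum from being started with no files when nothing matches).

```shell
#!/bin/sh
# Hypothetical helper (the name and default are illustrative, not from the answer).
# $1 is handed to find's -size test as-is, e.g. +30M, +500k or +1G.
md5_large_files() {
    min_size="${1:-+30M}"
    # -r (GNU xargs) skips running md5sum when find matches nothing;
    # otherwise md5sum would be started with no file arguments.
    find . -maxdepth 1 -type f -size "$min_size" -print0 | xargs -0 -r md5sum
}

# Example call: save the list so it can be verified later with md5sum -c.
# md5_large_files +30M > checksums.md5
```

Redirecting the output to a file gives you the list of checksums the question asks for, and md5sum -c checksums.md5 run from the same directory can later verify the files against it.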

You can also do it without the pipe, although I find that syntax confusing and prefer piping to xargs: find . -maxdepth 1 -size +30M -type f -execdir md5sum {} \;

And, as @David writes in the comments, you can end the command with + instead of \;, i.e. find . -maxdepth 1 -size +30M -type f -execdir md5sum {} +

What does it change? md5sum can be called for two files in two ways: md5sum file1; md5sum file2 or md5sum file1 file2. With \; you get the first form; with + you get the second. The main benefit is speed, since md5sum is started only once (or a few times, if the file list is too long for a single command line) instead of once per file. The gain is not large for every program, but in some cases it matters a lot: for example, a program that receives all its files at once can spread the work over many cores and speed it up by a factor of up to NUM_CPUS.
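
The difference can be made visible with a throwaway experiment. The sketch below uses echo as a stand-in for md5sum, since each echo invocation prints exactly one line, and plain -exec, which accepts the same two terminators as -execdir. The scratch directory and file names are made up for the demonstration.

```shell
#!/bin/sh
# Demo only: a scratch directory with two empty files.
demo_dir=$(mktemp -d)
touch "$demo_dir/file1" "$demo_dir/file2"

# Terminated by \; the command runs once per file: one output line per file.
per_file=$(find "$demo_dir" -maxdepth 1 -type f -exec echo {} \; | wc -l | tr -d ' ')

# Terminated by + the names are batched into a single invocation: one line in total.
batched=$(find "$demo_dir" -maxdepth 1 -type f -exec echo {} + | wc -l | tr -d ' ')

printf 'invocations when terminated by ;: %s\n' "$per_file"
printf 'invocations when terminated by +: %s\n' "$batched"
rm -rf "$demo_dir"
```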

About that strange syntax (from man find):

-execdir command ;

Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of ';' is encountered. The string '{}' is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find. Both of these constructions might need to be escaped (with a '\') or quoted to protect them from expansion by the shell. See the EXAMPLES section for examples of the use of the -execdir option. The specified command is run once for each matched file.