I have directories (and subdirectories) containing several thousand PDFs, and I am trying to get the total number of pages across all of them. I tried running this command:

find . -name \*.pdf -exec pdfinfo {} \; | grep Pages > filelist

This writes the number of pages for each file to filelist.

I would also like the filename written next to each page count, but I can't figure out how to do it (pdfinfo returns a lot of data about the PDF but not the filename itself).

AndCar

1 Answer

Precede -exec with -print. This way find prints the pathname before the respective pdfinfo prints its output. -print is the default action (e.g. when you run find . on its own), but the presence of -exec suppresses that default.

If you prefer the pathname after the output of the respective pdfinfo then you may try -exec … -print, but note in this case -print will be performed iff -exec (i.e. pdfinfo) succeeds. In general one uses -exec … -print when -exec is used as a test. In the context of your question I personally prefer the pathname first, so -print -exec ….
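As a minimal sketch of that second ordering (the function name pdfinfo_then_path is hypothetical, and pdfinfo is assumed to be on PATH):

```shell
# Hypothetical helper: print each PDF's pdfinfo output, then its pathname.
# Assumes pdfinfo is on PATH; run it from the top of the directory tree.
pdfinfo_then_path() {
    # -print is evaluated only when the preceding -exec returned true,
    # i.e. when pdfinfo exited with status 0 for that file.
    find . -name '*.pdf' -exec pdfinfo {} \; -print
}
```

A file for which pdfinfo exits with a non-zero status produces no pathname line here, which is the reason to prefer -print -exec … when you want every pathname listed.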

Then you need to adjust your grep. Every pathname printed by find . starts with a dot (.), so grep -E '^(\.|Pages)' keeps lines that begin with a literal dot or with the string Pages.

The final command will be:

find . -name \*.pdf -print -exec pdfinfo {} \; | grep -E '^(\.|Pages)'

(Redirect the output to a file yourself, e.g. by appending > filelist.)
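
Since the stated goal was a total page count, the per-file Pages lines could also be summed directly. This is only a sketch: total_pdf_pages is a hypothetical name, and it assumes pdfinfo prints the count as the second whitespace-separated field of its Pages: line.

```shell
# Hypothetical helper: sum the "Pages:" values of all PDFs under the
# current directory. Assumes pdfinfo is on PATH.
total_pdf_pages() {
    find . -name '*.pdf' -exec pdfinfo {} \; |
        # Add up field 2 of every line starting with "Pages:";
        # "+ 0" makes awk print 0 when no PDF was found.
        awk '/^Pages:/ { total += $2 } END { print total + 0 }'
}
```

Files that pdfinfo cannot read contribute nothing to the sum, so comparing the total against the per-file listing is a reasonable sanity check.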

Consider -type f as the first test in case some non-regular file matches -name \*.pdf by chance. This will avoid calling pdfinfo on directories and such.
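
With -type f added, the whole thing might look like the sketch below. list_pdf_pages is a hypothetical wrapper name; it assumes pdfinfo is on PATH and that you run it from the top of the tree, so that every pathname starts with a dot.

```shell
# Hypothetical helper: list each PDF pathname followed by its "Pages:" line.
# Assumes pdfinfo is on PATH and the search starts at "." so that
# pathnames begin with a dot, which the grep pattern relies on.
list_pdf_pages() {
    # -type f comes first, so directories and other non-regular files
    # never reach -name or pdfinfo.
    find . -type f -name '*.pdf' -print -exec pdfinfo {} \; |
        grep -E '^(\.|Pages)'
}
```

Usage would then be list_pdf_pages > filelist.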