3

I have a massive number of files on my system, and every file has one corresponding file. For example,

test.pdf has a test-project.zip
test2.pdf has a test2-project.zip

test.pdf and test2.pdf are the original files and test-project.zip and test2-project.zip are generated by my script.

I need to find out whether every original file has a corresponding 'filename'-project.zip.

I can use

find /project/ -name "*.pdf" | wc -l
find /project/ -name "*-project.zip" | wc -l

to check whether the numbers match, but I need to know which files have no corresponding zip.

Can anyone help me with this? Thanks a lot!

FlyingCat
  • 207

2 Answers

5

Quicky script, adapt as you see fit:

#!/usr/bin/env bash

find /project/ -name '*.pdf' -print0 | while read -r -d $'\0' i; do
  if [ ! -e "${i/%.pdf/-project.zip}" ]; then
    echo "${i/%.pdf/-project.zip} doesn't exist!"
  fi
done

exit 0

-d $'\0' sets the delimiter for read to a null byte, and -print0 is the equivalent for find, so this should be bulletproof against file names with spaces and newlines in them (obviously irrelevant in this case, but useful to know in general); the -r flag stops read from treating backslashes as escape characters. ${i/%.pdf/-project.zip} replaces the .pdf at the end of the variable $i with -project.zip. Other than that, this is all standard shell scripting stuff.
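If the ${var/%pattern/replacement} expansion is new to you, here is a minimal sketch of what it does (the path is made up purely for illustration):

i='/project/reports/test.pdf'      # hypothetical example path
echo "${i/%.pdf/-project.zip}"     # prints /project/reports/test-project.zip

The % anchors the pattern to the end of the value, so a .pdf anywhere else in the name is left alone.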

If you wanted to shorten it even more, you could also use

[ -e "${i/%.pdf/-project.zip}" ] || echo "${i/%.pdf/-project.zip} doesn't exist!"

...instead of the if statement. I think the if statement is easier to work with once you're running more than a single, short line (you can get around this by using a function, but at that point you aren't getting any space saving compared to using the if).
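For completeness, a rough sketch of the function version mentioned above (warn_missing is a made-up name):

warn_missing () {
  # print a warning for the expected-but-missing zip passed as $1
  echo "$1 doesn't exist!"
}

[ -e "${i/%.pdf/-project.zip}" ] || warn_missing "${i/%.pdf/-project.zip}"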

Assuming you have bash 4+ (you probably do; you can check with bash --version), you can use the globstar option instead of find:

#!/usr/bin/env bash

shopt -s globstar
for f in /project/**/*.pdf; do
  if [ ! -e "${f/%.pdf/-project.zip}" ]; then
    echo "${f/%.pdf/-project.zip} doesn't exist!"
  fi
done

exit 0

This has the advantage of being pure bash, so it should be faster (only noticeably so with at least hundreds of files, though).
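One caveat, assuming default shell options: if the glob matches no PDFs at all, bash passes the unexpanded pattern itself through the loop once, so you'd get a single bogus "doesn't exist" line. Enabling nullglob alongside globstar avoids that:

shopt -s globstar nullglob   # nullglob makes a non-matching glob expand to nothing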

evilsoup
  • 14,056
0

Here are two ways you could do it. One is a godawful Bash one-liner which spawns at least one, possibly two, processes for each file it matches:

[me@box] $ for file in `find -name '*.pdf' -exec perl -le'$f=shift(); $f =~ s@\.pdf$@@; print $f' {} \;`; do (TESTFILE="$file-project.zip"; if [ ! -f "$TESTFILE" ]; then echo "missing $TESTFILE"; fi); done

Since that's enough to make anyone's eyes bleed, here's a Perl script which does the same job, much more sanely than any Bash script ever could:

#!/usr/bin/env perl
use strict;
use warnings;

my $path = shift() || die "$0 requires a path argument\n";
my @files = `find "$path" -name '*.pdf'`;

foreach my $file (@files) {
  chomp $file;
  my $zip = $file;
  $zip =~ s@\.pdf$@-project.zip@;
  next if -f $zip;
  print "missing $zip\n";
}

Copy that into, e.g., 'find-missing.pl', then invoke it as perl find-missing.pl /project/.
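A sample run might look something like this (the path and output are purely illustrative):

perl find-missing.pl /project/
missing /project/test2-project.zip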

Aaron Miller
  • 10,087