
I would like to process every directory in my parent directory, but only on the condition that these directories have no subdirectories in them.

Right now I have the following directory structure:

Music
  Band_A
    Record_A1
    Record_A2
  Band_B
    Record_B1
      CD_1
      CD_2

And the following script:

while read -r dir
do
    echo "$dir"
    /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log
done < <(find . -type d)

All it does is check whether every music file in a directory was encoded using MQA, and it logs its output like this:

1   MQA 44.1K   10 - Bury A Friend [MQA 24_44].flac

It works and creates the logs I want in directories like Record_A1 and CD_1.

But it also creates a lot of redundant files. For example, it creates a log in the Band_A directory containing the output for all files in all of Band_A's subdirectories, and it creates logs in Band_B and then Record_B1, again containing output for all files in their respective subdirectories.

So how can I run the script so that it generates logs ONLY for those directories that have no nested directories?

EDIT: also I think this script processes every subdirectory as many times as it is nested inside the topmost parent directory. Not that it is critical, but still not efficient.

ruslaniv

3 Answers


You can use the following to find leaf directories:

find . -type d -links 2

The -links 2 option looks for files (directories, in your case) that have exactly 2 hard links.

A directory has:

  • a link to itself (its own . entry)
  • a link from its parent directory (its name entry in the parent)
  • a link from each subdirectory (the .. entry in that subdirectory)

So a directory without subdirectories will have 2 hard links, which is what you want.
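
Plugged into your loop, that would look something like this (a sketch; note that -links 2 relies on the traditional link count of directories, which some filesystems do not maintain):

while read -r dir
do
    echo "$dir"
    /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log
done < <(find . -type d -links 2)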

Gohu

The basic idea would be to go through all directories, test whether each one has any subdirectories, and if it has none, run that part of the script.

while read -r dir ; do
    # find lists "$dir" itself as well, so a count of 1 means no subdirectories
    subdircount=$(find "$dir" -maxdepth 1 -type d | wc -l)
    if [[ "$subdircount" -eq 1 ]] ; then
        echo "$dir"
        /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log
    fi
done < <(find . -type d)
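
If you want to preview which directories would be processed before writing any logs, a dry run of the same test could look like this (a sketch that only prints the leaf directories):

while read -r dir ; do
    subdircount=$(find "$dir" -maxdepth 1 -type d | wc -l)
    [[ "$subdircount" -eq 1 ]] && printf '%s\n' "$dir"
done < <(find . -type d)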
Ljm Dullaart

This command

find . -name . -o -type d -print -prune

will generate non-empty output if and only if there is at least one subdirectory in the current working directory (the core idea came from this answer). You can include it in your shell code as a test. Something like

if ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] ); then
   …
fi

where the subshell ( ) prevents cd from changing the current working directory of the main script. You could use find "$dir" … without cd, but what if $dir expands to something starting with -? A double dash works with cd, not with find. Admittedly, your main find (the one in <()) starts in ., so one might expect every pathname it generates to start with ./. It seems not all implementations of find behave like this, though, so it's still good to code defensively.
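
For example, dropped into your existing loop with the commands from your question (a sketch), it could look like this:

while read -r dir
do
    if ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] ); then
        echo "$dir"
        /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log
    fi
done < <(find . -type d)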

Alternatively you can build the test into the main find command (inside <()):

find . -type d -exec sh -c '
   cd -- "$1" && [ -z "$(find . -name . -o -type d -print -prune)" ]
' find-sh {} \; -print

find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?

Note this approach will run one additional sh per directory (with or without subdirectories), which is sub-optimal. I'm introducing it as a small step towards a bigger improvement, because piping find to read is not the best way (if you must, it's good that you use -r, but consider IFS= as well).
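
If you do keep the read loop, the variant with IFS= cleared would look something like this (it copes with leading and trailing blanks in names; newlines in names still break it, as explained next):

while IFS= read -r dir ; do
    printf '%s\n' "$dir"
done < <(find . -type d)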

When piping find to read like you did, names containing newlines will make the code fail. A good practice is to run everything from within find, if possible. In your case it seems possible. The following code is standalone (not inside <()).

find . -type d -exec sh -c '
   cd -- "$1" && [ -z "$(find . -name . -o -type d -print -prune)" ] && do_something_more_with "$1"
' find-sh {} \;

The above still runs one sh per directory. It can be improved by passing many directories to each sh invocation with -exec … {} +:

find . -type d -exec sh -c '
   for dir do
      ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] ) && do_something_more_with "$dir"
   done
' find-sh {} +

Here I used … && do_something_more_with "$dir" but you can choose if … then … instead.

In your case do_something_more_with "$dir" will be

{ echo "$dir"; /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log; }

where the braces { } group the commands, so the preceding && makes the entire group run conditionally.

Maybe instead of do_something_more_with "$dir" it's better to do_something_more_with . in the $dir directory. The relevant line will be:

( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] && do_something_more_with . )

But it's your decision. You can still echo "$dir" (or better printf "%s\n" "$dir"; note the entire shell code run by find -exec is single-quoted, so don't type printf '%s\n').
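
Assembled that way, with the relevant line from above and your commands filled in (a sketch; it assumes MQA_identifier is happy to be pointed at . from inside each directory, and mqa.log is then written there):

find . -type d -exec sh -c '
   for dir do
      ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune)" ] &&
        { printf "%s\n" "$dir"; /Users/rusl/.cliapps/MQA_identifier . | tee mqa.log; } )
   done
' find-sh {} +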

If some directory contains a very large number of subdirectories, we don't really need the inner find to list them all. To tell whether the output is empty or not, it's enough to stop after the first line:

[ -z "$(find . -name . -o -type d -print -prune | head -n 1)" ]

(With head -c 1 we could break after the first character, but I think head -c is not portable.)

So the optimized code may look like this:

find . -type d -exec sh -c '
   for dir do
      ( cd -- "$dir" && [ -z "$(find . -name . -o -type d -print -prune | head -n 1)" ] ) \
      && { printf "%s\n" "$dir"; /Users/rusl/.cliapps/MQA_identifier "$dir" | tee "$dir"/mqa.log; }
   done
' find-sh {} +

also I think this script processes every subdirectory as many times as it is nested inside the topmost parent directory. Not that it is critical, but still not efficient.

Unable to reproduce: find . -type d prints every directory just once. Maybe MQA_identifier works recursively. If it does, then it (not the script itself) will process subdirectories multiple times, with tee writing to a log file at a different level of the directory tree each time.
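
If you want to verify that on your system, one way to check (an illustrative command, not part of the script) is:

find . -type d | sort | uniq -d

which prints nothing when every directory is listed exactly once.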