
I am trying to use find (from a bash shell) to process all folders that contain a .log file and get their sizes. However, the wildcard is not working as expected. This returns nothing:

find . -type d -exec test -e '{}/*.log' \; -exec du -d0 '{}' \;

However, if I replace *.log with foo.log, it works as expected for directories containing a file with that name.

Based on some similar SE posts, I tried:

find . -type d -exec bash -c 'test -e "{}/*.log"' \; -exec du -d0 '{}' \;
find . -type d -exec bash -c 'test -e "$1/*.log"' '{}' \; -exec du -d0 '{}' \;

but those don't work any better.

2 Answers

2

With find … -exec test -e '{}/*.log' you're passing a string like something/*.log to test, where * is literal. Neither find nor test treats it as a wildcard. Some implementations of find won't even expand {} if it's only part of an argument (as opposed to {} being a whole argument).
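A minimal sketch of the literal treatment, run in a throwaway directory (the names demo and app.log are made up for illustration). The same quoted test only succeeds once a file whose name really is *.log exists:

```shell
# Hypothetical scratch tree: one directory holding one .log file.
tmp=$(mktemp -d)
mkdir "$tmp/demo"
: > "$tmp/demo/app.log"

# The quoted asterisk reaches test unchanged, so test looks for a file
# literally named "*.log" and does not see app.log.
if test -e "$tmp/demo/*.log"; then before=found; else before=missing; fi

# Only a file whose name really is "*.log" satisfies the same test.
: > "$tmp/demo/*.log"
if test -e "$tmp/demo/*.log"; then after=found; else after=missing; fi

echo "before: $before, after: $after"
rm -rf "$tmp"
```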

One of your later tries embeds {} in the shell code. Never embed {} in shell code: the pathname is then parsed as part of the code instead of being passed as data. The other try is better in this respect, and you're close to a solution. This will sort of work:

# still flawed though
find . -type d -exec bash -c 'test -e "$1/"*.log' bash '{}' \; -exec du -d0 '{}' \;

See "What is the second sh in sh -c 'some shell code' sh?" for the extra bash argument. The main fix, however, is leaving the asterisk unquoted in the shell code. This way it's a wildcard in the inner shell (but not in the outer shell, where it's properly single-quoted). The problem is that *.log may expand to more than one word (if there are several matching files), and that breaks the test invocation.
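The failure with several matches can be reproduced outside find. This is a sketch on a made-up tree (directories one and two are hypothetical); the exact non-zero status and the error message depend on the shell, so only "zero" versus "non-zero" is compared here:

```shell
# Hypothetical scratch tree: "one" has a single match, "two" has two.
tmp=$(mktemp -d)
mkdir "$tmp/one" "$tmp/two"
: > "$tmp/one/a.log"
: > "$tmp/two/a.log"
: > "$tmp/two/b.log"

# One match: test receives -e plus a single operand and succeeds.
single=0
test -e "$tmp/one/"*.log 2>/dev/null || single=$?

# Two matches: test receives -e plus two operands, which is a usage error,
# so the status is non-zero even though matching files exist.
multi=0
test -e "$tmp/two/"*.log 2>/dev/null || multi=$?

echo "single match status: $single, two matches status: $multi"
rm -rf "$tmp"
```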

The following code will find directories with *.log files:

find . -type d -exec sh -c '
   for f in "$1/"*.log; do test -e "$f" && exit 0; done; exit 1
' sh {} \; -print

The code is portable. There's no need for an inner bash; sh should be faster. Replace -print with -exec du … if you wish.

This works by returning success (exit 0) from the inner shell as soon as test confirms the existence of some matching file¹. Any remaining matches are not tested needlessly, which saves time. If there is no match, the pattern stays literal, test fails, and the whole shell exits with failure (exit 1). Remember that -exec is also a test, so it determines whether -print (or -exec du … or whatever you put there) is performed.
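To try the loop-based command safely, here is a sketch that runs it against a made-up tree (the directory names, some with spaces, are assumptions for illustration). It uses -print rather than du so the result does not depend on the filesystem:

```shell
# Hypothetical scratch tree with zero, one and several .log files.
tmp=$(mktemp -d)
mkdir -p "$tmp/with one" "$tmp/with many" "$tmp/without"
: > "$tmp/with one/a.log"
: > "$tmp/with many/a.log"
: > "$tmp/with many/b.log"
: > "$tmp/without/notes.txt"

# Same command as above, run from inside the scratch tree; the output is
# sorted only to make the listing order predictable.
found=$(cd "$tmp" && find . -type d -exec sh -c '
   for f in "$1/"*.log; do test -e "$f" && exit 0; done; exit 1
' sh {} \; -print | sort)

printf '%s\n' "$found"
rm -rf "$tmp"
```

Only the two directories that hold at least one .log file are listed; the top-level directory and "without" are skipped.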


Another approach may be to let find itself find matching files with

find . -name '*.log' … -print

and to parse its output to isolate directory names, and finally to use xargs with du. Directories could appear multiple times, and newlines in pathnames would require non-portable code (starting from -print0). I think this would be unnecessarily complicated; finding directories seems superior.


¹ Note that test -e only tells you that something exists at that path; it may be a directory or any other type of file. To confirm the existence of a regular file, use test -f.
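A short sketch of the difference, using a made-up directory whose name happens to end in .log:

```shell
# Hypothetical scratch tree: a directory named like a log file, and a real file.
tmp=$(mktemp -d)
mkdir "$tmp/archive.log"
: > "$tmp/real.log"

# -e is satisfied by anything that exists; -f only by regular files.
if test -e "$tmp/archive.log"; then dir_e=yes; else dir_e=no; fi
if test -f "$tmp/archive.log"; then dir_f=yes; else dir_f=no; fi
if test -f "$tmp/real.log"; then file_f=yes; else file_f=no; fi

echo "directory: -e $dir_e, -f $dir_f; regular file: -f $file_f"
rm -rf "$tmp"
```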

1

It would be easier to find/scan for log files and then collect the unique directory names.

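This find command should pull out the directories, with sort -u removing duplicates (uniq alone only collapses adjacent duplicates, and find does not guarantee that all files of one directory are listed together). The -z/-0 flags, which are GNU extensions, help ensure pathnames with newlines, spaces or quotes are parsed correctly:

find . -type f -name '*.log' -exec dirname -z {} + | sort -zu | xargs -0 -r du -d0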

Add | sort -rn | head if searching for the biggest disk usage.
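To inspect which directories would be handed to du, the collecting part of the pipeline can be run on its own. This is a sketch on a made-up tree, assuming GNU dirname and sort (for -z); it deduplicates with sort -zu, which works regardless of the order in which find lists the files, and converts the NUL separators to newlines only for display:

```shell
# Hypothetical scratch tree: "a" holds log files at two levels, "b" holds none.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/sub" "$tmp/b"
: > "$tmp/a/x.log"
: > "$tmp/a/sub/y.log"
: > "$tmp/a/z.log"
: > "$tmp/b/notes.txt"

# Collect the parent directory of every .log file, NUL-separated, then
# deduplicate; tr makes the list readable (not safe for names with newlines).
dirs=$(cd "$tmp" && find . -type f -name '*.log' -exec dirname -z {} + \
  | sort -zu | tr '\0' '\n')

printf '%s\n' "$dirs"
rm -rf "$tmp"
```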

DuncG