
I would like to list the directories that are larger than 10 MB. I tried the find command, but it gives no result when I combine the -type and -size options. It works with either option on its own.

I tried these commands:

  • find /opt -type d - working
  • du -sh /opt/test - it is 20MB
  • find /opt -size +10M - listed /opt/test/file1
  • find /opt -type f -size +10M - listed /opt/test/file1
  • find /opt -type d -size +10M - This command is not showing any output.

The find command does not seem to work with the -type d and -size options together. Can anyone suggest a way to achieve this?

1 Answer


A directory is also a file. It has its own size, which is not the total size of the files within. This makes even more sense on systems where you can read a directory like a regular file, although nowadays that's not common.

If you run stat on a directory, you will see the size I'm talking about. It's not that the find command does not work with the -type d and -size options together. It does work; only the size tested with -size is not what you want.
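A minimal sketch of the difference, assuming GNU stat (for -c %s) and GNU find (for the M suffix of -size). The temporary directory and the names demo, test and file1 are made up for illustration:

```shell
# Build a throwaway tree: one directory holding a single 20 MiB file.
demo=$(mktemp -d)
mkdir "$demo/test"
dd if=/dev/zero of="$demo/test/file1" bs=1024 count=20480 2>/dev/null

# Size of the directory file itself (GNU stat syntax; BSD stat differs).
dir_size=$(stat -c %s "$demo/test")

# Space allocated to the whole tree, in KiB, as reported by du.
tree_kb=$(du -sk "$demo/test" | awk '{print $1}')

# The -size test looks at the directory file itself, so nothing should match.
matches=$(find "$demo" -type d -size +10M)

echo "directory itself: $dir_size bytes"
echo "whole tree:       $tree_kb KiB"
echo "find matches:     ${matches:-none}"

rm -rf "$demo"
```

The number printed for the whole tree depends on the filesystem (compression, block size), so treat it as an illustration rather than an exact figure.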

On the other hand du is deliberately designed to sum space allocated to all files in the directory:

The size of the file space allocated to a file of type directory shall be defined as the sum total of space allocated to all files in the file hierarchy rooted in the directory plus the space allocated to the directory itself.

(source)

To find all directories for which du returns a number large enough, you can build a custom test in find that will actually run du and parse its output:

find . -type d -exec sh -c '
      result="$(du -sk "$1" | tr "\t" " ")"
      result="${result%% *}"
      [ "$result" -gt 10240 ]
   ' find-sh {} \; -print

Notes:

  • POSIX requires du to separate the number and the pathname with one space character. Still, GNU du on my Debian 10 uses the tab character (even if POSIXLY_CORRECT is set). This is the reason I used tr to convert tab characters to spaces. Tab characters (if any) in the pathname printed by du are also converted, but it doesn't matter because this part of the line is discarded anyway.
  • POSIX du without -k uses units of 512 bytes. Implementations may use different defaults and various options/ways to specify the unit. Explicitly imposing -k seems right; this option is required by POSIX, it should be widely supported and unambiguous.
  • find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?
  • The code may be sub-optimal. My main concern was to make it portable.
  • Here -exec is a test. You can add more tests and/or actions before (or instead of) -print.
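As a sketch of how the same test could be reused with a different starting directory or threshold, the command can be wrapped in a shell function. The name big_dirs and its two parameters are hypothetical; the threshold is passed to the inner shell as an extra argument, so inside the inner script $1 is the threshold and $2 is the pathname supplied by find:

```shell
# big_dirs DIR THRESHOLD_KB
# Print every directory under DIR for which du -sk reports more than
# THRESHOLD_KB kibibytes. Hypothetical helper, not part of find or du.
big_dirs() {
    find "$1" -type d -exec sh -c '
        size="$(du -sk "$2" | tr "\t" " ")"   # tab or space after the number
        size="${size%% *}"                    # keep only the number
        [ "$size" -gt "$1" ]
    ' find-sh "$2" {} \; -print
}

# Example: directories under the current directory using more than 10 MiB.
big_dirs . 10240
```

Like the original command, this runs one du per directory, so it can be slow on deep trees.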

If all you need is to see which directories are big (rather than to parse the information further in some automated way, or to perform additional tests), then this may be enough:

du -k . | sort -n

Next you need to manually filter the result (compare to the threshold) and pick zero or more of the last entries. Pathnames containing newline characters will misbehave. Still, occasional newlines probably won't compromise the entire result, especially if you inspect it with insight, not mechanically.
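If the manual comparison gets tedious, the threshold check can probably be delegated to awk. This is only a sketch: filter_kb is a made-up helper name, and the caveat about newlines in pathnames applies unchanged.

```shell
# filter_kb THRESHOLD_KB
# Pass through only those lines whose first field, read as a number,
# is greater than THRESHOLD_KB. Hypothetical helper for du -k output.
filter_kb() {
    awk -v min="$1" '$1 + 0 > min + 0'
}

# Directories using more than 10 MiB, smallest first.
du -k . | filter_kb 10240 | sort -n
```

Adding 0 forces a numeric comparison, so a stray line that does not start with a number is treated as 0 and dropped.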

Your du may support the -t/--threshold option, which can do the filtering for you, but the option is not portable.
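With GNU du (an assumption; other implementations may lack the option or spell it differently), that could look like this:

```shell
# GNU du only: skip entries smaller than 10 MiB, then sort by size.
du -k --threshold=10M . | sort -n
```

Here -k only sets the unit of the printed numbers; the threshold has its own size suffix.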


If you work interactively, use ncdu.