
I am trying to display the file size in human-readable format in the following find command:

 find $BASE_DIR/ -user $USER -size +$LOWERSIZELIMIT -mtime +$MY_MTIME -type f -printf      "%s %p\n" 2> /dev/null | 
 sort -nr | 
 head -n $NUMFILES >> $TESTFILE

My function to find large files looks like this. It takes in $1 as an argument in order to pass the base directory path to the find command.

 function find_files {
 #echo "In find files"
 # $1 = base directory from where to start the search

 # Quote every expansion so values containing spaces survive word splitting
 find "$1/" -user "$USER" -size "+$LOWERSIZELIMIT" -mtime "+$MY_MTIME" -type f -printf "%s %p\n" 2> /dev/null | sort -nr | head -n "$NUMFILES" >> "$TESTFILE"

 # Only append to the dump file if the search produced any output
 if [[ -s $TESTFILE ]] ; 
 then 
   echo "***********************************************************************" >> "$DUMPFILE"
   echo "***********************************************************************" >> "$DUMPFILE"
   echo "***********************************************************************" >> "$DUMPFILE"
   echo "***********************************************************************" >> "$DUMPFILE"
   #cat $TESTFILE
   cat "$TESTFILE" >> "$DUMPFILE"
   rm "$TESTFILE"
   return 0
 else
   return 1
 fi
 }
mdpc

2 Answers

2

Use find's -exec option (don't forget to end it with \;). If you need only part of the output of the command you exec, parse it with awk.

find $BASE_DIR/ -user $USER -size +$LOWERSIZELIMIT -mtime +$MY_MTIME -type f -exec /bin/ls -hl {} \;

Use another command if you don't want the human-readable output of ls, or trim it with awk like so (note that $9 holds only the first word of a file name that contains spaces):

find $BASE_DIR/ -user $USER -size +$LOWERSIZELIMIT -mtime +$MY_MTIME -type f -exec /bin/ls -hl {} \; | awk '{ print $5,$9 }'

The trickiest parts of find's -exec are that {} stands for each item that is found, and that the command must end in \;. If you have a chain of commands on the same line separated by ;, you need to end your find with \; and then add a separating ; so there are two, like: some command; find $BASE_DIR/ -user $USER -size +$LOWERSIZELIMIT -mtime +$MY_MTIME -type f -exec /bin/ls -hl {} \;; some command;
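
A minimal sketch of the -exec pattern described above, assuming a POSIX shell and an ls that accepts -h. The helper name list_with_sizes is hypothetical and not part of the original script. It quotes the directory argument and shows where the \; terminator for find ends and the shell's own ; separator begins.

```shell
# list_with_sizes: hypothetical helper, not from the original script.
# $1 = directory to search
list_with_sizes() {
    # "\;" is handed to find and terminates the -exec command;
    # the following ";" is the shell's own command separator.
    find "$1" -type f -exec ls -hl {} \; ; echo "done searching $1"
}
```

Calling list_with_sizes /var/log would print one ls -hl line per regular file, followed by the final message. Replacing \; with + should make find pass many files to a single ls invocation instead of starting ls once per file.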

2

The subtlety here is the sort: you cannot (easily and correctly) sort human-readable numbers, unless you have sort from GNU coreutils >= 7.5, which supports a -h option (e.g. du -h | sort -h).
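
To illustrate the difference, here is a small sketch that assumes GNU sort with -h support. The sample values in the sizes variable are made up for the demonstration.

```shell
# Made-up sample values, one per line (assumption for illustration only)
sizes='512
1.5K
2M
10K'

# Plain numeric sort compares only the leading number and ignores the suffix
printf '%s\n' "$sizes" | sort -nr

# Human-numeric sort (GNU coreutils >= 7.5) takes the K/M/G suffix into account
printf '%s\n' "$sizes" | sort -hr
```

With the numeric sort, 512 ends up ahead of 2M because only the leading digits are compared; the -h variant ranks 2M first.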

Save the following to hr.awk:

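BEGIN { split("KMGTPEZY", suff, "") }   # one suffix letter per array element
{
  # split "SIZE PATH" into byte count and file name (3-argument match is gawk-only)
  if (!match($0, /([0-9]+)[ \t](.*)/, bits)) next
  sz = bits[1] + 0; fn = bits[2]
  # divide by 1024 until the value is below the next unit
  i = 0; while ((sz >= 1024) && (i < length(suff))) { sz /= 1024; i++ }
  if (i) printf("%.3f %siB %s\n", sz, suff[i], fn)
  else   printf("%3i B %s\n", sz, fn)
}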

Then you can do:

find $BASE_DIR/ -user $USER -size +$LOWERSIZELIMIT -mtime +$MY_MTIME \
  -type f -printf "%s %p\n" 2> /dev/null | sort -nr |
  gawk -f hr.awk | head -n $NUMFILES >> $TESTFILE

The gawk script accepts the output of find ... -printf "%s %p\n" as input, and converts the first field to a human-readable size with a suffix (in IEC 2^10 units, i.e. KiB, MiB, and so on).
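
If gawk is not available, a similar conversion can be sketched in plain POSIX shell. This is an illustrative alternative, not the script above: the names human_size and humanize_lines are hypothetical, it uses integer arithmetic only, it truncates to one decimal place, and it assumes every input line starts with a non-negative integer byte count.

```shell
# human_size: hypothetical helper; $1 = size in bytes (non-negative integer)
human_size() {
    hs_scaled=$(( $1 * 10 ))   # work in tenths to keep one decimal place
    hs_unit=B
    for hs_next in KiB MiB GiB TiB PiB EiB; do
        # stop once the value is below 1024 of the current unit
        [ "$hs_scaled" -ge 10240 ] || break
        hs_scaled=$(( hs_scaled / 1024 ))
        hs_unit=$hs_next
    done
    if [ "$hs_unit" = B ]; then
        printf '%d B\n' "$1"
    else
        printf '%d.%d %s\n' $(( hs_scaled / 10 )) $(( hs_scaled % 10 )) "$hs_unit"
    fi
}

# humanize_lines: hypothetical filter for "SIZE PATH" lines on standard input
humanize_lines() {
    while read -r hl_size hl_path; do
        printf '%s %s\n' "$(human_size "$hl_size")" "$hl_path"
    done
}
```

It would take the place of gawk -f hr.awk in the pipeline, after sort -nr, so the sorting still happens on the raw byte counts.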

See also this popular question: https://serverfault.com/questions/62411/how-can-i-sort-du-h-output-by-size

mr.spuratic