Based on dzsuz87's answer, which in turn was based on garyjohn's answer (credit to both!), I thought it would be good to add an actual histogram to the output, to make the results easier to visualize when there is a large range of file sizes. Still effectively a one-liner, my new code looks like this:
find . -type f -print0 | xargs -0 ls -l | awk '
  {
    # Bucket each file by the integer part of log2(size); $5 is the size column of ls -l.
    n = int(log($5)/log(2));
    # Lump everything smaller than 1 KiB into the 1k bucket.
    if (n < 10) { n = 10; }
    size[n]++
  }
  END {
    # One line per bucket: lower bound of the bucket in bytes, then the file count.
    for (i in size)
      printf("%d %d\n", 2^i, size[i])
  }' | sort -n | awk -v COLS=$( tput cols ) -v PAD=13 -v MAX=0 '
  # Repeatedly divide by 1024; x[2] ends up indexing the unit (0 = k, 1 = M, ...).
  function human(x) {
    x[1] /= 1024;
    if (x[1] >= 1024) { x[2]++; human(x) }
  }
  {
    # Remember each bucket and track the largest count for scaling the bars.
    if ($2 > MAX) MAX = $2;
    a[$1] = $2
  }
  END {
    PROCINFO["sorted_in"] = "@ind_num_asc";  # gawk-only: walk the buckets in numeric order
    for (i in a) {
      h[1] = i; h[2] = 0;
      human(h);
      # Scale the bar to the terminal width, reserving PAD columns for the label and count.
      bar = sprintf("%-*s", a[i]/MAX*(COLS-PAD), "");
      gsub(" ", "-", bar);
      printf("%3d%s: %6d %s\n", h[1], substr("kMGTPEZY", h[2]+1, 1), a[i], bar)
    }
  }'
Sample output:
  1k:    505 --------
  2k:     45
  4k:   4609 --------------------------------------------------------------------------
  8k:   2177 ----------------------------------
 16k:    325 -----
 32k:    642 ----------
 64k:   2262 ------------------------------------
128k:   2547 ----------------------------------------
256k:    977 ---------------
512k:    434 ------
  1M:    550 --------
  2M:   1076 -----------------
  4M:   2028 --------------------------------
  8M:   2362 -------------------------------------
 16M:   1814 -----------------------------
 32M:    989 ---------------
 64M:    366 -----
128M:     86 -
256M:     16
512M:      1
Note that this requires gawk rather than plain awk (it relies on the gawk-specific PROCINFO["sorted_in"] to iterate over the buckets in numeric order), and it uses tput cols to read your terminal width so the histogram scales to fit your window.
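If you only have a POSIX-style awk (mawk, BSD awk) and no gawk, the PROCINFO line is the sticking point. Since the buckets already arrive sorted thanks to the sort -n in the middle of the pipeline, the last stage can simply remember them in arrival order instead. Here is a minimal sketch of a drop-in replacement for that final awk stage (everything after the last "| sort -n |"), assuming a POSIX-compatible awk; I have kept the variable names from above but added lower/cnt arrays indexed by line number:

awk -v COLS=$( tput cols ) -v PAD=13 -v MAX=0 '
  function human(x) {
    x[1] /= 1024;
    if (x[1] >= 1024) { x[2]++; human(x) }
  }
  {
    # Rows are already in numeric order from sort -n, so keep them by line number.
    lower[NR] = $1; cnt[NR] = $2;
    if ($2 > MAX) MAX = $2
  }
  END {
    for (i = 1; i <= NR; i++) {
      h[1] = lower[i]; h[2] = 0;
      human(h);
      # Build the bar with a plain loop rather than a dynamic printf field width.
      n = int(cnt[i]/MAX*(COLS-PAD)); bar = "";
      while (length(bar) < n) bar = bar "-";
      printf("%3d%s: %6d %s\n", h[1], substr("kMGTPEZY", h[2]+1, 1), cnt[i], bar)
    }
  }'

The output format is the same; only the gawk-specific array-ordering line goes away.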