2

I want to gzip the contents of a couple of thousand tiny files into a single file. While I could do it with something like for file in $(find . -iname 'pattern'); do cat $file | gzip - >> zipped.gz; done, this achieves pretty bad compression on the first go, since each file ends up as its own independently compressed gzip member. Rezipping it is quite easy with zcat zipped.gz | gzip --best > rezipped.gz, but I'd like to know if someone knows a nice way to do this in a single pass.
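
Written out more readably, that first attempt plus the rezip step is:

# first attempt: append each file's gzip output to one archive
for file in $(find . -iname 'pattern'); do cat "$file" | gzip - >> zipped.gz; done

# second pass: decompress everything and recompress it as one stream
zcat zipped.gz | gzip --best > rezipped.gz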

Nick
  • 123

2 Answers

3

When it turned out that the best behaviour is to cat all the files into a single stream, I started figuring things out with a loop. But then I realized that there's an even easier (and better) way:

find . -iname 'pattern' -exec cat {} \; | gzip --best - > file.gz

All the invocations of cat write to find's stdout, and there is only one invocation of gzip, so the whole stream is compressed as a single gzip member. Rezipping the result yields the same file size. You may be able to get an even better result (in terms of the number of invocations of cat, not of file size) if you use the + version of -exec (see the find man page), but I haven't tested that.
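
It would probably look like this; with +, find passes as many file names as possible to each cat instead of spawning one cat per file:

find . -iname 'pattern' -exec cat {} + | gzip --best - > file.gz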

1
find . -iname 'pattern' | xargs gzip -9 -v
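
Note that this compresses each file into its own .gz rather than into one archive. If any file names contain whitespace, a safer variant (assuming your find and xargs support -print0 and -0) is:

find . -iname 'pattern' -print0 | xargs -0 gzip -9 -v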

EDIT

It seems that when you cat the file into gzip, it is better able to compress it.

This may work:

for TXT in $(find /PATH/TO/TXT/FILES -iname '*.txt'); do cat ${TXT} | gzip -9 > ${TXT}.gz; done
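
The $(find ...) loop will break on file names containing whitespace; a whitespace-safe sketch of the same idea (assuming bash and a find that supports -print0) is:

find /PATH/TO/TXT/FILES -iname '*.txt' -print0 | while IFS= read -r -d '' TXT; do gzip -9 < "${TXT}" > "${TXT}.gz"; done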

On my Mac, the original text files were not removed, since gzip only deletes its input when it is given the file as an argument rather than reading from a pipe. Thus, both the original text file and the zipped file were present after running the script.

You could easily add

rm -f "${TXT}"

to the loop to get rid of the plain text files (${TXT} already holds the full path returned by find, so no directory prefix is needed).
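
Alternatively, skipping the loop entirely: when gzip is given the files as arguments, it compresses each one in place and removes the original itself (newer gzip versions accept -k/--keep to retain it). A sketch:

find /PATH/TO/TXT/FILES -iname '*.txt' -exec gzip -9 {} +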

Vincent
  • 1,150