
I am trying to figure out a way to use parallel compression to produce zip files with pigz, but I haven't found a way so far.

One option I have is to archive files/dirs with zip -r -0 (store only) and then compress with pigz -K -f, but that creates a zip inside a zip.
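Roughly, that two-step workaround looks like this (file and directory names are just placeholders):

```shell
# Step 1: archive without compression (-0 = store only)
zip -r -0 archive.zip mydir/

# Step 2: compress the stored zip with pigz in zip mode
# (-K writes .zip format, -f forces compression of an already-suffixed file)
pigz -K -f archive.zip
# Result: archive.zip.zip -- a zip containing a zip
```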

I came across an upvoted answer here that seemed like a solution but used invalid, zip-like syntax for pigz:

pigz -K -k archive.zip bigfile txt

I don't think pigz takes a zip file name as an argument, letting us then specify the files to zip.


1 Answer


I don't think pigz takes a zip file name as an argument, letting us then specify the files to zip.

It seems you're right.


Can we use pigz with --zip to compress multiple files in a single zip-compatible format?

Probably not, or not yet (new features can be added in the future, although adding this particular feature may not be the Right Thing; keep reading). I have found no way to do this. You need to put your files into a single archive first, then compress that archive.
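For example, a two-step version (file names are placeholders): archive first with tar, then compress the archive in parallel with pigz:

```shell
# Step 1: archive, no compression
tar -cf archive.tar file1 file2 file3

# Step 2: parallel-compress; -K makes pigz write zip format
# and append a .zip suffix, producing archive.tar.zip
pigz -K archive.tar
```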

There's a reason for this. According to the Unix philosophy, programs should follow the "Do One Thing and Do It Well" rule. Putting one or more files (a directory is also a file) into a single file is one thing, and we call it "archiving". Reducing the size is another thing, and we call it "compressing". We have archivers (the common one is tar, the POSIX one is pax) and we have compressors: gzip, compress, bzip2, lzma, …

Some compressors and compressed file formats support storing multiple files because their authors were apparently not enlightened by the Unix philosophy. :)

But it's not only a philosophical issue, there are practical advantages:

  • You can use any archiver with any compressor. In particular you can pick another (e.g. better) compressor and still use the archiver you are most familiar with (probably GNU tar). Tools that work as both tend to invent their own options and rules for the common task of archiving.
  • If filesystems introduce new features, we will only need to upgrade our archivers.
  • If you invent a new compression method then you will be able to develop a new compressor without paying attention to how to traverse directory trees, what metadata to read or which character should separate pathname components.

pigz is a compressor and it seems it has no ambition to be an archiver. With --zip/-K it uses the .zip format, which is associated with a tool designed to be both a compressor and an archiver. pigz doesn't have to use all the features of the format, in particular the ability to store more than one file. It could be "improved", but now you know why I think this wouldn't be the Right Thing.

Still, archiving-and-compressing is a pretty common use case. A good archiver should be able to write to its stdout. A good compressor should be able to read from its stdin. Then you can use them in a pipeline. This is the general way.
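A minimal sketch of such a pipeline with pigz --zip (file names are placeholders; this relies on pigz -d being able to decompress single-entry zip streams, which pigz supports):

```shell
# Archiver writes to stdout (-f -), compressor reads from stdin
tar -cf - file1 file2 | pigz --zip > archive.tar.zip

# The reverse for extraction
pigz -d < archive.tar.zip | tar -xf -
```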

Specifically with tar you can use a switch that makes the tool filter (pipe) the archive through a compressor: -z for gzip, --lzma for lzma, etc. A universal switch is -I; it allows you to specify a custom compressor. The compressor can be pigz --zip:

tar -cv -I 'pigz --zip' -f archive.tar.zip file1 file2 file3

The same compressor can be used to unpack, as long as it supports -d (pigz does):

tar -xv -I 'pigz --zip' -f archive.tar.zip

Technically this archive.tar.zip is a zip file with a tar file inside, so it's similar to your "zip inside a zip". If you unzip it, you will get a tar archive named -. The above tar commands work on the fly, though (no intermediate file is created).
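You can verify this yourself (assuming unzip and pigz are installed, and archive.tar.zip is the archive created above):

```shell
# The outer zip holds a single entry, the inner tar (typically named "-")
unzip -l archive.tar.zip

# List the inner tar's contents on the fly, no intermediate file
pigz -dc archive.tar.zip | tar -tf -
```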

This is how you do it in Linux/Unix.