Why are tar and gzip almost always used together, and not just gzip? Is there any advantage to that method?
TAR creates a single archived file out of many files, but does not compress them.
Format Details
A tar file is the concatenation of one or more files. Each file is preceded by a 512-byte header record. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes and the extra space is zero filled. The end of an archive is marked by at least two consecutive zero-filled records.
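The layout described in the quote can be checked with Python's standard `tarfile` module. This is a minimal sketch, assuming Python 3.9+. The helper `build_tar` and the file names are hypothetical. Note that `tarfile` also pads the finished archive out to a full 10240-byte record (20 blocks), so the trailing run of zeros is longer than the two end-of-archive blocks.

```python
import io
import tarfile

BLOCK = 512  # size of one tar record (header or data block)

def build_tar(files: dict[str, bytes]) -> bytes:
    """Write an uncompressed tar archive into memory and return its raw bytes."""
    buf = io.BytesIO()
    # USTAR format: exactly one 512-byte header per regular file (no PAX extension headers)
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))  # header, then data zero-padded to 512
    return buf.getvalue()

archive = build_tar({"a.txt": b"hello\n", "b.txt": b"x" * 600})

# Offsets follow directly from the rules quoted above:
#   0     header for a.txt (the name field comes first)
#   512   6 bytes of data, zero-filled up to 1024
#   1024  header for b.txt
#   1536  600 bytes of data, zero-filled up to 2560 (two blocks)
#   2560  end-of-archive: zero-filled records
print(archive[:5], archive[512:518], archive[1024:1029])
print(len(archive) % BLOCK == 0, set(archive[2560:]) == {0})
```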
GZIP compresses a single file into another single file, but does not create archives.
File Format
...Although its file format also allows for multiple such streams to be concatenated (gzipped files are simply decompressed concatenated as if they were originally one file), gzip is normally used to compress just single files.[4] Compressed archives are typically created by assembling collections of files into a single tar archive, and then compressing that archive with gzip.
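A short Python illustration of the concatenation behavior, using only the standard `gzip` module (the sample contents are arbitrary). It also shows why gzip alone is not an archiver: after decompression, nothing records where one file ended or what either file was called.

```python
import gzip

# Two independent gzip streams ("members"), each compressing its own input
part1 = gzip.compress(b"first file\n")
part2 = gzip.compress(b"second file\n")

# Concatenating the compressed bytes yields a valid multi-member gzip file...
combined = part1 + part2

# ...which decompresses as if it had been one file: the boundary and names are gone.
restored = gzip.decompress(combined)
print(restored)
```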
Gzip / Bzip2 are stream compressors. They compress a stream of data into something smaller. They could be used on individual files, but not on groups of files on their own.
Tar, on the other hand, has the ability to turn a list of files, with paths, permissions and ownership information, into a single continuous stream, and vice versa.
That's why, to archive files (and compress them as well, if needed), one usually uses tar plus some compression program.
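A minimal Python sketch of that division of labor, using the standard `tarfile` module in its streaming `w|gz` / `r|gz` modes. The `pack` / `unpack` helpers and the sample paths, permission bits and uids are hypothetical. The tar layer records names, permissions and ownership in its headers; the gzip layer only ever sees one sequential byte stream.

```python
import io
import tarfile

def pack(entries):
    """Turn (name, data, mode, uid) tuples into one gzip-compressed tar stream."""
    out = io.BytesIO()
    # "w|gz" writes strictly sequentially: tar produces the stream, gzip compresses it
    with tarfile.open(fileobj=out, mode="w|gz") as tar:
        for name, data, mode, uid in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = mode  # permission bits travel inside the tar header
            info.uid = uid    # so does ownership
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()

def unpack(blob):
    """Read the stream back front to back, returning {name: (data, mode, uid)}."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r|gz") as tar:
        for member in tar:
            f = tar.extractfile(member)
            result[member.name] = (f.read(), member.mode, member.uid)
    return result

blob = pack([("bin/run.sh", b"#!/bin/sh\necho hi\n", 0o755, 1000),
             ("etc/app.conf", b"debug=0\n", 0o600, 0)])
print(unpack(blob))
```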
Tar is in charge of doing one and only one thing well: (un)archiving into (or out of) a single archive file. Of what? Of one and only one thing: a set of files.
Gzip is in charge of doing one and only one thing well: (un)compressing. Of what? Of one thing and one thing only: a single file of any type... and that includes a file created with tar.
It goes back to the UNIX philosophy of pipelining and its underlying "pipes and filters" architecture, the treatment of everything as a file, and the sound architectural goal of "one-thing-does-one-thing-only-and-does-it-well" (which results in a very elegant and simple plug-and-play of sorts).
In its simplicity, it is almost algebraic in nature, a hefty goal in systems design and no easy feat.
In many ways (and not without its flaws), this is almost a pinnacle in composability, modularity, loose coupling and high cohesion. If you understand these four (and I mean really understand them), it will be obvious why tar and gzip work in pairs like that.
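To make the "pipes and filters" point concrete, here is a minimal Python sketch, assuming Python 3.9+. The names `compose`, `archive`, `unarchive`, `make_tarball` and `open_tarball` are hypothetical, not part of any library. Each stage maps one value to another, and the reverse pipeline is just the inverses applied in reverse order, which is roughly what "algebraic" means here.

```python
import gzip
import io
import tarfile
from functools import reduce

def compose(*fns):
    """compose(f, g)(x) == f(g(x)): a function analogue of a shell pipe, read right to left."""
    return reduce(lambda f, g: lambda x: f(g(x)), fns)

def archive(files: dict[str, bytes]) -> bytes:
    """The 'tar' step: many named files in, one byte string out. No compression."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unarchive(blob: bytes) -> dict[str, bytes]:
    """Inverse of archive(): one byte string in, named files out."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}

# The 'gzip' step knows nothing about tar; it just maps bytes to bytes.
make_tarball = compose(gzip.compress, archive)       # like: tar cf - ... | gzip
open_tarball = compose(unarchive, gzip.decompress)   # like: gzip -dc | tar xf -

files = {"notes.txt": b"one thing well\n", "data.bin": bytes(range(256))}
assert open_tarball(make_tarball(files)) == files
```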
First of all, TAR wasn't created to create file archives. It's the Tape ARchiver. Its job is to write an archive out to tape or load one back in.
The -f option makes it use a file as a "virtual tape", which can then be compressed by another program. In fact, such compression happens on real-world tape drives as well.
Of course, the philosophy of using one program to do one thing well also counts in this case, but focusing on that alone, one might miss why TAR archives are structured as a stream instead of as a directory of contents followed by the contents.
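The reason tar can work on a "virtual tape" at all is that its format only ever needs sequential access. A sketch in Python, assuming a hypothetical `Tape` class that supports only appending and front-to-back reading (it has no `seek` or `tell`). The standard `tarfile` stream modes (`w|`, `r|`) never seek backwards, so they work against it, and each member's header is discovered only by reading past the previous member; there is no table of contents up front.

```python
import io
import tarfile

class Tape:
    """Hypothetical tape stand-in: append-only writes, front-to-back reads, no seek/tell."""
    def __init__(self):
        self._data = bytearray()
        self._pos = 0
    def write(self, b):
        self._data += b
        return len(b)
    def read(self, n=-1):
        end = len(self._data) if n < 0 else self._pos + n
        chunk = bytes(self._data[self._pos:end])
        self._pos += len(chunk)
        return chunk
    def rewind(self):
        self._pos = 0

def write_to_tape(tape, files):
    # "w|" = stream mode: tarfile only appends blocks, as a tape drive would require
    with tarfile.open(fileobj=tape, mode="w|") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

def read_from_tape(tape):
    # "r|" = stream mode: members are found one after another, in the order written
    found = []
    with tarfile.open(fileobj=tape, mode="r|") as tar:
        for member in tar:
            found.append((member.name, tar.extractfile(member).read()))
    return found

tape = Tape()
write_to_tape(tape, [("first.txt", b"A"), ("second.txt", b"B" * 700)])
tape.rewind()
print(read_from_tape(tape))
```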
Traditionally, Unix systems used one program to perform one task per the Unix philosophy: tar was just a means to package multiple files into a single file, originally for tape backup (hence tar, tape archive). tar does not provide compression; the resulting uncompressed archive is typically compressed with some other program such as gzip, bzip2, or xz. In the old days, the compress command was used for this; newer compression algorithms are much more effective.
The highly modularized approach dictated by the Unix philosophy means that each program can be used individually as appropriate, or combined to perform more complex tasks, including the creation of compressed archives as described here. For these sorts of tasks, it also makes it easy to swap out individual tools as needed; you'd just change the compression program to use a different compression algorithm, without having to replace the tar utility itself.
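A minimal Python sketch of that swap, using only standard-library modules (`gzip`, `bz2`, and `lzma` for xz). The `make_tar` helper and the `COMPRESSORS` table are hypothetical names. The tar step is written once; changing the compression algorithm means changing a table entry. The old `compress` (LZW) format has no standard-library module, so it is left out.

```python
import bz2
import gzip
import io
import lzma
import tarfile

def make_tar(files):
    """Uncompressed tar bytes; this step never changes when the compressor does."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Interchangeable "filters": each pair is (compress, decompress) over plain bytes.
COMPRESSORS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}

files = {"log.txt": b"the same line\n" * 1000}
tar_bytes = make_tar(files)

for label, (compress, decompress) in COMPRESSORS.items():
    packed = compress(tar_bytes)
    assert decompress(packed) == tar_bytes  # the tar layer is untouched
    print(f"{label}: {len(tar_bytes)} -> {len(packed)} bytes")
```

`tarfile` can also auto-detect all three when reading, via `tarfile.open(..., mode="r:*")`.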
This modular approach is not without its disadvantages. As mentioned in comments to other answers, a dedicated compressed archive format like .zip is better able to handle extraction of individual files; compressed tarballs need to be decompressed almost in their entirety in order to extract files near the end of the archive, while .zip archives allow random access to their contents. (Some newer formats, such as .7z, support solid and non-solid archives, as well as solid blocks of varying size in larger archives.) The continuing use of tar in conjunction with a separate compression utility is a matter of tradition and compatibility; also, .7z and .zip do not support Unix filesystem metadata such as permissions.
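A small Python comparison, assuming nothing beyond the standard `tarfile` and `zipfile` modules (the member names and the `tgz_fetch` / `zip_fetch` helpers are made up for illustration). Fetching the last of 200 members from the tarball means decompressing and walking past the 199 before it, while the zip reader looks the name up in the central directory and seeks straight to that member's local header.

```python
import io
import tarfile
import zipfile

names = [f"file{i:03d}.txt" for i in range(200)]
payload = {n: (n * 50).encode() for n in names}

# Build the same content as a .tar.gz and as a .zip, both in memory.
tgz = io.BytesIO()
with tarfile.open(fileobj=tgz, mode="w:gz") as tar:
    for n, data in payload.items():
        info = tarfile.TarInfo(n)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for n, data in payload.items():
        zf.writestr(n, data)

def tgz_fetch(blob, wanted):
    """Stream through the tarball until `wanted` appears; return (data, members_scanned)."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r|gz") as tar:
        for count, member in enumerate(tar, start=1):
            if member.name == wanted:
                return tar.extractfile(member).read(), count
    raise KeyError(wanted)

def zip_fetch(blob, wanted):
    """Look `wanted` up in the central directory and jump to its recorded offset."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        info = zf.getinfo(wanted)
        return zf.read(info), info.header_offset

data, scanned = tgz_fetch(tgz.getvalue(), names[-1])
print(f"tar.gz: walked {scanned} members to reach {names[-1]}")
zdata, offset = zip_fetch(zbuf.getvalue(), names[-1])
print(f"zip: jumped to byte offset {offset}")
```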