122

I often have to gather log files and upload them to a central server (owned by another company). The central server has a file size limit, so I am trying to create the smallest possible file that is still in the zip format.

What are the best settings to use when compressing a text file to a zip format when my only need is a small file size?

7zip Options

I've done the obvious and chosen ultra compression, and I have noticed that LZMA does a better job than deflate, but there are far too many other permutations of options for me to test them all.

jjnguy
  • 1,877

6 Answers

110

To create the smallest standard ZIP file that 7-Zip can create, try:

7z a -mm=Deflate -mfb=258 -mpass=15 -r foo.zip C:\Path\To\Files\*

Source: How can I achieve the best, standard ZIP compression?
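If you want a rough feel for how much the method matters inside a ZIP container before tuning 7-Zip itself, here is a minimal Python sketch. It uses Python's own `zipfile` module, not 7-Zip's encoders, so the byte counts will not match what `7z` produces. The log content and the member name `app.log` are made up for illustration. Also keep in mind that not every unzip tool can read LZMA members in a ZIP.

```python
import io
import zipfile


def zip_size(payload: bytes, method: int, level=None) -> int:
    """Return the size in bytes of an in-memory ZIP holding one member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method, compresslevel=level) as zf:
        zf.writestr("app.log", payload)  # hypothetical member name
    return len(buf.getvalue())


# Hypothetical log content, repetitive on purpose, as real logs often are.
sample_log = "".join(
    f"2024-01-01 12:00:{i % 60:02d} INFO request id={i} status=200\n"
    for i in range(20000)
).encode()

sizes = {
    "stored": zip_size(sample_log, zipfile.ZIP_STORED),
    "deflate-9": zip_size(sample_log, zipfile.ZIP_DEFLATED, 9),
    "lzma": zip_size(sample_log, zipfile.ZIP_LZMA),
}

# Print the smallest result first.
for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {size:8d} bytes")
```

Run it on a sample of your own logs rather than the synthetic text to see whether the gap between Deflate and LZMA is worth the compatibility risk.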

Otherwise, if you don't care about the ZIP standard, use the following ultra settings:

7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on archive.7z dir1

Which are:

-t7z   7z archive

-m0=lzma
       lzma method

-mx=9  level of compression = 9 (Ultra)

-mfb=64
       number of fast bytes for LZMA = 64

-md=32m
       dictionary size = 32 megabytes

-ms=on solid archive = on
Tanja
  • 138
kenorb
  • 26,615
49

Beware: a commenter reported that this answer made 7z use 45GB of virtual memory when compressing. (They didn't specify how large an input they were compressing.)

Some LZMA settings, such as a large dictionary size, require a lot of RAM even to decompress, so keep in mind which machines need to be able to decompress your files if you're sending them to someone else. Large dictionaries allow finding matches over longer distances, so they often help compression, possibly a lot for inputs such as similar but large files in a solid archive.
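To see the "matches over longer distances" effect in isolation, here is a small sketch using Python's `lzma` module (liblzma, not 7-Zip, so treat the numbers as illustrative only). The input is an incompressible pseudo-random block repeated once, so the only redundancy is a match exactly one block back. The block size and the two dictionary sizes are arbitrary choices for the demonstration.

```python
import lzma
import random

BLOCK = 256 * 1024  # 256 KiB

# Incompressible pseudo-random block, repeated once: the only redundancy
# is a match at a distance of exactly BLOCK bytes.
block = random.Random(0).getrandbits(8 * BLOCK).to_bytes(BLOCK, "little")
data = block + block


def xz_size(payload: bytes, dict_size: int) -> int:
    """Compress with LZMA2 using the given dictionary size; return output size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 9, "dict_size": dict_size}]
    return len(lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters))


small = xz_size(data, 64 * 1024)    # dictionary shorter than the repeat distance
large = xz_size(data, 1024 * 1024)  # dictionary covers the repeat distance

print(f"64 KiB dictionary: {small} bytes")
print(f"1 MiB dictionary:  {large} bytes")
```

With the small dictionary the second copy cannot be matched against the first, so the output stays close to the full input size. With the larger one it shrinks to roughly one block. The decompressor has to keep a window of the same size, which is why huge dictionaries cost RAM on the receiving side too.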


After much experimentation, digging into the detailed 7zip documentation, and reading some of the 7z source code regarding the advanced LZMA2 parameters, here is a better method. It compressed some 1GB real-world test files 2 to 4 times better than the previously accepted solutions posted here, or even those in the 7z manpage.

7z a -t7z -mx=9 -mfb=273 -ms -md=31 -myx=9 -mtm=- -mmt -mmtf -md=1536m -mmf=bt3 -mmc=10000 -mpb=0 -mlc=0 archive.7z inputfileordir

LZMA2 compression is assumed here, but you might be able to get even better results in 7zip by passing advanced LZMA2 options like -m0=LZMA2:27, or -m0=LZMA2:d25, or an array of parameters like

-m0=BCJ2 -m1=LZMA:d25 -m2=LZMA:d19 -m3=LZMA:d19 -mb0:1

Such parameters didn't seem to be respected by the 7z versions I tested, but you may want to explore further or patch the 7z code to properly parse them. Or maybe it is supposed to work and is just broken in the builds that were tested.

Peter Cordes
  • 6,345
91735472
  • 491
21

I decided to run some experiments to find the optimal compression parameters empirically.

The tool I used was 7-ZIP finetuner. It hunts for the optimal parameters by simply repeating the compression with varying parameters and looking for the best combination. A run for one file may sometimes take more than an hour, even on a fast computer.

The parameters that it tries are:

LC : number of Literal Context bits
LP : number of Literal Pos bits
PB : number of Pos Bits
YX : level of file analysis
FB : number of Fast Bytes

I left the dictionary size at its default of 512 MB and the solid block size at its default of On. The tool uses the LZMA method.

The best combinations of parameters on several types of files were as follows:

[Image: table of the best parameter combinations per file type; not reproduced here]

I note that the best values were not constant even for files of the same type.

Conclusion: There are no best options, as each file may have its own unique best combination. One may drive all parameters up to their limits, but an improvement is not at all guaranteed.

The most common combination seems to be:

LC : 8
LP : 0
PB : 1
YX : 5
FB : 273
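The same brute-force idea can be sketched in a few lines of Python with the standard `lzma` module. This is not 7-ZIP finetuner and not 7-Zip's encoder. It is liblzma, which only accepts lc + lp of at most 4 (so the LC = 8 combination above cannot be tried) and has no equivalent of the YX setting. The sample data, the 1 MiB dictionary and the small grid are assumptions chosen to keep the run short.

```python
import itertools
import lzma


def sweep(payload: bytes):
    """Try each parameter combination; return (size, params) for the smallest output."""
    best = None
    for lc, lp, pb, fb in itertools.product(range(5), range(5), (0, 1, 2), (64, 273)):
        if lc + lp > 4:  # liblzma rejects lc + lp > 4
            continue
        filters = [{
            "id": lzma.FILTER_LZMA2,
            "preset": 9,            # source of defaults for unspecified options
            "dict_size": 1 << 20,   # 1 MiB, kept small so the sweep stays cheap
            "lc": lc, "lp": lp, "pb": pb,
            "nice_len": fb,         # liblzma's name for "fast bytes"
        }]
        size = len(lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters))
        if best is None or size < best[0]:
            best = (size, {"lc": lc, "lp": lp, "pb": pb, "fb": fb})
    return best


# Hypothetical log text standing in for a real file.
sample = "".join(
    f"2024-01-01 12:00:{i % 60:02d} WARN worker={i % 7} retry={i % 3} id={i}\n"
    for i in range(1000)
).encode()

best_size, best_params = sweep(sample)
print(f"best: {best_size} bytes with {best_params}")
```

Feed it different kinds of files and the winning combination will usually change, which is the same conclusion as above: there is no single best setting.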


harrymc
  • 498,455
19

If you can use .7z format rather than just .zip, I would simply use PPMD with the following options and leave everything else as set by the Compression Level:

  • Archive Format: 7z
  • Compression Method: PPMD
  • Compression Level: Ultra

I regularly compress server/text logs (60MB+) using these options and they usually come out at 1-2% of the original size.

8

I compared the following commands on db.fdb, a 1.2 GB file (1236598784 B), on Ubuntu Server 14.04.03 with p7zip [64] 9.20 in a VM:

1. 7z a -mx=9 1.7z db.fdb
2. 7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on 2.7z db.fdb
3. 7z a -t7z -m0=lzma -mx=9 -mfb=258 -md=32m -ms=on 3.7z db.fdb
4. 7z a -t7z -m0=lzma -mx=9 -mfb=258 -md=32m -ms=on -pass=15 4.7z db.fdb
5. 7z a -mx=9 -mmt=on 5.7z db.fdb
6. 7z a -t7z -m0=lzma -mx=9 -mfb=258 -md=32m -ms=on -mmt=on 6.7z db.fdb

and got these results:

1.7z 96 MB (100108731 B) with 6' 25"
2.7z 95 MB ( 99520375 B) with 5' 18"
3.7z 93 MB ( 97512311 B) with 9' 19"
4.7z 93 MB ( 97512345 B) with 9' 40"
5.7z 96 MB (100108731 B) with 5' 26"
6.7z 93 MB ( 97512311 B) with 9' 09"

I think the second method works well: (almost) the best compression in the best time. The first method is the easiest to read and remember, and is fine for small files or when maximum compression is not the point. Going from method 2 to method 3 gives only a slightly smaller 7z but costs almost twice the compression time. Everyone can decide for themselves.
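To put numbers on that trade-off, here is a short Python sketch that turns the figures reported above into "percent of original size" and seconds. The values are copied from the results list and nothing is measured here.

```python
ORIGINAL = 1236598784  # bytes, size of db.fdb as reported above

# method number -> (archive size in bytes, minutes, seconds), copied from the results
results = {
    1: (100108731, 6, 25),
    2: (99520375, 5, 18),
    3: (97512311, 9, 19),
    4: (97512345, 9, 40),
    5: (100108731, 5, 26),
    6: (97512311, 9, 9),
}


def summarize(results, original):
    """Return method -> (percent of original size, elapsed seconds)."""
    rows = {}
    for method, (size, minutes, seconds) in results.items():
        rows[method] = (100 * size / original, 60 * minutes + seconds)
    return rows


rows = summarize(results, ORIGINAL)
for method, (percent, secs) in rows.items():
    print(f"method {method}: {percent:.2f}% of original, {secs} s")

# Compare method 3 against method 2: relative size saving and time ratio.
saving = 1 - results[3][0] / results[2][0]
slowdown = rows[3][1] / rows[2][1]
print(f"method 3 vs 2: {saving:.1%} smaller, {slowdown:.2f}x the time")
```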

SULIMa
  • 81
-1

Set the "Split to volumes, bytes" field to the server's maximum allowed file size (in bytes, I think, although it appears to accept common abbreviations like "KB" and "MB"). If the zip file exceeds that size, 7-Zip will automatically split it into multiple files, such as integration_serviceLog.zip.001, integration_serviceLog.zip.002, etc. (Way back when, PKZIP used this to span zip files across multiple floppy disks.) You'll need all the files to be present to unzip them. Use that instead of worrying about the absolute best compression settings for any particular set of files, because what's best for one file may be different for another, and you don't want to go through this every time you need to copy logs.
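If you need to script the splitting or reassemble the parts on a machine without 7-Zip, here is a minimal Python sketch. It assumes the volumes are plain consecutive byte ranges of the archive, which is my understanding of how 7-Zip's .001/.002 files work, but verify that against your own archives. The function names are made up for illustration.

```python
from pathlib import Path


def split_file(path, volume_size: int):
    """Write <name>.001, <name>.002, ... of at most volume_size bytes each.

    Returns the list of part paths in order.
    """
    if volume_size <= 0:
        raise ValueError("volume_size must be positive")
    src = Path(path)
    parts = []
    with src.open("rb") as f:
        index = 1
        while True:
            chunk = f.read(volume_size)  # one whole volume is held in memory
            if not chunk:
                break
            part = src.with_name(f"{src.name}.{index:03d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_parts(parts, dest):
    """Concatenate the parts, in name order, back into a single file."""
    with open(dest, "wb") as out:
        for part in sorted(parts):
            out.write(Path(part).read_bytes())
```

For example, `split_file("logs.zip", 10 * 1024 * 1024)` would produce 10 MiB pieces, and `join_parts` restores the original file byte for byte.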