I have hundreds of similar large files (30 MB each) that I want to compress. Every pair of files shares about 99% of its data (less than 1% difference), so I expect the archive to be no more than 40-50 MB.
A single file compresses from 30 MB down to 13-15 MB (with xz -1, gzip -1, or bzip2 -1), but when compressing two or more files I want an archive of size 13-15 MB + N*0.3 MB, where N is the number of files.
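For example, the per-file baseline can be measured like this (file1.bin is a placeholder name; exact sizes depend on your data):

    xz -1 --keep file1.bin        # keeps the original, writes file1.bin.xz
    ls -l file1.bin file1.bin.xz  # compare the two sizes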
When using tar (to create a solid archive) and xz -6 (expecting the compression dictionary to be bigger than one file - Update: this was not enough, since per the xz(1) man page the -6 preset only uses an 8 MiB dictionary, smaller than a single 30 MB file), I still get an archive of size N*13 MB.
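For reference, the pipeline that was not enough looked roughly like this (input_directory is a placeholder):

    tar c input_directory | xz -6 > compressed.tar.xz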
I think that neither gzip nor bzip2 will help me, because their dictionaries are smaller than 1 MB (gzip's DEFLATE window is 32 KiB, and bzip2's block size tops out at 900 KiB), while my tar stream has repetitions every 30 MB.
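A quick way to see the window problem (file1.bin again a placeholder): concatenate the same 30 MB file twice and compress it. With gzip's 32 KiB window the result is roughly twice the single-file size, even though the input is 100% redundant, because the second copy is too far back to be matched:

    gzip -1 < file1.bin | wc -c                 # single-file size
    cat file1.bin file1.bin | gzip -1 | wc -c   # ~double, despite full redundancy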
How can I solve my problem on a modern Linux system using standard tools?
Is it possible to tune xz to compress fast, but use a dictionary bigger than 30-60 MB?
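It turns out it is: the --lzma2 option lets you start from a fast preset and enlarge only the dictionary. A minimal sketch, assuming a single input file (name is a placeholder):

    # Preset dictionary sizes from xz(1): -1 uses 1 MiB, -6 uses 8 MiB, -9 uses 64 MiB.
    # Keep the cheap preset -1 match settings but raise the dictionary:
    xz --lzma2=preset=1,dict=128MiB < somefile > somefile.xz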
Update: Did the trick with:

    tar c input_directory | xz --lzma2=dict=128M,mode=fast,mf=hc4 --memory=2G > compressed.tar.xz

I'm not sure the mf=hc4 and --memory=2G options are necessary, but dict=128M sets the dictionary big enough (bigger than one file), and mode=fast makes the process a bit faster than -e.
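To sanity-check the result, xz --list reports the compressed and uncompressed sizes and the overall ratio, so you can confirm the archive is close to the expected 13-15 MB + N*0.3 MB:

    xz --list compressed.tar.xz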