
I have files for a mod for a game. The mod requires some music files to be present twice, in different folders. Given that the music is identical in both folders, is there a way to zip the files only once and then edit the zip's table of contents so that the second set of entries points at the first copy's data, such that extracting the zip produces the files twice even though they are stored only once in the archive?

Similar to creating an ISO with a modified TOC (though I don't know how to do that either).

An example of what the zip would have:

mod.zip
\music\set_a\tune1.mp3
\music\set_a\tune2.mp3
\music\set_a\tune3.mp3
\music\set_a\tune4.mp3
\music\set_a\tune5.mp3
\music\set_a\tune6.mp3
\music\set_b\tune1.mp3
\music\set_b\tune2.mp3
\music\set_b\tune3.mp3
\music\set_b\tune4.mp3
\music\set_b\tune5.mp3
\music\set_b\tune6.mp3
\graphics\set_a\img1.png
\graphics\set_a\img2.png
\graphics\set_b\img1.png
\graphics\set_b\img2.png

Imagine that the tunes for set_a and set_b are identical, but the graphics for set_a and set_b are not.

In an ideal world, I would replace all mp3 files in set_b with zero-length files, create the zip, and then alter the index so that the set_b entries refer to the set_a data. Upon extraction, the zip would create music\set_b\tune1.mp3 but fill it with the data from music\set_a\tune1.mp3.

Is that possible? If not, is there another easy way to achieve something similar?

2 Answers

Probably the simplest alternative is to use a "solid" archive format. This is how .tar.foo archives always work, and it is a selectable option for the .rar and .7z formats.

In this mode, the archive's contents are concatenated together and compressed as a single continuous stream, meaning that repetitions will be detected across files as well – and identical files should get deduplicated as part of the regular compression.

(The downsides of this mode are that it makes extracting individual files slow and the archive cannot be updated easily.)
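As a concrete sketch (assuming your mod files live in a folder named mod, laid out as in the question, and that the p7zip 7z command is available), creating a solid .7z could look like this; -ms=on enables solid mode and -md sets the dictionary size, which needs to be large enough to cover both copies of the music:

$ cd mod
$ 7z a -ms=on -mx=9 -md=256m ../mod.7z music graphics   # solid archive, 256 MB dictionary
$ ls -lh ../mod.7z                                      # should be roughly one copy of the music plus the graphics

Building the same archive with -ms=off and comparing the sizes makes it easy to confirm that the duplicate tunes were actually absorbed by the solid compression.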

Note: This other thread (which was closed) has answers saying that this only works for relatively small amounts of data compared to the dictionary size parameter. But at least it's less risky than making nonstandard changes to the already-horrible .zip structure.

grawity

zpaq does this for you: it has built-in deduplication, it is open source, and it runs at least on Windows and Linux (it is probably already packaged for your distribution).

This is a quick check on Linux:

$ dd if=/dev/urandom bs=1M of=file1 count=10   # create 10 MB of incompressible random data
$ cp file1 file2                               # make an identical copy
$ zpaq add archive.zpaq file1 file2            # archive both copies
$ ls -lh archive.zpaq                          # roughly 10 MB, not 20 MB: the duplicate is deduplicated

Look at the size of the archive. Note also that we gave zpaq no hint about the duplication of the files, no soft or hard links.

$ rm file1 file2                               # delete the originals
$ zpaq extract archive.zpaq                    # restore both files from the archive
$ ls -lh file1 file2                           # both 10 MB files are back
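To double-check that the restored copies are byte-identical, a quick sanity check (not part of the original test, just one way you might verify it) is:

$ cmp file1 file2 && echo identical            # cmp prints nothing when the files match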