I have several directories containing thousands of gzip files each (about 1M files overall). Some of these files are corrupted, and most of them are very small (a couple of KB each).
Almost all of them are highly similar in content, so compressing them all together should improve the compression ratio compared to the current situation.
Since I rarely browse these directories and only keep them around for archival purposes, I'd like to pack them into a single archive, using a format that is widely available and compresses well. It would also be nice to have random access, so that I can occasionally extract a specific file without decompressing the whole archive.
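For example, it would be great if something along these lines worked for pulling out a single file now and then (archive name and member path are just placeholders), even if it's not instant:

```bash
# hypothetical: extract one member from a single xz-compressed tar archive
tar -xJf archive.tar.xz dir1/file_000123.gz
```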
What's the best strategy here? Is tar resilient to corruption? I'd prefer something that can be implemented as a one-liner or a simple bash script.
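For reference, the naive approach I could come up with is roughly the sketch below (directory and file names are placeholders, and it assumes GNU tar, gzip and xz are available); I doubt it's ideal for this many files, and it doesn't address corruption resilience or random access:

```bash
#!/usr/bin/env bash
set -euo pipefail

SRC=/data/gzdirs          # placeholder: root of the directories to archive
OUT=archive.tar.xz        # placeholder: the single resulting archive

# 1. Record which .gz files are corrupted (gzip -t exits non-zero on damage).
find "$SRC" -name '*.gz' -print0 |
  while IFS= read -r -d '' f; do
    gzip -t "$f" 2>/dev/null || printf '%s\n' "$f"
  done > corrupted.txt

# 2. Decompress the files in place first; otherwise xz would mostly see
#    already-compressed gzip streams and the similarity between files is lost.
find "$SRC" -name '*.gz' -exec gunzip {} + 2>/dev/null || true

# 3. Pack everything into a single xz-compressed tar archive.
tar -cJf "$OUT" -C "$(dirname "$SRC")" "$(basename "$SRC")"
```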