
Is it possible to gunzip multiple files and concatenate them into one big file, but do it in parallel given a multicore machine? For example, right now I do:

gunzip -c file1.gz > final
gunzip -c file2.gz >> final
gunzip -c file3.gz >> final
gunzip -c file4.gz >> final

Can I do the same so that the gunzip processing of the different files runs on different CPU cores of the multicore machine, with the results all concatenated into the same final file?

719016

2 Answers


It is a little shorter to do that using GNU Parallel:

parallel gunzip -c ::: file*.gz > final

but under the hood it, too, buffers each job's output in temporary files before writing it out.

Watch the intro videos to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
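One caveat worth knowing (my addition, not part of the original answer): by default GNU Parallel prints each job's output as that job finishes, so jobs completing out of order could scramble the concatenation. The -k / --keep-order flag pins the output to the order the arguments were given. A small self-contained demo with made-up filenames:

```shell
# Create two tiny gzip inputs for the demo (names are arbitrary).
printf 'hello ' > a.txt; gzip -c a.txt > a.txt.gz
printf 'world'  > b.txt; gzip -c b.txt > b.txt.gz

# -k / --keep-order makes GNU Parallel emit each job's output in the
# order the arguments were given, not the order the jobs finish.
parallel -k gunzip -c ::: a.txt.gz b.txt.gz > final
cat final    # hello world
```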

Ole Tange

You have to use temporary files for this:

gunzip -c file1.gz > final  &
one=$!
gunzip -c file2.gz > final2 &
two=$!
gunzip -c file3.gz > final3 &
three=$!
gunzip -c file4.gz > final4 &
four=$!

# wait for all four decompressions before concatenating
wait $one && wait $two && wait $three && wait $four
cat final2 final3 final4 >> final
rm final2 final3 final4

To decompress the parts in parallel directly into the one final file, you would have to know the decompressed size of each part in advance. Only then could you create a biiiig empty file and write the output of each decompression to the right position in it (with dd, for example). Since you don't know the decompressed sizes without decompressing the parts first, this won't work.
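For illustration only, here is what that offset-writing idea would look like *if* the decompressed sizes were somehow known ahead of time. The filenames and the 10-byte total are made-up demo values; the point is dd's seek= and conv=notrunc, which let each decompressor write into its own slot of the preallocated file concurrently:

```shell
# Build two compressed parts with known uncompressed sizes (demo values).
printf 'AAAA'   > part1          # 4 bytes uncompressed
printf 'BBBBBB' > part2          # 6 bytes uncompressed
gzip -c part1 > part1.gz
gzip -c part2 > part2.gz

# Preallocate the full 10-byte output (the "biiiig empty file").
dd if=/dev/zero of=final bs=1 count=10 2>/dev/null

# Each gunzip writes at its own offset; conv=notrunc keeps dd from
# truncating the file, so the concurrent writes don't clobber each other.
gunzip -c part1.gz | dd of=final bs=1 seek=0 conv=notrunc 2>/dev/null &
gunzip -c part2.gz | dd of=final bs=1 seek=4 conv=notrunc 2>/dev/null &
wait
cat final    # AAAABBBBBB
```

With bs=1, seek=4 means "skip 4 bytes into the output", exactly where part1's decompressed data ends.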

akira