I have a very large .zip file: 174 GB compressed and 800 GB decompressed. I am trying to work with it in R, but it is far too large for me to handle in one piece.

I found that I can split the .zip file from the terminal on my Mac (see "How do I split a .zip file into multiple segments?").

So I moved my .zip file into my Documents folder and ran this command in the terminal:

zip species.zip --out new.zip -s 3000m

This was meant to split it into pieces of about 3 GB each, which would be easier to work with. However, the files it creates are not in .zip format; they show up as plain documents:

[Image of the file]

Then, when I use this R code to extract it:

zi.fl <- zip_to_disk.frame2(zipfile = "new.zip", outdir = data_dir) %>%
  rbindlist.disk.frame() %>% filter(year > 2019)

I get the following error:

Error: archive.cpp:24 archive_read_open_filename(): Unrecognized archive format

How can I split it into usable .zip files?

1 Answer

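Splitting an archive, whether with the zip -s option from your question or with the macOS split command, simply breaks the single archive into segments; none of the segments is a Zip file in its own right. For example, the index of the archive's contents (the Zip central directory) is stored only once, at the end of the archive, so it lands in the last segment, and it describes the contents of the entire archive rather than of that segment. That is why a tool that is handed a single segment reports an unrecognized archive format.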

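To reconstitute the original file from chunks made by split, use cat to concatenate the binary segments back into one Zip file. For segments written by zip -s, the zip manual describes zip -s 0 new.zip --out joined.zip for the same purpose. However, from your question, it appears the rejoined archive would still be too large to work with.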
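As a small illustration of the split-and-cat round trip described above, here is a sketch that runs on a throwaway 1 MiB file in a temporary directory. The names original.bin, chunk_ and rebuilt.bin are placeholders chosen for this example, not anything from the question.

```shell
set -eu

# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# 1 MiB of sample data standing in for the real archive.
head -c 1048576 /dev/urandom > original.bin

# Break it into 256 KiB chunks named chunk_aa, chunk_ab, ...
split -b 262144 original.bin chunk_

# The shell expands chunk_* in sorted order, so cat rebuilds the file in sequence.
cat chunk_* > rebuilt.bin

# cmp exits with status 0 only if the two files are byte-identical.
cmp original.bin rebuilt.bin && echo "round trip OK"
```

The individual chunk_ files are raw slices of the data, which is why none of them can be opened on its own.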

If you need smaller pieces that are each a true Zip archive, you would have to split the original 800 GB of uncompressed data into separate pieces and then Zip each piece. Each piece would then be a true Zip file that can be extracted on its own, and the extracted pieces can be concatenated to yield the original data file.