I'm downloading a zip file with axios. For further processing, I need to get the "raw" data that has been downloaded. As far as I can see, JavaScript has two types for this: Blob and ArrayBuffer. Both can be specified as responseType in the request options.
In the next step, the zip file needs to be uncompressed. I've tried two libraries for this: js-zip and adm-zip. Both want the data to be an ArrayBuffer. So far so good: I can convert the blob to a buffer, and after this conversion adm-zip always happily extracts the zip file. However, js-zip complains about a corrupted file unless the zip has been downloaded with 'arraybuffer' as the axios responseType. js-zip does not work on a buffer that has been created from a blob.
This was very confusing to me. I thought both ArrayBuffer and Blob were essentially just views on the underlying memory. There might be a difference in performance between downloading something as a blob vs. a buffer, but the resulting data should be the same, right?
Well, I decided to experiment and found this:
If you specify responseType: 'blob', axios converts the response.data to a string. Let's say you hash this string and get hashcode A. Then you convert it to a buffer. For this conversion, you need to specify an encoding. Depending on the encoding, you will get a variety of new hashes, let's call them B1, B2, B3, ... When specifying 'utf8' as the encoding, I get back to the original hash A.
So I guess when downloading data as a 'blob', axios implicitly converts it to a string encoded with utf8. This seems very reasonable.
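The match for 'utf8' is also what Node's crypto module would produce regardless of how the string was created: when hash.update() is given a string without an inputEncoding, it treats the string as 'utf8'. The following is a minimal sketch of that effect. The sample string and the md5 helper are made up for illustration and are not taken from the real download.

```typescript
import * as crypto from 'crypto';

// Hypothetical helper for this sketch: MD5 of a string or a Buffer, as hex.
function md5(data: string | Buffer): string {
    return crypto.createHash('md5').update(data).digest('hex');
}

// Made-up stand-in for response.data; the real string would come from axios.
const text = 'PK\u0003\u0004 caf\u00e9 \ufffd';

// Hash A: the string itself (update() encodes strings as 'utf8' by default).
const hashA = md5(text);

// Hashes B1, B2: convert to a Buffer with an explicit encoding first.
const hashUtf8 = md5(Buffer.from(text, 'utf8'));
const hashLatin1 = md5(Buffer.from(text, 'latin1')); // 'binary' is an alias for 'latin1'

console.log('utf8 matches A:  ', hashA === hashUtf8);
console.log('latin1 matches A:', hashA === hashLatin1);
```

So getting back to hash A with 'utf8' only shows that the string was hashed as UTF-8. By itself it does not show that the string still holds the same bytes that were sent over the wire.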
Now you specify responseType: 'arraybuffer'. axios provides you with a buffer as response.data. Hash the buffer and you get a hashcode C. This hash does not match any of A, B1, B2, ...
So when downloading data as an 'arraybuffer', you get entirely different data?
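A possible explanation, stated as an assumption rather than something confirmed from the axios source: if binary data is decoded into a UTF-8 string at any point, the conversion is lossy, because byte sequences that are not valid UTF-8 are replaced with the replacement character U+FFFD. The sketch below uses made-up bytes, not the real archive, to show that such a round trip does not restore the original buffer.

```typescript
import * as crypto from 'crypto';

// Made-up bytes standing in for a zip download: the "PK\x03\x04" signature
// followed by two bytes (0xff, 0x80) that are not valid UTF-8.
const raw = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0x80]);

// Decode the bytes as UTF-8 text, then encode the text back to bytes.
const asText = raw.toString('utf8');
const roundTripped = Buffer.from(asText, 'utf8');

// Hypothetical helper for this sketch: MD5 of a Buffer, as hex.
const md5 = (b: Buffer): string =>
    crypto.createHash('md5').update(b).digest('hex');

console.log('lengths:', raw.length, roundTripped.length);
console.log('same bytes:', raw.equals(roundTripped));
console.log('same hash: ', md5(raw) === md5(roundTripped));
```

If this is what happens to the 'blob' response, then hash C would be the hash of the bytes as sent, while A and the B hashes would all be computed from text that has already lost information.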
It now makes sense to me that the unzipping library js-zip complains if the data is downloaded as a 'blob': it probably is actually corrupted somehow. But then how is adm-zip able to extract it? I checked the extracted data, and it is correct. This might only be the case for this specific zip archive, but it nevertheless surprises me.
Here is the sample code I used for my experiments:
//typescript import syntax, this is executed in nodejs
import axios from 'axios';
import * as crypto from 'crypto';
axios.get(
    "http://localhost:5000/folder.zip", //hosted with serve
    { responseType: 'blob' }) // replace this with 'arraybuffer' and response.data will be a buffer
    .then((response) => {
        console.log(typeof (response.data));
        // first hash the response itself
        console.log(crypto.createHash('md5').update(response.data).digest('hex'));
        // then convert to a buffer and hash again
        // replace 'binary' with any valid encoding name
        let buffer = Buffer.from(response.data, 'binary');
        console.log(crypto.createHash('md5').update(buffer).digest('hex'));
        //...
    });
What creates the difference here, and how do I get the 'true' downloaded data?
 
     
    