I have an .xlsx file which I want to pretty-print in order to get readable diffs rather than just seeing that a binary file changed.
My approach is to unzip the entire thing. The resulting string does not contain line breaks, so I ran it through `xmllint --format`. But on this seemingly simple path I have run into several issues, which I have already spent hours on:
- `unzip` outputs multiple files from inside the archive. This results in invalid XML. Even with the `unzip -q` option I get multiple DTDs and so on. xmllint breaks on this without formatting the input.
  `unzip -c -a -q myFile.xlsx | xmllint --format -`
- I tried splitting the XML into an array using `read` in order to feed each individual XML file to xmllint. In the result of `read`, most array items seem to be empty, and the third and fourth items contain twenty-something characters of the XML string.
  `IFS='\<\?xml' read -r -a files <<< "$decompressed"`
- I also tried just inserting line breaks with `sed`, but the file is so large that processing takes too long to be feasible for diffing.
  `${decompressed/\>\</\>\n\</g}`
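
To make the `read` problem reproducible, here is a minimal sketch on a tiny made-up input. The value of `decompressed` is a hypothetical stand-in for the real `unzip` output, not my actual data. As far as I can tell, bash treats `IFS` as a set of single-character delimiters rather than as one multi-character separator, which would explain all the empty items:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real unzip output:
# two tiny XML documents concatenated back to back.
decompressed='<?xml?><a/><?xml?><b/>'

# Same call as in the question. IFS is a set of characters, so this
# splits on every single \ < ? x m l, not on the string "<?xml".
IFS='\<\?xml' read -r -a files <<< "$decompressed"

# Print each element with its index so the empty ones are visible.
for i in "${!files[@]}"; do
  printf '%d: [%s]\n' "$i" "${files[$i]}"
done
```

On this input most elements come out empty and the rest are short fragments, which matches what I see with the real file.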
I have run out of ideas, so I decided to consult you guys! Thanks in advance :)
 
     
    