I am attempting to inspect a Power BI .pbix file using Python's zipfile library.
When unzipping the .pbix archive, I get the following structure:
DataMashup
DataModel
DiagramLayout
Metadata
Report
Report/Layout
Report/StaticResources
Report/StaticResources/SharedResources
Report/StaticResources/SharedResources/BaseThemes
Report/StaticResources/SharedResources/BaseThemes/CY18SU07.json
SecurityBindings
Settings
Version
[Content_Types].xml
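For reference, a minimal sketch of how a listing like the one above can be produced and how the raw DataMashup bytes can be pulled out. The helper names `list_pbix_members` and `read_member` are made up for illustration, and `report.pbix` in the usage note is a placeholder path.

```python
import zipfile


def list_pbix_members(pbix):
    """Return the member names of a .pbix file (a path or a file-like object)."""
    with zipfile.ZipFile(pbix) as archive:
        return archive.namelist()


def read_member(pbix, name="DataMashup"):
    """Return the raw bytes of one member of the outer .pbix archive."""
    with zipfile.ZipFile(pbix) as archive:
        return archive.read(name)
```

Calling `list_pbix_members("report.pbix")` gives the names, and `read_member("report.pbix")[:16]` shows the first few bytes of the DataMashup member.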
It appears that the DataMashup file in the .pbix archive is some sort of off-brand archive of a directory.
The DataMashup object does not appear to be compressed, as I can easily read XML data when printing the object in the Python interpreter.
Using 7zip I am able to access everything within:
DataMashup/
    Config/
        Package.xml
    Formulas/
        Section1.m # m and/or dax looking stuff
    [Content_Types].xml
How can I discover the format of the DataMashup archive-within-an-archive?
One clue is in the binary data at the top of the DataMashup object: \x00\x00\x00\x00\x07\x05\x00\x00PK. The PK signature after the first 8 bytes may indicate an embedded PKZIP archive.
Another clue may be this output when attempting to use unzip on the DataMashup file:
$ unzip DataMashup
Archive: DataMashup
warning [DataMashup]: 6215 extra bytes at beginning or within zipfile
I was able to extract the DataMashup archive on Linux using 7za:
WARNINGS:
There are data after the end of archive
--
Path = DataMashup
Type = zip
WARNINGS:
There are data after the end of archive
Offset = 8
Physical Size = 1303
Tail Size = 5148
Everything is Ok
Archives with Warnings: 1
Warnings: 1
Files: 3
Size: 2040
Compressed: 6459
Despite the warnings, the files appear okay. Unfortunately, this does not help me on Windows.
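A platform-independent route might be to slice the embedded zip out in Python. The sketch below rests on an assumption drawn only from the header bytes and the 7za output above: that the first 4 bytes are a version field, the next 4 bytes are a little-endian length of the zip that follows, and everything after that length is trailing data. The function name `open_inner_zip` is hypothetical, and the layout has not been checked against any specification.

```python
import io
import struct
import zipfile


def open_inner_zip(blob):
    """Open the zip assumed to be embedded in a DataMashup byte string.

    Assumed layout (a guess, not a confirmed format):
      bytes 0-3  version, little-endian uint32
      bytes 4-7  length of the embedded zip, little-endian uint32
      bytes 8-   the zip itself, followed by unrelated trailing data
    """
    version, length = struct.unpack_from("<II", blob, 0)
    package = blob[8:8 + length]
    if not package.startswith(b"PK"):
        raise ValueError("no PK signature at offset 8; layout assumption is wrong")
    # BytesIO lets zipfile treat the slice as a standalone archive.
    return zipfile.ZipFile(io.BytesIO(package))
```

With the DataMashup bytes in `blob`, `open_inner_zip(blob).namelist()` should list the inner files if the assumption holds. If the length field turns out not to cover the whole zip, `zipfile` will raise `BadZipFile`, which would itself be a useful hint about the real layout.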