
I have a ZIM file and would like to look at its contents. In particular, I want to count the number of articles and images, and maybe browse the images.

How can I do that? Preferably on Linux, but instructions for other systems are OK too.

I don’t want to count images by opening the ZIM file in Kiwix, browsing thousands of pages, and counting manually.

The file cannot be decompressed as XZ or ZIP:

$ unxz wikivoyage_en_all_2015-09.zim
unxz: wikivoyage_en_all_2015-09.zim: File format not recognized

$ zipinfo wikivoyage_en_all_2015-09.zim
Archive:  wikivoyage_en_all_2015-09.zim
[wikivoyage_en_all_2015-09.zim]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
zipinfo:  cannot find zipfile directory in one of wikivoyage_en_all_2015-09.zim or
          wikivoyage_en_all_2015-09.zim.zip, and cannot find wikivoyage_en_all_2015-09.zim.ZIP, period.
Nicolas Raoul

2 Answers


I tried following @Nicolas Raoul's answer on a Mac, but I had issues building zimdump from source and couldn't find any binaries.

After some digging, I found that zimdump is available as a binary package in Alpine, so the easiest approach for me was to run zimdump in Docker.

Create a text file called Dockerfile with this content:

FROM alpine:edge

# Add the repository that contains libzim and zim-tools
RUN echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories

# Install dependencies
RUN apk update && apk add libzim zim-tools

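Then build the image. Piping the Dockerfile on standard input works, but leaves the image untagged:

docker build - < Dockerfile

To get an image tagged zimdump, which the run command below expects, build from the directory containing the Dockerfile:

docker build -t zimdump .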

Mount the current directory into the container and run zimdump there:

docker run -v "$(pwd)":/app -w /app -it zimdump zimdump -D . file.zim

(A commenter suggests that docker run -v "$(pwd)":/app -w /app -it zimdump zimdump dump --dir=. file.zim works better, presumably with zim-tools versions where zimdump takes subcommands.)

Nicolas Raoul
Att Righ

The easiest way is to use the zimdump command, part of Zimlib.

Sample output:

zimdump -F wikivoyage_en_all_2015-09.zim
count-articles: 84897
uuid: 9213375a-53f4-819c-47ed-41fc87e7028f
article count: 84897
mime list pos: 80
url ptr pos: 193
title idx pos: 679369
cluster count: 40711
cluster ptr pos: 5169080
checksum pos: 468245393
checksum: 05b9bbf3b6d0c955b6ee74a3f929d911
main page: 44192
layout page: -

I am not sure what all of these fields mean, but at least the article count is available.
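If you only need the number, the header output can be filtered. This is a minimal sketch, assuming the output has the "key: value" shape shown above (it may differ between zimdump versions); the function name article_count is made up for this example.

```shell
#!/bin/sh
# Read "key: value" lines (as printed by zimdump -F above) on standard input
# and print the value of the "article count" field.
# article_count is a hypothetical helper name, not part of zimdump.
article_count() {
    awk -F': *' '$1 == "article count" { print $2; exit }'
}

# Usage (assumes zimdump is installed and on the PATH):
#   zimdump -F wikivoyage_en_all_2015-09.zim | article_count
```

Matching on the exact field name avoids picking up the similarly named count-articles line by accident.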

The -D option dumps all of the ZIM file's content into a directory:

zimdump -D name_of_dir file.zim
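Once the content is dumped, counting images is ordinary file counting. Below is a rough sketch, assuming the dumped image files keep a recognisable extension (I have not checked that this holds for every ZIM file); the function name count_images and the list of extensions are my own choices, so adjust them to what you actually find in the directory.

```shell
#!/bin/sh
# count_images DIR
# Count regular files under DIR whose names end in a common image extension.
# count_images is a hypothetical helper name. -iname is not strictly POSIX,
# but GNU, BSD and BusyBox find all support it.
# Caveat: a file name containing a newline would be counted more than once.
count_images() {
    find "$1" -type f \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \
        -o -iname '*.gif' -o -iname '*.svg' -o -iname '*.webp' \) -print \
        | wc -l | tr -d ' '
}

# Usage, after running zimdump -D name_of_dir file.zim:
#   count_images name_of_dir
```

The dumped directory can also be opened in any file manager or image viewer to browse the images.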

zx485
Nicolas Raoul