127

Does anyone have any recommendations or procedures for repairing a corrupt PDF? When I open the file I get "There was an error opening this document. the file is damaged and cannot be repaired."

There seem to be a myriad of tools out there, but none that I could describe as reputable. Are there any open-source, Linux-based solutions for this?

Robotnik

6 Answers

150

Ghostscript will repair your corrupted PDF automatically... if it can open it in the first place (that is, if it is not damaged beyond repair). But afterwards you'll still need to double-check the result...

On Linux, try this command:

 gs \
  -o repaired.pdf \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/prepress \
   corrupted.pdf

On Windows, try this one:

 gswin32c.exe ^
  -o repaired.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/prepress ^
   corrupted.pdf
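
If you want the same call from a script on either platform, something like the following might work. It is only a sketch: the function names are made up for illustration, and the Windows executable names (gswin64c, gswin32c) are an assumption about how Ghostscript was installed. A zero exit code does not prove the output is good, so the double-check mentioned above still applies.

```python
import shutil
import subprocess
import sys


def ghostscript_command(src, dst, executable=None):
    """Build the Ghostscript command line shown above as an argument list."""
    if executable is None:
        # Assumed names: "gs" on Unix-like systems; on Windows the console
        # binary is usually gswin64c or gswin32c, depending on the install.
        candidates = ["gswin64c", "gswin32c"] if sys.platform == "win32" else ["gs"]
        executable = next((c for c in candidates if shutil.which(c)), candidates[0])
    return [executable, "-o", dst, "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/prepress", src]


def repair_with_ghostscript(src, dst):
    """Run Ghostscript; return True only if it was found and exited with 0."""
    try:
        result = subprocess.run(ghostscript_command(src, dst))
    except FileNotFoundError:
        # Ghostscript is not installed or not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    ok = repair_with_ghostscript("corrupted.pdf", "repaired.pdf")
    print("Ghostscript succeeded" if ok else "Ghostscript failed or is missing")
```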
Kurt Pfeifle
65

I had a corrupted PDF file, print.pdf, that Ghostscript couldn't open but that the usual graphical Linux PDF viewers (Okular, Evince) opened fine. (In my case, a hex editor showed garbage at the start of the file instead of a PDF header.)

These PDF viewers use Poppler as a back-end PDF renderer. So you can repair the PDF using Poppler's command-line tools. In Ubuntu these are in the poppler-utils package. I used:

pdftocairo -pdf print.pdf print_repaired.pdf

which generated a PDF file with correct headers, which tools like Ghostscript now accepted.
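
If you don't have a hex editor handy, a few lines of Python can do the same header check. This is a rough sketch with hypothetical function names; it only looks for the "%PDF-" marker that a PDF file normally starts with and tells you nothing about the rest of the file.

```python
def find_pdf_header(data: bytes) -> int:
    """Return the byte offset of the first '%PDF-' marker, or -1 if absent."""
    return data.find(b"%PDF-")


def describe_header(path: str) -> str:
    """Report where (or whether) the PDF header appears near the file start."""
    with open(path, "rb") as f:
        head = f.read(1024)  # the header should sit at the very beginning
    offset = find_pdf_header(head)
    if offset == 0:
        return "header at start of file"
    if offset > 0:
        return f"header found after {offset} bytes of leading data"
    return "no %PDF- header in the first 1024 bytes"


if __name__ == "__main__":
    print(describe_header("print.pdf"))
```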

Mechanical snail
46

mutool (project page, manpage) will repair broken PDFs without printing them.

  • Installation e.g. on Ubuntu: sudo apt-get install mupdf-tools
  • Run it like this: mutool clean input.pdf output.pdf
mutool clean [options] input.pdf [output.pdf] [pages]

The clean command pretty prints and rewrites the syntax of a PDF file. It can be used to repair broken files, expand compressed streams, filter out a range of pages, etc. If no output file is specified, it will write the cleaned PDF to "out.pdf" in the current directory.
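
Since which tool succeeds seems to depend on the kind of damage, it may be convenient to try the tools mentioned in this thread one after another. The sketch below is one possible way to do that. The function names and the order are my own choices, and it assumes gs, pdftocairo and mutool are on your PATH under those names. A tool that fails may leave a partial output file behind, so check the result whichever one reports success.

```python
import subprocess


def candidate_commands(src, dst):
    """Command lines taken from the answers in this thread, in thread order."""
    return [
        ["gs", "-o", dst, "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/prepress", src],
        ["pdftocairo", "-pdf", src, dst],
        ["mutool", "clean", src, dst],
    ]


def repair(src, dst, run=subprocess.run):
    """Try each tool in turn; return the name of the first that exits with 0."""
    for cmd in candidate_commands(src, dst):
        try:
            if run(cmd).returncode == 0:
                return cmd[0]
        except FileNotFoundError:
            # This tool is not installed; move on to the next one.
            continue
    return None


if __name__ == "__main__":
    tool = repair("input.pdf", "output.pdf")
    print(f"repaired with {tool}" if tool else "no tool could repair the file")
```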

Alternatively, there are a few tools and frameworks that can decompose/decompile PDFs into their components without rendering them. These could be useful for extracting text, scripts, and images. See this answer for a list of such tools: https://reverseengineering.stackexchange.com/q/1526/8210. For example, you can try Origami, currently the top answer there, which has a GTK-based viewer.

jmiserez
13

I had a corrupted PDF file because the PHP script used to download it echoed some errors (in HTML) and NUL characters at the end.

The solution was to open the PDF with Notepad++ and remove all text after the line

%%EOF
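
The same trimming can be scripted if you have many files or no suitable editor. This is a minimal sketch with hypothetical function names. It cuts everything after the last %%EOF marker, which assumes the trailing garbage does not itself contain that marker. Working on bytes and writing to a new file keeps the binary parts of the PDF and the original untouched.

```python
def trim_after_eof(data: bytes) -> bytes:
    """Drop anything that follows the last '%%EOF' marker."""
    marker = b"%%EOF"
    end = data.rfind(marker)  # last marker: updated PDFs can contain several
    if end == -1:
        return data  # no marker found, leave the data unchanged
    return data[: end + len(marker)] + b"\n"


def trim_file(src: str, dst: str) -> None:
    """Read src in binary mode and write the trimmed copy to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(trim_after_eof(data))


if __name__ == "__main__":
    trim_file("download.pdf", "download_trimmed.pdf")
```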
Oriol
1

Chrome, Chromium and Firefox can open PDFs and can also print to PDF, so if one of them renders the file correctly, printing it to a new PDF may work. The same approach can also be used to change the page format, the number of pages, etc.

LibreOffice can also read and write PDF.

GIMP can also read and write PDF, although it's not the most practical application when dealing with multi-page documents.

Generally speaking, if any of your installed applications can open the corrupt PDF file and you have a "Print to PDF" printer installed, you are good to go.

golimar
0

There is a Windows freeware tool, PDF Fixer, which runs under Wine. With it I was able to get a preview of some of the content of a partially downloaded PDF when the other tools mentioned here failed. But I was not able to combine its output files into a valid PDF file (I had expected it to produce one automatically, but that was not the case with my specific file).

Shakesbeer