
I need to remove a stupid email watermark that appears on every page of a public domain book. I looked at the pdftk man page and some examples, but I still cannot figure out how to remove the watermarks. I'd appreciate any hints.

hnns

4 Answers


Just a little add-on to Dingo's answer as it did not work for me:

I first had to uncompress the PDF document so that I could find the watermark and replace it with sed. The first step is to uncompress the PDF document using pdftk:

pdftk original.pdf output uncompressed.pdf uncompress 

Now uncompressed.pdf can be used as in Dingo's answer:

sed -e "s/watermarktextstring/ /g" uncompressed.pdf > unwatermarked.pdf

I then repaired and recompressed the document:

pdftk unwatermarked.pdf output fixed.pdf compress
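The three steps above can be chained in one small shell function. This is only a sketch: remove_watermark is a hypothetical name, it assumes pdftk is on your PATH, and it assumes the watermark text contains no sed metacharacters such as /, & or \. LC_ALL=C makes sed treat the file as raw bytes, which avoids the "illegal byte sequence" errors some sed implementations (e.g. the one on macOS) raise on PDF data.

```shell
#!/bin/sh
# Hypothetical helper: remove_watermark IN OUT TEXT
# Assumptions: pdftk is on PATH and TEXT contains no sed metacharacters.
remove_watermark() {
    rw_in=$1     # original PDF
    rw_out=$2    # final, recompressed PDF
    rw_mark=$3   # literal watermark text to blank out
    rw_tmp=$(mktemp -d) || return 1

    # Step 1: uncompress so the watermark text becomes visible to sed.
    pdftk "$rw_in" output "$rw_tmp/uncompressed.pdf" uncompress &&
    # Step 2: replace every occurrence with a space, treating the file as bytes.
    LC_ALL=C sed -e "s/$rw_mark/ /g" "$rw_tmp/uncompressed.pdf" \
        > "$rw_tmp/unwatermarked.pdf" &&
    # Step 3: let pdftk rewrite (and thereby repair) the file and recompress it.
    pdftk "$rw_tmp/unwatermarked.pdf" output "$rw_out" compress
    rw_status=$?

    # Remove the intermediate files whatever happened.
    rm -rf "$rw_tmp"
    return "$rw_status"
}
```

With the file names used in this answer: remove_watermark original.pdf fixed.pdf 'watermarktextstring'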
Philippe

This is a very simple task.

Use sed:

 sed -e "s/watermarktextstring/ /g" <input.pdf >unwatermarked.pdf

But afterwards, be sure to repair the resulting output PDF:

pdftk unwatermarked.pdf output fixed.pdf && mv fixed.pdf unwatermarked.pdf

All in one command:

 sed -e "s/watermarktextstring/ /g" <input.pdf >unwatermarked.pdf && pdftk unwatermarked.pdf output fixed.pdf && mv fixed.pdf unwatermarked.pdf

Text watermarks are nothing more than text between two tags inside the PDF code, so sed can find them as long as that code is not compressed.
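The repair is needed because replacing the watermark with a single space shortens the file, so the byte offsets recorded in the PDF's cross-reference (xref) table, and the /Length of the edited stream, no longer match. One variation, sketched below, replaces the watermark with the same number of spaces so that nothing shifts; a viewer may then open the file without repair, though running pdftk over it is still a cheap check. blank_watermark is a hypothetical name, and the sketch assumes the PDF is already uncompressed and the watermark text has no sed metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: blank_watermark IN OUT TEXT
# Replaces TEXT with the same number of spaces so no byte offset moves.
# Assumptions: IN is already uncompressed; TEXT has no sed metacharacters
# (/ & \ . * [ ] ^ $).
blank_watermark() {
    bw_in=$1
    bw_out=$2
    bw_text=$3
    # One space per byte of TEXT (the C locale counts bytes, not characters).
    bw_blank=$(printf '%s' "$bw_text" | LC_ALL=C sed 's/./ /g')
    LC_ALL=C sed -e "s/$bw_text/$bw_blank/g" "$bw_in" > "$bw_out"
}
```

For example: blank_watermark uncompressed.pdf unwatermarked.pdf 'watermarktextstring'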

Dingo

Another add-on to Philippe's add-on to Dingo's answer...

The watermark I needed to remove was a stream object (which is a multi-line block of code), not a single line, so a single-line sed command wasn't going to work for me.

I needed to use a text editor to find and remove it.

I first used Philippe's solution to uncompress the PDF.

Then, opening uncompressed.pdf in my favourite text editor, I found a block of text more than 50 lines long that was obviously the code for the watermark.

The watermark was included in the document as a PDF stream object. ** (see below)

The lines defining the stream object that I needed to remove started with a line containing only:

<num> 0 obj

where <num> was a number at the start of the line identifying the specific object.

I needed to delete this line and everything from it down to and including the first instance of

endstream
endobj

that followed that obj line, i.e. the whole stream object definition.

The endobj line was followed, two lines further down, by the next <num> 0 obj line.
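Cutting the object out can also be scripted. The sketch below (delete_object is a hypothetical helper) assumes the PDF is already uncompressed and that each "<num> 0 obj" and "endobj" keyword sits on a line of its own, as described above; it drops the lines from "<num> 0 obj" through the first following "endobj". The xref offsets are wrong afterwards, so repair the result with pdftk as in the earlier answers.

```shell
#!/bin/sh
# Hypothetical helper: delete_object NUM IN OUT
# Removes the lines from "NUM 0 obj" through the next "endobj".
# Assumptions: IN is an uncompressed PDF and "NUM 0 obj" / "endobj"
# each sit on a line of their own (CR LF line endings are tolerated).
delete_object() {
    obj_num=$1
    obj_in=$2
    obj_out=$3
    LC_ALL=C awk -v n="$obj_num" '
        { line = $0; sub(/\r$/, "", line) }   # ignore a trailing CR
        line == n " 0 obj" { skip = 1 }        # start of the target object
        !skip { print }                        # copy every other line unchanged
        skip && line == "endobj" { skip = 0 }  # resume after its endobj
    ' "$obj_in" > "$obj_out"
}
```

For example, with a made-up object number: delete_object 12 uncompressed.pdf unwatermarked.pdf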

It was easy for me to see which stream object was the watermark code because it kindly included the word "Watermark" :-)
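If your watermark object does carry a recognisable word, a short awk pass can report its object number instead of you scrolling through the file. find_objects is a hypothetical helper; it assumes an uncompressed PDF with "<num> 0 obj" and "endobj" on lines of their own, and it prints the number of every object whose body contains the given text as a plain substring.

```shell
#!/bin/sh
# Hypothetical helper: find_objects TEXT IN
# Prints the number of each object whose lines contain TEXT (plain
# substring match, not a regex).
# Assumptions: IN is an uncompressed PDF and "NUM 0 obj" / "endobj"
# each sit on a line of their own.
find_objects() {
    fo_text=$1
    fo_in=$2
    LC_ALL=C awk -v pat="$fo_text" '
        { line = $0; sub(/\r$/, "", line) }                  # ignore a trailing CR
        line ~ /^[0-9]+ 0 obj$/ { cur = $1; hit = 0; next }  # an object starts
        line == "endobj"        { cur = ""; next }           # the object ends
        cur != "" && !hit && index(line, pat) { print cur; hit = 1 }
    ' "$fo_in"
}
```

For example: find_objects Watermark uncompressed.pdf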

Yours may well not have such helpful text, but if you are patient:

  1. Back up your original uncompressed PDF

  2. Make a temporary copy of the uncompressed PDF

  3. Find and remove a stream object from the temporary copy

  4. Save your changes to the temporary copy.

  5. Open the temporary copy in a PDF viewer

  6. Check whether the stream object you just removed was the watermark

If it wasn't, go back to step 2 and repeat, removing a different object each time, until you've removed the watermark object.
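The trial-and-error loop can be partly automated by writing one candidate file per object, each with a different object removed, and then flipping through the candidates in a PDF viewer. This is a sketch under the same assumptions as before (uncompressed PDF, "<num> 0 obj" and "endobj" on lines of their own); make_candidates and the without-<num>.pdf naming are hypothetical, and the original file is never modified. If a viewer refuses to open a candidate because of its broken xref table, repair that candidate with pdftk first.

```shell
#!/bin/sh
# Hypothetical helper: make_candidates IN DIR
# Writes DIR/without-NUM.pdf for every object NUM in IN, each copy
# missing exactly that one object.
# Assumptions: IN is an uncompressed PDF and "NUM 0 obj" / "endobj"
# each sit on a line of their own.
make_candidates() {
    mc_in=$1
    mc_dir=$2
    mkdir -p "$mc_dir" || return 1
    # Collect every object number that appears as "NUM 0 obj".
    mc_nums=$(LC_ALL=C awk '{ sub(/\r$/, "") } /^[0-9]+ 0 obj$/ { print $1 }' "$mc_in")
    for mc_n in $mc_nums; do
        # Copy the file, skipping "mc_n 0 obj" through its "endobj".
        LC_ALL=C awk -v n="$mc_n" '
            { line = $0; sub(/\r$/, "", line) }
            line == n " 0 obj" { skip = 1 }
            !skip { print }
            skip && line == "endobj" { skip = 0 }
        ' "$mc_in" > "$mc_dir/without-$mc_n.pdf" || return 1
    done
}
```

For example: make_candidates uncompressed.pdf candidates, then open each file in the candidates directory until the watermark is gone.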

** I learned about stream objects, and saw examples of them, by searching the web for "PDF object stream". https://blog.didierstevens.com/2008/05/19/pdf-stream-objects/ has a great summary, and "Chapter 1. PDF Syntax" of "Developing with PDF" by Leonard Rosenthol, which is available to view on O'Reilly's website, goes into more detail.

JohnGH

To remove a watermark from a PDF:

  1. Open the PDF in Notepad++ or TextPad
  2. Search for the watermark text and use the 'find and replace' option to replace it with nothing (blank)
  3. Save the file
  4. Open it in the standard Adobe Reader

It will throw an error like "file damaged, repair needed".

  5. Exit; you will be prompted to save the file. Save it.

Madhubala