How to remove watermark from pdf using pdftk?

Question

I need to remove an annoying email watermark that appears on every page of a public domain book. I looked at the pdftk man page and some examples but still cannot figure out how to remove the watermarks. I would appreciate any hints.

score 77 · Answer 1 · edited Feb 07 '14 at 20:24

Just a little add-on to Dingo's answer as it did not work for me:

I had to first uncompress the PDF document in order to be able to find the watermark and replace it with sed. The first step involves uncompressing the PDF document using pdftk:

pdftk original.pdf output uncompressed.pdf uncompress

Now uncompressed.pdf can be used as in Dingo's answer:

sed -e "s/watermarktextstring/ /" uncompressed.pdf > unwatermarked.pdf

I then repaired and recompressed the document:

pdftk unwatermarked.pdf output fixed.pdf compress
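The three steps above can be chained into one small shell function. This is only a sketch: the function name `remove_text_watermark` and the `PDFTK` override variable are hypothetical names made up for this example, and the `grep` pre-check is an addition that is not part of the original answer. It assumes the watermark is stored as plain text once the file is uncompressed.

```shell
#!/bin/sh
# Sketch only: remove_text_watermark and the PDFTK variable are hypothetical
# names introduced for this example; PDFTK defaults to the real pdftk binary.
PDFTK="${PDFTK:-pdftk}"

remove_text_watermark() {
    in=$1 out=$2 text=$3
    work=$(mktemp -d) || return 1

    # Step 1: uncompress so the page content becomes searchable text.
    # Step 2: only continue if the watermark string is actually present.
    # Step 3: blank out the string; LC_ALL=C makes sed treat the file as bytes.
    #         $text is used as a sed regex, so escape any "/" it contains.
    # Step 4: let pdftk repair and recompress the edited file.
    "$PDFTK" "$in" output "$work/uncompressed.pdf" uncompress &&
    LC_ALL=C grep -a -q -F -- "$text" "$work/uncompressed.pdf" &&
    LC_ALL=C sed -e "s/$text/ /g" "$work/uncompressed.pdf" > "$work/unwatermarked.pdf" &&
    "$PDFTK" "$work/unwatermarked.pdf" output "$out" compress
    status=$?

    rm -rf "$work"
    return "$status"
}
```

It would be called as `remove_text_watermark original.pdf fixed.pdf 'watermarktextstring'`. A non-zero exit status means one of the steps failed, for example because the string was not found in the uncompressed file.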

score 47 · Accepted Answer · answered Jul 12 '12 at 13:56

A very simple task to perform:

use sed:

 sed -e "s/watermarktextstring/ /g" <input.pdf >unwatermarked.pdf

Afterwards, be sure to repair the resulting output PDF:

pdftk unwatermarked.pdf output fixed.pdf && mv fixed.pdf unwatermarked.pdf

All in one command:

 sed -e "s/watermarktextstring/ /g" <input.pdf >unwatermarked.pdf && pdftk unwatermarked.pdf output fixed.pdf && mv fixed.pdf unwatermarked.pdf

Text watermarks are nothing more than text between two tags inside the PDF's compressed code.
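To illustrate what that looks like, here is a made-up fragment of an uncompressed page stream. In PDF content streams, text is typically drawn by a `(string) Tj` operator between `BT` and `ET`; the sample text, font name and coordinates below are invented for the example. Note that a string can also be hex-encoded or split into pieces, in which case a plain-text search like this will not match.

```shell
# Invented sample of an uncompressed text block (BT ... ET) with a watermark.
sample='BT
/F1 12 Tf
72 20 Td
(Licensed to someone@example.com) Tj
ET'

# Replace the watermark text with a single space, leaving the operators intact.
# LC_ALL=C makes sed work on raw bytes, which matters for real PDF files.
cleaned=$(printf '%s\n' "$sample" |
    LC_ALL=C sed -e 's/Licensed to someone@example\.com/ /g')

printf '%s\n' "$cleaned"
```

Only the text inside the parentheses changes; the surrounding operators stay in place, which is why the file still needs the pdftk repair pass afterwards (the stream length no longer matches).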

score 5 · Answer 3 · answered Dec 29 '22 at 15:19

Another add-on to Philippe's add-on to Dingo's answer...

The watermark I needed to remove was a stream object (which is a multi-line block of code), not a single line, so a single line sed command wasn't going to work for me.

I needed to use a text editor to find and remove it.

I first used Philippe's solution to uncompress the PDF.

Then opening the uncompressed.pdf in my favourite text editor, I found a block of text more than 50 lines long which I could see was obviously the code for the watermark.

The watermark was included in the document as a PDF stream object. ** (see below)

The lines defining the stream object that I needed to remove started with a line containing only:

<num> 0 obj

where <num> was a number at the start of the line identifying the specific object.

I needed to delete this line and everything from it down to and including the first instance of

endstream
endobj

that followed that obj line. i.e. the whole stream object definition.

The endobj line was followed by the next <num> 0 obj 2 more lines down.

It was easy for me to see which stream object was the watermark code because it kindly included the word "Watermark" :-)
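When the object does contain a recognisable word like this, the manual deletion could in principle be scripted. The following is a rough sketch, not what I actually did: `drop_objects_matching` is a hypothetical helper name, and it assumes each object starts with a `<num> 0 obj` line and ends with an `endobj` line, as in my uncompressed file. awk implementations differ in how they handle binary data, so check the result carefully and keep a backup.

```shell
# Hypothetical helper: print the file without any stream object that
# contains the marker text. $1 = marker text, $2 = uncompressed PDF.
drop_objects_matching() {
    LC_ALL=C awk -v marker="$1" '
        # Start of an object: begin buffering its lines.
        /^[0-9]+ 0 obj/ { inobj = 1; buf = ""; drop = 0; isstream = 0 }
        inobj {
            buf = buf $0 "\n"
            if ($0 ~ /^stream/)   isstream = 1   # object carries a stream
            if (index($0, marker)) drop = 1      # marker seen in this object
            if ($0 ~ /^endobj/) {
                # Emit the object unless it is a stream mentioning the marker.
                if (!(drop && isstream)) printf "%s", buf
                inobj = 0
            }
            next
        }
        # Everything outside an object (header, xref, trailer) passes through.
        { print }
    ' "$2"
}
```

Usage would be along the lines of `drop_objects_matching Watermark uncompressed.pdf > unwatermarked.pdf`, followed by the pdftk repair and compress step from Philippe's answer.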

Yours may well not have such helpful text, but if you are patient:

1. Back up your original uncompressed PDF.
2. Make a temporary copy of the uncompressed PDF.
3. Find and remove a stream object from the temporary copy.
4. Save your changes to the temporary copy.
5. Open the temporary copy in a PDF viewer.
6. Check if the stream object you just removed was the watermark.

If it wasn't, go back to step 2, rinse and repeat removing a different object each time until you've removed the watermark object.
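To make the trial and error less tedious, it can help to list the candidate objects first. This is a small sketch with a hypothetical helper name, `list_stream_objects`; it assumes the same layout as above (`<num> 0 obj` at the start of a line, `stream` and `endobj` on lines of their own), which may not hold for every file.

```shell
# Hypothetical helper: list every stream object in an uncompressed PDF
# with its object number, line range and length in lines. $1 = file.
list_stream_objects() {
    LC_ALL=C awk '
        /^[0-9]+ 0 obj/ { num = $1; start = NR; isstream = 0 }
        /^stream/       { isstream = 1 }
        /^endobj/       {
            if (isstream)
                printf "object %s: lines %d-%d (%d lines)\n",
                       num, start, NR, NR - start + 1
            isstream = 0
        }
    ' "$1"
}
```

Running `list_stream_objects uncompressed.pdf` gives line ranges you can jump to in a text editor. A watermark that repeats on every page is often a single object of moderate size, like the 50-line block I found, but that is only a rule of thumb.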

** I learned about stream objects, and saw examples of them, by searching the web for "PDF object stream". https://blog.didierstevens.com/2008/05/19/pdf-stream-objects/ has a great summary, and "Chapter 1. PDF Syntax" of "Developing with PDF" by Leonard Rosenthol, which is available to view on O'Reilly's website, goes into more detail.

score -3 · Answer 4 · edited Nov 13 '20 at 21:49


To remove a watermark from a PDF:

1. Open the PDF in Notepad++ or TextPad.
2. Search for the watermark text and use the 'find and replace' option to replace it with nothing (blank).
3. Save the file.
4. Open it in standard Adobe Reader. It will throw an error like "file damaged, repair needed".
5. Exit; you will be prompted to save the file.
6. Save it.

answered Jan 24 '16 at 11:54 by user549273 · edited by Madhubala
