17

I have a PDF file with some text on each page that I would like to remove.

The text is matched by a regex, and I think it sits in a single block of the PDF.

I have used pdfedit to select and delete the text in the GUI, but I am looking for a way to do this from the terminal.

DrYap
  • 271

8 Answers

13

You can try pdftk, but it works only a fraction of the time, due to (I believe) a problem with fonts.

It works like this: first you need to uncompress the pdf file,

  pdftk myfile.pdf output unc.pdf uncompress

then you modify it with

  sed 's/oldstring/newstring/g' < unc.pdf > mod_unc.pdf

lastly you recompress it with

  pdftk mod_unc.pdf output myfile_modified.pdf compress

I have had only moderate success with this approach: sometimes it works and sometimes it doesn't, with no obvious pattern.
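
One way to see in advance whether the sed step has any chance of working is to check whether the string occurs literally in the uncompressed file. PDFs often store text hex-encoded, split into separately positioned fragments, or in a font-specific encoding, and then there is nothing for sed to match. A minimal sketch, assuming a grep that supports -a and -o (GNU and BSD grep do); count_literal is a hypothetical helper name, not part of pdftk:

```shell
# count_literal FILE STRING
# Print how many times STRING occurs literally in FILE.
# -a treats the binary PDF as text, -F disables regex interpretation,
# -o prints each match on its own line so that wc -l counts occurrences.
count_literal() {
    LC_ALL=C grep -a -o -F -- "$2" "$1" | wc -l | tr -d ' '
}

# Usage (unc.pdf comes from "pdftk myfile.pdf output unc.pdf uncompress"):
#   count_literal unc.pdf 'oldstring'
```

If this prints 0, the sed command will leave the file unchanged.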

MariusMatutiae
  • 48,517
1

On Windows (possibly in a virtual machine) you could install PDF-XChange Editor: https://www.tracker-software.com/product/downloads/enduser/pdf-xchange-editor

The free version can remove text (but not add it) without adding the software's watermark; the software itself tells you so.

I had to remove several pieces of text, so sed was too time-consuming and tedious, and sed did not work with umlauts.

Source: https://de.wikipedia.org/wiki/Benutzer:JoKalliauer/PDF

1

Copying my answer from a similar but closed question on the main SE site:

changepagestring will do this in a single step, as easy as:

changepagestring -o -v infile.pdf search-regex replace-str outfile.pdf

Finding the right regex can be tricky and even then it may not work with all PDFs, but it's the best option I've found so far.

Brian Z
  • 1,168
1

Inkscape 1.2 added support for importing and exporting multi-page PDFs. Coupled with its good handling of PDF objects (if that is the right term), it did the job.

yoshco
  • 504
0

To remove existing text from a PDF on the command line, you must replace the characters with nulls or spaces, so that the remaining text keeps its XY placement within its self-contained lines (there is generally no line wrapping in a PDF).

If the number of bytes changes, the file will most likely be corrupted, because it depends on an index of byte offsets. In addition, much of the text is compressed, so it is not easy to find as plain text.
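
The same-length rule can be sketched as a small shell function. This is an illustration under assumptions, not a general tool: blank_literal is a hypothetical name, the file must already be uncompressed, and the string must be plain ASCII without slashes or regex metacharacters, because it is passed to sed as a pattern.

```shell
# blank_literal IN OUT STRING
# Overwrite every literal STRING in IN with the same number of spaces and
# write the result to OUT. The byte count of the file does not change,
# so the byte offsets stored in the PDF stay valid.
blank_literal() {
    # A run of spaces exactly as long as STRING (ASCII assumed).
    pad=$(printf '%*s' "${#3}" '')
    LC_ALL=C sed "s/$3/$pad/g" "$1" > "$2"
}

# Usage:
#   blank_literal unc.pdf blanked.pdf 'CONFIDENTIAL'
```

Comparing the sizes of the input and output files afterwards (for example with wc -c) is a cheap sanity check that no offsets have moved.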

For all of the above reasons, the file has to be decompressed and then edited with an application that can "fix" the byte counts after the edit.

Thus qpdf, with its QDF mode and the fix-qdf tool, is the frequently mentioned "go to" answer.
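
As a rough sketch of that workflow, assuming qpdf and its companion program fix-qdf are installed: the function name qdf_replace is a placeholder, and the search and replacement strings are assumed to be plain ASCII without slashes or regex metacharacters. As with any byte-level edit, it only helps when the text is stored literally in the page content.

```shell
# qdf_replace IN OUT OLD NEW
# Round-trip a PDF through qpdf's QDF form so it can be edited with sed.
qdf_replace() {
    tmp=$(mktemp -d) || return 1
    # 1. Rewrite as QDF: uncompressed streams, no object streams.
    qpdf --qdf --object-streams=disable "$1" "$tmp/a.pdf" &&
    # 2. Edit the bytes; the length may change here.
    LC_ALL=C sed "s/$3/$4/g" "$tmp/a.pdf" > "$tmp/b.pdf" &&
    # 3. fix-qdf repairs stream lengths and the offset table.
    fix-qdf "$tmp/b.pdf" > "$tmp/c.pdf" &&
    # 4. Write a normal, compressed PDF again.
    qpdf "$tmp/c.pdf" "$2"
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# Usage:
#   qdf_replace myfile.pdf myfile_modified.pdf 'oldstring' 'newstring'
```

Because fix-qdf recomputes the offsets, the replacement does not have to be the same length as the original here, although padding with spaces still keeps the surrounding text where it was.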

Adding text raises other problems. For example, the embedded fonts are often subsets and may not contain every character the new text needs, so the new text has to bring its own fonts. This is most easily done by writing the new text on a blank page at the desired XY location and then OVERSTAMPING the original page with that page, fonts included.

This too is possible with qpdf, AFTER the existing text has been replaced with blank space characters.
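
The overstamping step could look roughly like this with qpdf's --overlay option. It is a sketch under assumptions: stamp.pdf stands for a hypothetical one-page PDF, produced with some other tool, that contains only the new text with its fonts embedded, and overstamp is a made-up helper name.

```shell
# overstamp BASE STAMP OUT
# Draw the pages of STAMP on top of the pages of BASE and write OUT.
# --repeat=1 reuses stamp page 1 once the stamp file runs out of pages,
# so a one-page STAMP ends up on every page of BASE.
overstamp() {
    qpdf "$1" --overlay "$2" --repeat=1 -- "$3"
}

# Usage:
#   overstamp blanked.pdf stamp.pdf final.pdf
```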

If the qpdf method does not work well, you would need to move up to a fully scriptable editor such as PyMuPDF, or to a Java-based library such as Apache PDFBox.

K J
  • 1,248
-1

LibreOffice Writer can import PDFs. You can use it to edit the imported PDF and then export it as PDF. I am not 100 % sure about the accuracy, but it has worked well for me the one time I tested it.

nijoakim
  • 140
-2

pdftops in.pdf - | sed 's/WATERMARK//' | ps2pdf - out.pdf

Stofke
  • 182
-6

You can use any PDF editor. Nitro PDF is a good tool for editing PDFs, and there are also many free tools. You can add or remove text with it.

http://www.nitropdf.com/free-pdf-software

PDFEdit is a good option for Linux. See this link for how to install it: cyberciti.biz/tips/open-source-linux-pdf-writer.html