How to remove OCR from a PDF?

Question

I have been searching Google for some time but cannot find an answer to my question.

I have a document that I recently scanned with Adobe Acrobat, and it contains unwanted OCR layers. The OCR was not done properly, and I want to redact some information, but the OCR layer causes information I want to keep to be erased as well. I converted the files to TIFFs, but noticed a (very) significant quality loss. I have heard that printing to another PDF either keeps the text or reduces the image quality.

score 12 · Answer 1 · edited Sep 22 '17 at 01:06

In Acrobat Pro DC, the appropriate command is "Remove Hidden Information," which is available through both the "Protect" and "Redact" tools.

Running the command only searches for hidden information; it does not change the document. You must then tell Acrobat which information to remove. In this case, select "Hidden Text" in the Results pane, then click the Remove button and save the changed document.
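For anyone curious what "Hidden Text" means at the file level: OCR layers are typically written with PDF text rendering mode 3 (the `3 Tr` operator), which draws text invisibly on top of the scanned image. The sketch below is only an illustration of that idea, not how Acrobat implements the command. It assumes the page content stream has already been decompressed into bytes, that each text object sets its own rendering mode, and that no string literal happens to contain the tokens BT or ET. The function name strip_invisible_text is made up for this example.

```python
import re

# One text object: everything from a BT operator up to the next ET operator.
TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)
# The operator that switches to text rendering mode 3 (invisible text).
INVISIBLE_MODE = re.compile(rb"(?<![\w.])3\s+Tr\b")


def strip_invisible_text(content: bytes) -> bytes:
    """Drop text objects that set rendering mode 3; keep everything else.

    Assumption: `content` is an already-decoded page content stream.
    Real PDFs usually compress their streams, so a PDF library would be
    needed to read and write them.
    """
    def drop_if_invisible(match):
        block = match.group(0)
        return b"" if INVISIBLE_MODE.search(block) else block

    return TEXT_OBJECT.sub(drop_if_invisible, content)


# Hypothetical content stream: a scanned image, one hidden OCR text
# object, and one ordinary visible text object.
sample = (
    b"q 100 0 0 100 0 0 cm /Im0 Do Q\n"
    b"BT 3 Tr /F1 12 Tf 10 10 Td (hidden) Tj ET\n"
    b"BT 0 Tr /F1 12 Tf 10 30 Td (visible) Tj ET\n"
)
cleaned = strip_invisible_text(sample)
```

In the sample, the image and the visible text object survive and only the hidden text object is dropped. For real documents, the Acrobat command above is the safer route.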

score 4 · Answer 2 · answered May 12 '20 at 12:12

Try the "Microsoft Print to PDF" driver. It ships with all recent Windows versions. Make sure to check "Print As Image" under advanced settings; that is what removes the OCR.

The quality loss from printing to PDF is negligible. However, printing keeps the OCR by default unless you print as image.

score 3 · Answer 3 · edited Apr 01 '22 at 16:52

If, as you say, the documents are scanned and not printed to PDF from Word, for example, you can easily remove the OCR in Adobe Acrobat:

Select Document > Examine Document; you can then remove the hidden text (OCR).

edited Apr 01 '22 at 16:52 by Pertinax · answered Dec 10 '15 at 10:50 by Fran

score 3 · Answer 4 · answered Oct 20 '16 at 15:55

In Acrobat Pro, use "Remove Hidden Information" (under "Protection"). Select all, execute, and the OCR is gone.

answered Oct 20 '16 at 15:55 by jazzzz

score 3 · Answer 5 · answered Jan 21 '21 at 11:22

An easy way to remove the OCR layer from a PDF: open the PDF in Firefox and "print" it to another PDF.

Note that a "nice" PDF (e.g. one created by MS Word) will become much larger (in my case, it grew from 0.5 to 2 MB), and quality is reduced somewhat. Make sure you set the correct paper size when "printing".

If you want to redo the OCR instead of removing it completely, and you don't mind the command line, use ocrmypdf:

ocrmypdf --redo-ocr --output-type=pdf input.pdf output.pdf

On Windows 10, the easiest way to set up and use ocrmypdf is via WSL.
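If you need to run that command over several files, a small Python wrapper can help. This is a minimal sketch, assuming ocrmypdf is installed and on PATH; the function names build_redo_ocr_command and redo_ocr are made up for this example, and the flags are exactly the ones from the command above.

```python
import shutil
import subprocess


def build_redo_ocr_command(input_pdf, output_pdf):
    """Return the argument list for the ocrmypdf call shown above."""
    return ["ocrmypdf", "--redo-ocr", "--output-type=pdf", input_pdf, output_pdf]


def redo_ocr(input_pdf, output_pdf):
    """Run ocrmypdf and raise if it is missing or exits with an error."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf was not found on PATH")
    subprocess.run(build_redo_ocr_command(input_pdf, output_pdf), check=True)


# Example of the command that would be executed for one file.
command = build_redo_ocr_command("input.pdf", "output.pdf")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.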

Sanoo · Answer 6 · 2014-10-13T07:53:25.940

After a lot of experimenting, I found that printing to Adobe PDF from Adobe Acrobat produces a document without the OCR and without a visible loss of quality (some resolution is lost, but it is not noticeable at first glance).

However, many sites claim that this does not work. I also tried other PDF printers, such as Foxit Reader and OneNote, but the quality was reduced. Exporting to JPEG gave the same reduced quality.

Please keep in mind that your mileage may vary.

Note: I am leaving this thread marked as unanswered in hope of finding a better answer than mine.

score 1 · Answer 7 · edited Jan 30 '18 at 16:51

In Acrobat X, under Protection, there is a Sanitize Document button that removes EVERYTHING except what can be seen (including the OCR text layer), converting the document to a flattened bitmap.
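To put the quality and file-size trade-off of a flattened bitmap in numbers, here is a small arithmetic sketch. The 300 DPI resolution, the US Letter page size, and the function name raster_size are assumptions for illustration; nothing here reflects the settings Acrobat actually uses when sanitizing.

```python
def raster_size(width_in, height_in, dpi, bytes_per_pixel=3):
    """Return (width_px, height_px, uncompressed_bytes) for one page.

    bytes_per_pixel=3 corresponds to 24-bit RGB without compression.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


# US Letter (8.5 x 11 inches) rendered at an assumed 300 DPI.
letter_300 = raster_size(8.5, 11, 300)
# Same page at an assumed 150 DPI: a quarter of the pixels.
letter_150 = raster_size(8.5, 11, 150)
```

Halving the resolution cuts the pixel count to a quarter, which is why any step that rasterizes the page at a lower DPI loses visible detail, and why a high DPI makes the file grow.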

edited Jan 30 '18 at 16:51 by darthbith · answered Dec 14 '17 at 08:49 by Dave

score 1 · Answer 8 · answered Mar 25 '20 at 17:51

I solved it by exporting to JPEG, then using "Combine Files" in Acrobat on the JPEGs. This was for a document that was originally a Word document and had been converted to PDF. The OCR is gone.

answered Mar 25 '20 at 17:51 by rando cal

score 1 · Answer 9 · edited Mar 31 '21 at 12:20

Use the PitStop Pro plug-in for Acrobat. In the "Actions List", create a new action; in the upper right, look for "Select text fragment" and "Remove selected object". Run it with the scope set to the whole document.

edited Mar 31 '21 at 12:20 by Charles Kenyon · answered Mar 29 '21 at 01:23 by hrdom

levinology · Answer 10 · 2019-01-31T08:19:18.980

I built a free tool to do this: PDF Redactor. If you upload the image and just click redact, it will flatten your PDF and remove the OCR. If you want, you can also draw redaction marks on the document.

edited Jan 31 '19 at 08:19 · answered Jan 31 '19 at 07:31 by levinology

score 0 · Answer 11 · answered Sep 29 '21 at 13:13

For Acrobat X and above: Tools > Protection > Remove Hidden Information.
For Acrobat 9 and below: Document > Examine Document.

Reference: https://answers.acrobatusers.com/undo-recognize-text-q28083.aspx

answered Sep 29 '21 at 13:13 by Géry Ogam
