
I have been searching Google for some time but cannot find an answer to my question.

I have unwanted layers of OCR in a document that I recently scanned with Adobe Acrobat. It was not OCRed properly, and when I try to redact some information, the OCR layer causes information I want to keep to be erased as well. I converted the files to TIFs, but noticed a (very) significant quality loss. I have heard that printing to another PDF either keeps the text or reduces the image quality.

Dave M
Sanoo

11 Answers


In Acrobat Pro DC, the appropriate command is "Remove Hidden Information," which is available through both the "Protect" and "Redact" tools.

Running the command only searches for hidden information; it does not change the document. You must then tell Acrobat which items to remove. In this case, select "Hidden Text" in the Results pane, click the Remove button, and save the changed document.
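If you want to confirm from the command line that the hidden text is really gone after saving, one option is to check whether any text can still be extracted. The following is a minimal sketch that assumes poppler-utils' `pdftotext` is installed (it is not part of Acrobat), and `has_text_layer` is a hypothetical helper name. It cannot tell OCR text from ordinary text, so it is only meaningful for scanned documents.

```shell
#!/bin/sh
# has_text_layer FILE  (hypothetical helper, not an Acrobat feature)
# Exit status: 0 if FILE still contains extractable text (e.g. a leftover
# OCR layer), 1 if only whitespace comes out, 2 if pdftotext itself fails.
# Assumes poppler-utils' pdftotext is on PATH.
has_text_layer() {
    # "-" makes pdftotext write the extracted text to stdout.
    out=$(pdftotext "$1" -) || return 2
    # Strip spaces, newlines and the form feeds pdftotext emits between
    # pages; anything left over is real text.
    [ -n "$(printf '%s' "$out" | tr -d '[:space:]')" ]
}
```

Usage: `has_text_layer redacted.pdf && echo "text layer still present"`.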

Warren Young

Try the "Microsoft Print to PDF" driver. It ships with all recent Windows versions. Make sure to check "Print As Image" under the advanced print settings to remove the OCR.

The quality loss from printing to PDF is negligible. However, the OCR text is kept by default unless you print as image.
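On systems with the poppler-utils and img2pdf command-line tools (an assumption; neither is mentioned in this thread), the same "print as image" effect can be sketched by rendering each page to a bitmap and rebuilding a PDF from the bitmaps. Because only pixels survive, no OCR text can. The resolution is an explicit parameter, so you can trade file size against quality; `rasterize_pdf` is a hypothetical helper name.

```shell
#!/bin/sh
# rasterize_pdf IN.pdf OUT.pdf [DPI]  (hypothetical helper)
# Renders every page to PNG with pdftoppm, then wraps the images into a new
# PDF with img2pdf. The result contains images only, so any OCR layer is gone.
rasterize_pdf() {
    src=$1 dst=$2 dpi=${3:-300}
    tmp=$(mktemp -d) || return 1
    # pdftoppm writes $tmp/page-1.png, page-2.png, ...; page numbers are
    # zero-padded for longer documents, so the glob expands in page order.
    if pdftoppm -r "$dpi" -png "$src" "$tmp/page" &&
        img2pdf "$tmp"/page-*.png -o "$dst"; then
        rc=0
    else
        rc=1
    fi
    rm -rf "$tmp"
    return "$rc"
}
```

Usage: `rasterize_pdf scan.pdf scan-noocr.pdf 300`. img2pdf embeds PNGs losslessly, so the quality is set mainly by the DPI you choose, and the file size grows with it.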


toster-cx

If, as you say, the documents are scanned (rather than, for example, printed to PDF from Word), you can easily remove the OCR with Adobe Acrobat:

Select Document > Examine Document, and you can then remove the hidden text (OCR).

Pertinax
Fran

In Acrobat Pro, use 'Remove Hidden Information' (under 'Protection'). Select all items and run it; the OCR is gone.

jazzzz

An easy way to remove the OCR layer from a PDF: open the PDF in Firefox and "print" it to another PDF.

Note that a "nice" PDF (e.g. one created by MS Word) will become much larger (in my case, from 0.5 MB to 2 MB), and the quality is reduced somewhat. Make sure you set the correct paper size when "printing".

If you want to redo the OCR instead of removing it completely, and you don't mind the command line, use ocrmypdf:

ocrmypdf --redo-ocr --output-type=pdf input.pdf output.pdf

On Windows 10, the easiest way to set up and use ocrmypdf is via WSL.
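If you have a whole folder of scans to re-OCR, the ocrmypdf command above can be wrapped in a loop. This is a minimal sketch for a POSIX shell (for example inside WSL); `redo_ocr_all` and the folder layout are hypothetical, and it only reuses the ocrmypdf options already shown in this answer.

```shell
#!/bin/sh
# redo_ocr_all SRC_DIR DST_DIR  (hypothetical helper)
# Runs `ocrmypdf --redo-ocr --output-type=pdf` on every *.pdf in SRC_DIR and
# writes each result to DST_DIR under the same file name.
redo_ocr_all() {
    src_dir=$1 dst_dir=$2
    mkdir -p "$dst_dir" || return 1
    status=0
    for f in "$src_dir"/*.pdf; do
        [ -e "$f" ] || continue    # no PDFs: the glob stayed unexpanded
        if ! ocrmypdf --redo-ocr --output-type=pdf "$f" "$dst_dir/${f##*/}"; then
            echo "ocrmypdf failed on $f" >&2
            status=1               # keep going, but report failure at the end
        fi
    done
    return "$status"
}
```

Usage: `redo_ocr_all ~/scans ~/scans-reocr`. A failing file does not stop the loop; the function just returns a non-zero status once all files have been tried.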


After a lot of experimenting, I found that printing to Adobe PDF from Adobe Acrobat produces a document without the OCR and without an obvious loss of quality (some resolution is lost, but it is not noticeable at first glance).

However, many sites claim that this does not work. I also tried other printers, such as those from Foxit Reader and OneNote, but the quality was reduced. Exporting to JPEG gave the same result.

Please keep in mind that your mileage may vary.

Note: I am leaving this thread marked as unanswered in the hope of finding a better answer than mine.

Sanoo

In Acrobat X, under Protection, there is a Sanitize Document button that removes EVERYTHING except what can be seen (including the OCR'd text layer), converting the document to a flattened bitmap.

darthbith
Dave

I solved it by exporting to JPEG and then combining the JPEGs back into a PDF with Acrobat's 'Combine Files'. The document was originally a Word document that had been converted to PDF. The OCR is gone.


Use the PitStop Pro Acrobat plug-in. In the "Actions List", create a new action; in the upper right, look for "Select text fragment" and "Remove selected object", and set the run scope to the whole document, as seen below:

[Screenshot of the PitStop action list]

hrdom

I built a free tool to do this: PDF Redactor. If you upload the file and just click Redact, it will flatten your PDF and remove the OCR. If you want, you can also draw redaction marks on the document.

  • For Acrobat X and above: Tools > Protection > Remove Hidden Information.
  • For Acrobat 9 and below: Document > Examine Document.

Reference: https://answers.acrobatusers.com/undo-recognize-text-q28083.aspx