How to do OCR on a PDF document?

Question

Possible Duplicate:
How to extract text with OCR from a PDF on Linux?

I have a few documents in English and Hebrew that I scanned in and converted to PDF format.

Is there some free or cheap utility that can process a scanned PDF and do OCR, at least in English, preferably also in Hebrew?

Thanks!

score 1 · Answer 1 · answered Feb 16 '10 at 16:47

I found an interesting idea that lets Google do all the work of OCR'ing the PDF files for you.

eleven81

score 1 · Accepted Answer · answered Feb 16 '10 at 16:54

I found a list of free OCR software for Windows.

However, these programs need an image input, not a PDF input. For this, try a PDF-to-JPG converter.

score 0 · Answer 3 · answered Feb 16 '10 at 16:47

Personally, I would use Ghostview to convert the pages to images, then Tesseract to convert the images to text. This is a completely free, open-source, cross-platform solution, and I have had very good results with it on plain text. I don't use it for complex documents with tables and such, but for plain text you can't beat the price.
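
The two steps above can be scripted. What follows is only a rough sketch, not the exact setup I use. It assumes the Ghostscript command-line program `gs` (the engine that Ghostview is a front end for) and the `tesseract` program are both installed and on the PATH, and that the Tesseract language data for `eng` and `heb` is present. The function names, the `page-%03d.png` naming pattern and the 300 dpi default are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def ghostscript_command(pdf_path, out_dir, dpi=300):
    """Build the Ghostscript call that renders each PDF page to a PNG file."""
    # %03d is expanded by Ghostscript to the page number (001, 002, ...).
    pattern = str(Path(out_dir) / "page-%03d.png")
    return [
        "gs",  # assumption: on Windows the console binary is usually gswin64c
        "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m",   # 24-bit colour PNG output
        f"-r{dpi}",          # rendering resolution in dpi
        f"-sOutputFile={pattern}",
        str(pdf_path),
    ]


def tesseract_command(image_path, languages="eng"):
    """Build the Tesseract call; it writes its result to <image stem>.txt."""
    image_path = Path(image_path)
    out_base = image_path.with_suffix("")  # Tesseract appends .txt itself
    return ["tesseract", str(image_path), str(out_base), "-l", languages]


def ocr_pdf(pdf_path, out_dir, languages="eng"):
    """Render the PDF to page images, OCR each page, return the joined text."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_command(pdf_path, out_dir), check=True)

    pages = []
    for image in sorted(out_dir.glob("page-*.png")):
        subprocess.run(tesseract_command(image, languages), check=True)
        pages.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        # "eng+heb" asks Tesseract to try both English and Hebrew.
        print(ocr_pdf(sys.argv[1], sys.argv[2], "eng+heb"))
    else:
        print("usage: ocr_pdf.py INPUT.pdf OUTPUT_DIR")
```

Saved as `ocr_pdf.py` (a hypothetical name), it would be run as `python ocr_pdf.py scan.pdf pages`. A higher dpi generally gives Tesseract more to work with on small print, at the cost of larger images and slower processing.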
