6

Possible Duplicate:
How to extract text with OCR from a PDF on Linux?

I have a few documents in English and Hebrew that I scanned in and converted to PDF format.

Is there some free or cheap utility that can process a scanned PDF and do OCR, at least in English, preferably also in Hebrew?

Thanks!

Shaul Behr
  • 1,485

3 Answers

1

I found an interesting idea that lets Google do all the work of OCR'ing the PDF files for you.

eleven81
  • 16,182
1

I found a list of free OCR software for Windows.

  1. FreeOCR
  2. Tesseract
  3. WeOcr Tesseract Web Interface
  4. GOCR
  5. Windows GUI for GOCR
  6. OCR Desktop
  7. Simple OCR
  8. TopOCR

However, these programs need an image input, not a PDF input. For this, try a PDF-to-JPG converter.
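As a rough illustration of the PDF-to-JPG step, here is a minimal Python sketch that shells out to Ghostscript. Ghostscript is only one possible converter and is my assumption here, not something the list above prescribes. The function names, the `page-%03d.jpg` naming pattern, and the 300 dpi default are likewise illustrative choices. On Windows the console executable is usually called `gswin64c` rather than `gs`, so pass that as `gs_binary` if needed.

```python
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, out_dir, dpi=300, gs_binary="gs"):
    """Return the Ghostscript argument list that writes one JPEG per PDF page.

    The output naming pattern (page-001.jpg, page-002.jpg, ...) is an
    illustrative choice, not a Ghostscript requirement.
    """
    out_pattern = Path(out_dir) / "page-%03d.jpg"
    return [
        gs_binary,
        "-dNOPAUSE",            # do not pause between pages
        "-dBATCH",              # exit when the input is finished
        "-sDEVICE=jpeg",        # render each page as a JPEG image
        f"-r{dpi}",             # resolution in dpi; OCR tends to like ~300
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def pdf_to_jpegs(pdf_path, out_dir, dpi=300, gs_binary="gs"):
    """Run Ghostscript and return the generated image paths in page order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_gs_command(pdf_path, out_dir, dpi, gs_binary), check=True)
    return sorted(Path(out_dir).glob("page-*.jpg"))
```

Calling `pdf_to_jpegs("scan.pdf", "pages")` should leave one JPEG per page in the `pages` folder, which you can then feed to any of the OCR programs listed above.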

eleven81
  • 16,182
0

Personally, I would use Ghostview to convert them to images, then Tesseract to convert the images to text. This is a totally free, open-source, cross-platform solution that has given me very good results on plain text. I don't use it for complex documents with tables and such, but for plain text you can't beat the price.
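For the Tesseract half, a minimal Python sketch might look like the following. It assumes the pages have already been exported as image files and that the `tesseract` command-line tool is on your PATH. The `eng+heb` language setting is my assumption based on the question; it only works if both the English and Hebrew language data files are installed. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, languages=("eng", "heb"),
                            tesseract_binary="tesseract"):
    """Return the argument list for running Tesseract on one page image.

    Tesseract takes an output *base* name and appends ".txt" itself,
    so the image's extension is stripped to form that base.
    """
    image_path = Path(image_path)
    out_base = image_path.with_suffix("")
    return [
        tesseract_binary,
        str(image_path),
        str(out_base),
        "-l", "+".join(languages),  # e.g. "eng+heb" for mixed-language pages
    ]


def ocr_images(image_paths, languages=("eng", "heb")):
    """OCR each image in order and return the combined text."""
    texts = []
    for image_path in image_paths:
        subprocess.run(build_tesseract_command(image_path, languages), check=True)
        txt_path = Path(image_path).with_suffix(".txt")
        texts.append(txt_path.read_text(encoding="utf-8"))
    return "\n".join(texts)
```

If you only need English, pass `languages=("eng",)`; restricting the language list to what is actually on the page generally helps accuracy.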

Dennis
  • 6,696