0

I want to find the text in the following awesome image, along with where each piece of text is located in it. This is not the first time I have stumbled upon such a challenge, and many people probably have similar questions from time to time. I think this is a good instance of the general problem.

There are many ways (1, 2, 3, 4 ...) to achieve this in a customized fashion, but is there any OCR tool out there that can automagically turn this into a PDF with a text layer? Or any other format, for that matter. I mention PDF only because it is a format made to handle exactly this kind of thing.

Cuneiform looks like it would be a perfect tool for doing this manually, but I couldn't compile it on the Mac, and I wouldn't bother for this one instance. Still, I bet some relentless intern could use it to complement the OCR, which would never be able to identify the images in it.

Here's a reduced sample of the image so we can better picture the relevance of the question:

awesome image

karel
  • 13,706
cregox
  • 5,944

2 Answers

1

Some almost-solutions:

Google Docs will OCR the image but will not insert the text as a layer (I haven't tried it myself): http://googledocs.blogspot.com/2010/06/optical-character-recognition-ocr-in.html

ABBYY FineReader (http://www.abbyy.com/) will turn it into a PDF with underlying text (I currently use it for this purpose). However, it costs money (you could grab the trial version) and is Windows only.

Evernote can OCR text within images, but I don't know whether it will export to a PDF with underlying text.

You could download a trial version of Acrobat Pro and use Document > OCR Text Recognition > Recognise Text Using OCR. I have Acrobat Pro at work but have never used the OCR function, so I don't know how good it is. Again, Windows only (but you might well have Windows on a VM/Parallels/VirtualBox/Boot Camp).
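If a scriptable route is acceptable, here is a minimal sketch of one more option that is not among the tools above: the open-source Tesseract command-line tool, which can write both a searchable PDF and a TSV file with a pixel box for every recognised word. This assumes the `tesseract` binary is installed and on your PATH; the names `Word`, `parse_tesseract_tsv` and `ocr_image` are made up for this example, and I have not tried it on the image in the question.

```python
import csv
import io
import subprocess
from typing import List, NamedTuple


class Word(NamedTuple):
    """One recognised word and its bounding box in image pixels."""
    text: str
    left: int
    top: int
    width: int
    height: int
    conf: float


def parse_tesseract_tsv(tsv_text: str, min_conf: float = 0.0) -> List[Word]:
    """Return the words found in Tesseract TSV output, skipping layout-only rows."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Page/block/line rows carry no text and a confidence of -1.
        if not text or conf < min_conf:
            continue
        words.append(
            Word(
                text,
                int(row["left"]),
                int(row["top"]),
                int(row["width"]),
                int(row["height"]),
                conf,
            )
        )
    return words


def ocr_image(image_path: str, out_base: str) -> List[Word]:
    """Run Tesseract (assumed to be on PATH), writing out_base.tsv and out_base.pdf."""
    subprocess.run(["tesseract", image_path, out_base, "tsv", "pdf"], check=True)
    with open(out_base + ".tsv", encoding="utf-8", newline="") as fh:
        return parse_tesseract_tsv(fh.read())
```

Calling `ocr_image("scan.png", "scan")` (file names are placeholders) should leave a `scan.pdf` with a text layer next to the image and return the word boxes. Passing a higher `min_conf` to `parse_tesseract_tsv` drops low-confidence words, which is probably worth doing on a low-resolution scan.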

1

The image you linked has such a low resolution that most OCR software will have a hard time with it, and you may end up with a pretty bad result. Honestly, if you started transcribing it yourself now, you'd probably save time over finding a few packages, trying them out, trying to get them to work, and ultimately having to fix every other entry due to misreads.

Adam Davis
  • 4,405