scan A4 doc > pdf > ocr > translate to english?

Question

I've tried using a combination of

my home scanner to create a PDF, using the '300 dpi' and 'document' options on my Canon all-in-one
ZoHoViewer to convert the PDF to either an RTF or a TXT file
Google Docs to translate the text

I'm not sure how good or bad a product ZoHoViewer is, but the following:

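Als Arbeitsmarktbehörde haben wir den gesetzlichen Auftrag, die Vermittelbarkeit von

(roughly: "As a labour market authority, we have a statutory mandate [to ...] the employability of")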

turns into:

AlsArbeitsmarktbeh6rde habenwirdengesetzlichenAuftrag,dieVermittelbarkeit vonSt...

Consequently, Google Docs makes a pig's breakfast of translating it.

Does anyone have any better suggestions (preferably free online services)?
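To see why the translator struggles, it can help to check the OCR output mechanically before pasting it anywhere. The sketch below is a hypothetical Python helper, not part of ZoHoViewer, Google Docs or any tool mentioned in this thread. It flags the two symptoms visible in the sample: a digit between letters ("beh6rde") and a lowercase letter running straight into a capital, which in German usually means a missing space before a noun. The names `suspicious_tokens` and `reinsert_spaces` are illustrative only.

```python
import re

# Hypothetical post-OCR heuristics (illustration only, not from any tool in this thread).
# [^\W\d_] matches a single Unicode letter, so umlauts such as "ö" count as letters.
DIGIT_IN_WORD = re.compile(r"[^\W\d_]\d+[^\W\d_]")     # e.g. "h6r" in "beh6rde"
CASE_BOUNDARY = re.compile(r"([a-zäöüß])([A-ZÄÖÜ])")   # e.g. "sA" in "AlsArbeits..."

def suspicious_tokens(text: str) -> list[str]:
    """Return whitespace-separated tokens showing either OCR symptom."""
    return [tok for tok in text.split()
            if DIGIT_IN_WORD.search(tok) or CASE_BOUNDARY.search(tok)]

def reinsert_spaces(text: str) -> str:
    """Add a space at lowercase->uppercase boundaries and after a comma glued to a letter.

    Runs of lowercase words such as "habenwirden" cannot be split this way,
    so the repair is only partial.
    """
    text = CASE_BOUNDARY.sub(r"\1 \2", text)
    return re.sub(r",(?=[^\W\d_])", ", ", text)

sample = "AlsArbeitsmarktbeh6rde habenwirdengesetzlichenAuftrag,dieVermittelbarkeit"
print(suspicious_tokens(sample))
print(reinsert_spaces(sample))
```

On the sample, `reinsert_spaces` recovers "Als Arbeitsmarktbeh6rde ... Auftrag, die Vermittelbarkeit" but leaves "habenwirdengesetzlichen" glued together, which suggests that fixing the scan or the OCR settings is likely to work better than patching the text afterwards.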

score 5 · Answer 1 · edited Mar 20 '17 at 10:17

There have been several other questions about OCR on Super User that might be worth checking out for possible solutions.

Most notably, this answer by Molly looks promising:

I really like TopOCR, certainly a great addition to your scan tools:

Incredible OCR accuracy, up to 99.8% with a 3 MP camera

No page limits, and no extra downloads or components needed

Handles images with mixed text and graphics (Manual or Auto Zoning)

Tolerates skew and uneven lighting

Multiple text output formats, including searchable PDF and HTML

Able to read 11 different languages

Powerful, easy to use Image Processing with Image Dewarping

Supports smartphones

Includes built-in, full featured Text and Image WYSIWYG Editors

Post-processing spell checker for all 11 languages

Built-in Text-To-Speech software. How about OCR to MP3?

Includes a built-in multilingual text translator

Supports a Command Line Interface and a GUI

Make a high performance document Search and Indexing system

Browser Helper Mode supports creating free audio eBooks

With TopOCR's Web Engine it's easy to add new features

It's very accurate and works excellently with low-quality images such as photographs of pages/documents

TopOCR is freeware (can be made portable with Universal Extractor)

Further reading:

Which OCR software has the most options?

Practical OCR solution for converting a large book to a digital format?

How to extract text with OCR from a PDF on Linux?

score 4 · Answer 2 · answered Jan 18 '10 at 20:38

Given that the OCR has converted:

Als Arbeitsmarktbehörde ...

to:

AlsArbeitsmarktbeh6rde ...

A couple of things spring to mind.

Try scanning at a higher dpi. It looks like the program can't recognise the spaces between the words, and a higher dpi might improve that.
Can you set the language of your OCR program? I see that it has converted the "ö" to a "6". While this might be caused by the resolution, it might also be that, because "ö" isn't an everyday part of English, the program is choosing the "next best" fit, in this case "6".
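A rough Python sketch of both suggestions. The arithmetic converts an A4 page (210 × 297 mm) and a nominal font size to pixels at a given scan resolution, to show how much more detail 600 dpi gives the OCR engine than 300 dpi; the 10 pt font size is an assumption. The command builder uses Tesseract, a free OCR engine not mentioned in this answer, purely as an example of an OCR program whose language can be set explicitly (`-l deu` selects German, assuming the German language data is installed). All function names are hypothetical.

```python
import subprocess

MM_PER_INCH = 25.4
A4_MM = (210, 297)  # ISO 216 A4 page: width x height in millimetres

def a4_pixels(dpi: int) -> tuple[int, int]:
    """Pixel size of a full A4 page scanned at `dpi` dots per inch."""
    width, height = (round(mm / MM_PER_INCH * dpi) for mm in A4_MM)
    return width, height

def font_px(point_size: float, dpi: int) -> float:
    """Nominal font size in pixels (1 pt = 1/72 inch); actual letters are somewhat smaller."""
    return point_size / 72 * dpi

def tesseract_cmd(image: str, out_base: str, langs: str = "deu") -> list[str]:
    """Argument list for the Tesseract CLI with the recognition language set.

    "deu" is German; several languages can be combined, e.g. "deu+eng".
    """
    return ["tesseract", image, out_base, "-l", langs]

def run_ocr(image: str, out_base: str, langs: str = "deu") -> None:
    """Run Tesseract (must be on PATH); it writes the text to <out_base>.txt."""
    subprocess.run(tesseract_cmd(image, out_base, langs), check=True)

for dpi in (300, 600):
    print(dpi, "dpi:", a4_pixels(dpi), "pixels; 10 pt text is about",
          round(font_px(10, dpi)), "px")
```

At 300 dpi an A4 page is about 2480 × 3508 pixels and 10 pt text is roughly 42 pixels per em; at 600 dpi both roughly double. Calling `run_ocr("page.png", "page")` would write the recognised text to `page.txt`.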

score 0 · Accepted Answer · answered Jan 29 '10 at 17:17

Not 100% perfect, but the best of all the things I have tried:

http://www.paperfile.net/ combined with a language pack (free to download; instructions are in the app). Copy and paste the whole of the text into a Google Doc, then use Tools > Translate in Google Docs.
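If the pasted text still contains the OCR's hard line breaks in the middle of sentences, a translator may treat each line as a separate fragment (an assumption about machine translators in general, not something this answer tested). A minimal Python sketch, with the hypothetical name `reflow_for_translation`, that rejoins the lines within each paragraph before copying:

```python
import re

def reflow_for_translation(ocr_text: str) -> str:
    """Join OCR line breaks inside paragraphs; keep blank lines as paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", ocr_text.strip())
    out = []
    for para in paragraphs:
        # Re-join a word split by an end-of-line hyphen, e.g. "Vermittel-\nbarkeit"
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Turn the remaining line breaks into spaces and collapse runs of whitespace
        out.append(re.sub(r"\s+", " ", para).strip())
    return "\n\n".join(out)
```

It also joins genuine line-end hyphens, such as German "Arbeitsmarkt- und ...", so the result is worth a quick proofread before translating.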
