2

I've tried using a combination of

  • my home scanner to create a '300 dpi', 'document', 'pdf' (options on Canon all-in-one)
  • ZoHoViewer to create either an RTF or TXT file
  • Google Docs to translate

I'm not sure how good or bad a product ZoHoViewer is, but the following:

Als Arbeitsmarktbehörde haben wir den gesetzlichen Auftrag, die Vermittelbarkeit von

turns into:

AlsArbeitsmarktbeh6rde habenwirdengesetzlichenAuftrag,dieVermittelbarkeit vonSt...

Consequently, Google Docs makes a pig's breakfast of trying to translate it.

Does anyone have any better suggestions (preferably free online services)?

adolf garlic

3 Answers

5

There have been several other questions about OCR on SuperUser, which might be worth checking for possible solutions.

Most notably, this answer by Molly looks promising:

I really like TopOCR, certainly a great addition to your scan tools:

  • Incredible OCR accuracy, up to 99.8% with a 3 MP camera
  • No page limits, and no extra downloads or components needed
  • Handles images with mixed text and graphics (Manual or Auto Zoning)
  • Tolerates skew and uneven lighting
  • Multiple text output formats, including searchable PDF and HTML
  • Able to read 11 different languages
  • Powerful, easy to use Image Processing with Image Dewarping
  • Supports Smartphones: See some Smartphone samples
  • Includes built-in, full featured Text and Image WYSIWYG Editors
  • Post-processing spell checker for all 11 languages
  • Built-in Text-To-Speech software. How about OCR to MP3?
  • Includes a built-in multi-lingual text translator
  • Supports a Command Line Interface and a GUI
  • Make a high performance document Search and Indexing system
  • Browser Helper Mode supports creating free audio eBooks
  • With TopOCR's Web Engine it's easy to add new features


It's very accurate and works very well with low-quality images such as photographs of pages or documents.

TopOCR is freeware (can be made portable with Universal Extractor)

Further reading:

Which OCR software has the most options?

Practical OCR solution for converting a large book to a digital format?

How to extract text with OCR from a PDF on Linux?

Ivo Flipse
4

Given that the OCR has converted:

Als Arbeitsmarktbehörde ...

to:

AlsArbeitsmarktbeh6rde ...

A couple of things spring to mind.

  1. Try scanning at a higher dpi. It looks like the OCR can't recognise the spaces between the words; a higher dpi might improve that.

  2. Can you set the language of your OCR program? I see that it's converted the "ö" to a "6". This might be caused by the resolution, but it might also be that, because "ö" isn't an everyday part of English, the program is choosing the "next best" fit, in this case "6".
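
To illustrate both suggestions, here is a minimal sketch using Tesseract, an OCR engine that is not mentioned anywhere in this thread and is swapped in purely as an example of a tool that lets you pick the recognition language. It assumes the `tesseract` command-line program and its German language data (`deu`) are installed; the file name `scan.png` and the helper names `build_ocr_command` and `run_ocr` are hypothetical.

```python
import shutil
import subprocess


def build_ocr_command(image_path, lang="deu", dpi=None):
    """Build a Tesseract command line that prints recognised text to stdout.

    lang is a Tesseract language code ("deu" = German), so that characters
    such as "ö" are among the candidates the engine considers.
    dpi optionally tells Tesseract the scan resolution (Tesseract 4 or later).
    """
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    return cmd


def run_ocr(image_path, lang="deu", dpi=None):
    """Run Tesseract on an image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_ocr_command(image_path, lang, dpi),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file name: a page rescanned at 600 dpi instead of 300.
    print(run_ocr("scan.png", lang="deu", dpi=600))
```

Whether this fixes the missing spaces or the "6" for "ö" depends on the scan; it only shows where the language and resolution settings would go.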

ChrisF
0

Not 100% perfect, but the best out of all the things I have tried:

http://www.paperfile.net/ combined with a language pack (free to download; instructions are in the app). Copy and paste the whole of the text into a Google Doc, then use Tools > Translate in Google Docs.

adolf garlic