Convert image to text

Question

I received a scanned document (an image) from the bank, and I want to convert it to a normal text document with images on Ubuntu.

Is there a tool for this?

score 18 · Accepted Answer · edited Mar 20 '17 at 10:17

There are a number of OCR readers for Linux that can convert an image to text. Look at the following options:

All the above, except ocropus, are present in the Ubuntu repository in a package of the same name.

Different readers support different image formats, so you may be limited in your options by the file format your document is in. Alternatively, you can use the convert tool from ImageMagick to change the format if you wish to use a particular OCR reader.
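
As a rough illustration of the format change mentioned above, here is a minimal sketch that assumes ImageMagick's convert command is installed and that TIFF is the format your OCR reader wants. The helper names tiff_name and to_tiff and the file name scan.jpg are made up for this example.

```shell
#!/bin/sh
# Sketch: convert a scan to TIFF with ImageMagick before running OCR.
# tiff_name and to_tiff are hypothetical helper names.

# Derive the target name by swapping the extension: scan.jpg -> scan.tif
tiff_name() {
    printf '%s\n' "${1%.*}.tif"
}

to_tiff() {
    if ! command -v convert >/dev/null 2>&1; then
        echo "convert not found; try: sudo apt-get install imagemagick" >&2
        return 1
    fi
    # convert picks the output format from the target file's extension
    convert "$1" "$(tiff_name "$1")"
}

# Usage (scan.jpg is a placeholder file name):
#   to_tiff scan.jpg
```

The original file is left untouched, so you can try several target formats and keep whichever one your OCR reader handles best.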

Adapted from my answer here.

score 1 · Answer 2 · edited Sep 29 '22 at 14:44

You need to install "tesseract-ocr" on your Linux machine first.

sudo apt-get install tesseract-ocr

You can do it manually from the command line:

tesseract -l eng input.jpg output
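
If you have several scans, the same command can be wrapped in a loop. This is only a sketch: ocr_dir and out_base are hypothetical helper names, and it assumes the scans are PNG or JPEG files in one directory and that the English language data is installed.

```shell
#!/bin/sh
# Sketch: run tesseract on every PNG and JPEG file in a directory.
# ocr_dir and out_base are hypothetical helper names.

# Build the output base name. tesseract appends ".txt" to it itself.
# The extension stays in the name so 1.png and 1.jpg do not collide.
out_base() {
    name=$(basename "$1")
    printf '%s\n' "${name%.*}_${name##*.}"
}

ocr_dir() {
    dir=${1:-.}
    for img in "$dir"/*.png "$dir"/*.jpg; do
        [ -e "$img" ] || continue   # skip patterns that matched nothing
        tesseract "$img" "$dir/$(out_base "$img")" -l eng
    done
}

# Usage (scans is a placeholder directory name):
#   ocr_dir scans
```

For a file named 1.png this writes 1_png.txt next to the image.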

Alternatively, I have written PHP code that does the same thing. You can use it if you want.

Note: To run this code, the exec function must not be disabled in php.ini (check the disable_functions directive).

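<?php
// Image-to-text conversion using the tesseract command-line tool.
// basename() keeps the requested file inside the current directory.
$input_file = basename($_REQUEST['input_file'] ?? '');
if ($input_file === '' || !is_file($input_file)) {
    exit("Input file not found.");
}
$info = pathinfo($input_file);
// 1.png becomes 1_png; tesseract appends ".txt" to the output name itself.
$output_file = $info['filename'] . "_" . ($info['extension'] ?? '');
$output_file_name = $output_file . ".txt";

echo "<br />----IMAGE To TXT conversion Started-----<br />";
// escapeshellarg() stops the request parameter from injecting shell commands.
echo exec('tesseract ' . escapeshellarg($input_file) . ' ' . escapeshellarg($output_file));
echo "<br />----TXT conversion Done-----<br />";

echo "<br /><b>Please Check-----&gt;" . htmlspecialchars($output_file_name) . "</b><br />";
echo "Click <a target='_blank' href='" . htmlspecialchars($output_file_name, ENT_QUOTES) . "'>Here</a> to view it<br />";
?>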

Put this code in the web server's root folder and open it in a browser, e.g.:

http://yourserver.com?input_file=1.png

Note: The file 1.png must be in the same directory as the script.
