I got a scanned image document from my bank and I want to convert it to a normal text document with images on Ubuntu.
Is there a tool for this?
There are a number of OCR readers for Linux that can convert an image to text. Look at the following options:
All the above, except ocropus, are present in the Ubuntu repository in a package of the same name.
Different readers support different image formats, so you may be limited in your options by the file format your document is in. Alternatively, you can use the convert tool from ImageMagick to change the format if you wish to use a particular OCR reader.
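As a rough sketch of the format conversion mentioned above: the file name scan.tiff and the helper png_name are assumptions made up for illustration, and the conversion only runs if ImageMagick's convert is installed and the file exists.

```shell
#!/bin/sh
# Hypothetical helper: derive a .png file name from any image file name.
png_name() {
    printf '%s.png\n' "${1%.*}"
}

input="scan.tiff"   # assumed name of the scanned file

# Convert only when ImageMagick's convert is available and the file exists.
if command -v convert >/dev/null 2>&1 && [ -f "$input" ]; then
    convert "$input" "$(png_name "$input")"
fi
```

PNG is just an example target here; pick whichever format your chosen OCR reader accepts.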
Adapted from my answer here.
You need to install "tesseract-ocr" on your Linux machine first.
sudo apt-get install tesseract-ocr
You can do it manually from the command line:
tesseract -l eng input.jpg output
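If the scan has several pages, something like the loop below might save some typing. It is only a sketch: it assumes the pages are PNG files in the current directory, and ocr_base is a made-up helper name. tesseract adds the .txt extension to the output base itself.

```shell
#!/bin/sh
# Hypothetical helper: output base name for a page (file name without extension).
ocr_base() {
    printf '%s\n' "${1%.*}"
}

# Run OCR on every PNG page in the current directory, if tesseract is installed.
if command -v tesseract >/dev/null 2>&1; then
    for page in ./*.png; do
        [ -f "$page" ] || continue   # skip when the glob matches nothing
        tesseract "$page" "$(ocr_base "$page")" -l eng
    done
fi
```

Each page ends up in its own text file next to the image, e.g. page1.png gives page1.txt.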
Alternatively, I have written some PHP code that runs the same command; you can use it if you want.
Note: To run this code, the exec function must be enabled, i.e. not listed under disable_functions in php.ini.
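<?php
// Image-to-text conversion with tesseract.
// basename() drops any directory part, so only files next to the script can be read.
$input_file = basename($_REQUEST['input_file']);
// Build the output base name, e.g. "1.png" becomes "1_png".
$out = explode(".",$input_file);
$output_file = $out[0]."_".$out[1];
$output_file_name = $output_file.".txt";
echo "<br />----Image to TXT conversion started-----<br />";
// escapeshellarg() stops the request parameter from injecting shell commands.
echo exec('tesseract '.escapeshellarg($input_file).' '.escapeshellarg($output_file));
echo "<br />----TXT conversion done-----<br />";
echo "<br /><b>Please check: ".htmlspecialchars($output_file_name)."</b><br />";
echo "Click <a target='_blank' href='".htmlspecialchars($output_file_name, ENT_QUOTES)."'>here</a> to view it<br />";
?>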
Put this code in the web root folder and access it from a browser, e.g.:
http://yourserver.com?input_file=1.png
Note: The file 1.png should be present in your current directory.