I want my program to read /, _, and \ from an image but sometimes it reads / as I and /_\ as A. I am using the pytesseract library to do this.
Is there a way to specifically read characters like /, _, and \?
Please make sure to understand you should always try to produce a minimal reproducible example. More info here: https://stackoverflow.com/help/minimal-reproducible-example – Celius Stingher Sep 05 '19 at 01:42
1 Answer
You can use pytesseract.image_to_string to read text from an image. Depending on your image, you may want to preprocess it before passing it to pytesseract. This can be a combination of thresholding, blurring, or smoothing with morphological operations. Using this example image,
Here's the result printed to the console
We use the --psm 6 config flag since we want to treat the image as a single uniform block of text. Here are some additional configuration flags that could be useful
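One option that bears directly on the question is Tesseract's `tessedit_char_whitelist` variable, which restricts recognition to a given set of characters. The sketch below is an illustration and not part of the original answer: `build_tesseract_config` and `read_symbols` are hypothetical helper names, and the quoting assumes pytesseract splits the `config` string with shell-like rules, so the backslash has to be protected. Whether the whitelist takes effect also depends on the Tesseract version and engine (it is reportedly ignored by the LSTM engine in 4.0), so treat this as something to try, not a guaranteed fix.

```python
import shlex


def build_tesseract_config(psm=6, whitelist=None):
    """Build a string for pytesseract's `config` argument (hypothetical helper)."""
    parts = [f"--psm {psm}"]
    if whitelist:
        # Quote the variable so characters such as "\" survive shell-style splitting.
        parts.append("-c " + shlex.quote(f"tessedit_char_whitelist={whitelist}"))
    return " ".join(parts)


def read_symbols(path, whitelist="/_\\", psm=6):
    """OCR an image while restricting output to `whitelist` (assumed workflow)."""
    # Imported lazily so the config helper works without OpenCV/Tesseract installed.
    import cv2
    import pytesseract

    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(path)
    config = build_tesseract_config(psm, whitelist)
    return pytesseract.image_to_string(image, lang="eng", config=config)
```

With the whitelist set to `/_\`, Tesseract should no longer be able to emit `I` or `A`, although it may still drop or misplace symbols if the image itself is hard to read.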
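import cv2
import pytesseract

# Point pytesseract at the Tesseract binary (only needed if it is not on PATH)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the image as grayscale (flag 0)
image = cv2.imread('1.png', 0)
# --psm 6: assume a single uniform block of text
data = pytesseract.image_to_string(image, lang='eng', config='--psm 6')
print('Result:', data)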
nathancy

