Cleaning image to help tesseract on Android

Question

I'm trying to extract digits from a sudoku board. After detecting the board and its corners and transforming it, I'm left with a nicely aligned image of only the board. Now I'm trying to recognize the digits using Tess-Two, the Android implementation of Tesseract. I split the image into 9 parts with

currentCell = undistortedThreshed.submat(rect);

where rect is the rectangle that surrounds the image.

Now to the digit recognition.

Some digits, like 4, are recognized perfectly. Others, mostly 6, 7 and 8, are recognized as 0 or not at all.

I want to help Tesseract as much as I can by cleaning the currentCell image. At the moment it looks like this: [image: inverted 6] (I also tried without the inverted thresholding). I want to get rid of the white lines (the sudoku grid lines). I've tried something like this (taken from here):

Imgproc.Canny(currentCell, currentCell, 80, 90);
Mat lines = new Mat();
int threshold = 50;
int minLineSize = 5;
int lineGap = 20;

Imgproc.HoughLinesP(currentCell, lines, 1, Math.PI / 180,
        threshold, minLineSize, lineGap);
for (int x = 0; x < lines.cols() && x < 1; x++) {
    double[] vec = lines.get(0, x);
    double x1 = vec[0], y1 = vec[1], x2 = vec[2], y2 = vec[3];
    Point start = new Point(x1, y1);
    Point end = new Point(x2, y2);

    Core.line(currentCell, start, end, new Scalar(255), 10);

}

But it doesn't draw anything. I tried changing the line's width and color, but still nothing. I also tried drawing the line on the large image and on the unthresholded image, and nothing works.

Any suggestions?

EDIT

For some reason, it can't seem to find any lines. This is what the image looks like after applying Canny to it: [image: 6 after Canny]. But HoughLines doesn't detect any lines. I tried both HoughLines and HoughLinesP with different values, as shown in the OpenCV documentation, but nothing works. Those are pretty obvious lines. What am I doing wrong? Thanks!

Make it `new Scalar(0)` instead of 255. – AruniRC Nov 25 '12 at 02:23

score 3 · Answer 1 · answered Nov 26 '12 at 05:18

I ended up doing something different.

I used findContours to get the biggest contour, which is the digit.

Got its bounding box by using boundingRect.

Extracted it using submat and voilà, I got only the digit.

Unfortunately, it seems to make no difference at all. Tesseract still can't recognize the digits correctly. Sometimes it gives no result; sometimes, after dilating the digits, it recognizes the 6 as 0. But that's an issue for another question.
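The three steps above (largest contour, bounding box, crop) can be illustrated with a small dependency-free sketch. This is not the OpenCV code used in the answer: it swaps findContours and boundingRect for a plain flood fill over an `int[][]` binary image and picks the blob with the most pixels. The class name `LargestBlob` and the method `boundingBox` are made up for illustration.

```java
import java.util.ArrayDeque;

/** Sketch: bounding box of the largest white blob in a thresholded cell. */
final class LargestBlob {

    /**
     * cell[row][col] is a binary image: 0 = background, non-zero = ink.
     * Returns {x, y, width, height} of the largest 4-connected region of
     * non-zero pixels, or null if there is no non-zero pixel.
     */
    static int[] boundingBox(int[][] cell) {
        if (cell.length == 0) return null;
        int rows = cell.length, cols = cell[0].length;
        boolean[][] seen = new boolean[rows][cols];
        int[] best = null;
        int bestArea = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (cell[r][c] == 0 || seen[r][c]) continue;
                // Flood-fill one component, tracking its pixel count and extent.
                int area = 0, minR = r, maxR = r, minC = c, maxC = c;
                ArrayDeque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[] {r, c});
                seen[r][c] = true;
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    area++;
                    minR = Math.min(minR, p[0]);
                    maxR = Math.max(maxR, p[0]);
                    minC = Math.min(minC, p[1]);
                    maxC = Math.max(maxC, p[1]);
                    int[][] next = {
                        {p[0] - 1, p[1]}, {p[0] + 1, p[1]},
                        {p[0], p[1] - 1}, {p[0], p[1] + 1}
                    };
                    for (int[] n : next) {
                        if (n[0] < 0 || n[0] >= rows || n[1] < 0 || n[1] >= cols) continue;
                        if (cell[n[0]][n[1]] == 0 || seen[n[0]][n[1]]) continue;
                        seen[n[0]][n[1]] = true;
                        stack.push(n);
                    }
                }
                if (area > bestArea) {
                    bestArea = area;
                    best = new int[] {minC, minR, maxC - minC + 1, maxR - minR + 1};
                }
            }
        }
        return best;
    }
}
```

The returned box plays the role of the rectangle passed to submat. One caveat: if grid-line fragments are still present in the cell, they may well be larger than the digit, so this probably works best after the lines have been removed or the cell has been cropped inward.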

score 1 · Answer 2 · answered Nov 25 '12 at 02:20

This is an idea right off the top of my head:

Keep the code that computes the Hough lines in the image, so that you have the lines corresponding to the grid.

Now, simply draw those lines on the original image, but set the color to BLACK.

Most of the white lines would now be covered with the newly-drawn black lines. As Hough line positions are not exactly matching the actual lines, a few small dots of white might remain. Eliminating them via connected-components (and discarding the components that are too tiny) or even some morphological operations - taking care that the actual digit remains unaltered - could handle these imperfections.
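The connected-components cleanup mentioned above might look roughly like the following. It is a plain-Java sketch on an `int[][]` binary image rather than OpenCV code, and the names `SpeckleFilter`, `removeSmallComponents` and `minArea` are hypothetical; a suitable `minArea` would have to be tuned against the real cell size.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Sketch: erase small white specks left over after painting out the grid lines. */
final class SpeckleFilter {

    /**
     * Sets to 0 every 4-connected component of non-zero pixels that has
     * fewer than minArea pixels. The image is modified in place.
     */
    static void removeSmallComponents(int[][] img, int minArea) {
        if (img.length == 0) return;
        int rows = img.length, cols = img[0].length;
        boolean[][] seen = new boolean[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (img[r][c] == 0 || seen[r][c]) continue;
                // Collect all pixels of this component with a flood fill.
                List<int[]> pixels = new ArrayList<>();
                ArrayDeque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[] {r, c});
                seen[r][c] = true;
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    pixels.add(p);
                    int[][] next = {
                        {p[0] - 1, p[1]}, {p[0] + 1, p[1]},
                        {p[0], p[1] - 1}, {p[0], p[1] + 1}
                    };
                    for (int[] n : next) {
                        if (n[0] < 0 || n[0] >= rows || n[1] < 0 || n[1] >= cols) continue;
                        if (img[n[0]][n[1]] == 0 || seen[n[0]][n[1]]) continue;
                        seen[n[0]][n[1]] = true;
                        stack.push(n);
                    }
                }
                // Components below the size limit are treated as noise and erased.
                if (pixels.size() < minArea) {
                    for (int[] p : pixels) img[p[0]][p[1]] = 0;
                }
            }
        }
    }
}
```

Because the digit is normally by far the largest remaining component, a fairly generous `minArea` should leave it untouched while still clearing the leftover dots.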

Do try it out and let me know. Hope this helps you.

answered Nov 25 '12 at 02:20

AruniRC


Thanks. That's exactly what I'm trying to do; however, for some reason, the Hough transform doesn't detect any lines... – La bla bla Nov 25 '12 at 23:36
Then decrease the `threshold` and `minLineSize`. As in, decrease the threshold to 2 or 3, something really low. See what sort of lines come up, then increase until a tipping point. – AruniRC Nov 26 '12 at 02:20
Also, you are drawing the lines on the `currentCell` image, which already has things drawn on it. Have you tried drawing the lines on a blank (black) image first? Maybe the lines are being drawn, but you can't see them because of the objects already present in the image. – AruniRC Nov 26 '12 at 02:22
I'm printing `lines.cols()` to check how many lines were detected. And I also tried to draw them onto another Mat. I will try with a very low threshold. Thanks for your comment. – La bla bla Nov 26 '12 at 02:30
Well, using a threshold of 2 does yield some lines, but at random places (for example, inside the upper arc of the 6). Those 2 large lines are left undetected. I tried inverting the image first, same result. Any more ideas? – La bla bla Nov 26 '12 at 02:41
