Sudoku Solver – 1

I was very much excited to solve the Sudoku puzzles that appeared in daily newspapers during childhood. In fact it’s a lot of fun solving them and even we (my friends) challenge ourselves to solve them first. By then we don’t know programming and other stuff, so we were not aware of finding the solution with ease. But when I was learning Data Structures and Algorithms long back I got encountered with the famous Backtracking algorithm and yes, solving a sudoku puzzle is one among the challenges that can be solved easily using that algorithm.

Though we could solve the sudoku by using the algorithm we had to create a list by looking at the original puzzle and then feed that list to the algorithm to solve the sudoku. That was a boring task since we need to note all the 81 elements in case of 9x9 puzzle. Why can’t we simply scan the puzzle and get the solved one instead? Yes! It would be exciting right? This is what we are going to do in this series of blog posts.


Captured puzzle (left) and Solved puzzle (right)

The outcome of the project is exactly as shown in the above image. We capture the sudoku puzzle and it returns the solved puzzle as shown. To carry out this objective we exactly need to do the following.


In this post, we complete all the steps listed above from preprocessing to recognizing the digits. So in this post we completely deal with the preprocessing, extracting, perspective transform and recognizing the digits in the cells. For this we build a script that captures the puzzle from the live streaming video and then solves it.

Let’s go

Let’s start off by opening the file named and write the code in it.

import cv2
import numpy as np
import imutils
import argparse
from imutils.perspective import four_point_transform,order_points
from skimage.filters import threshold_adaptive
from skimage.segmentation import clear_border
from keras.models import load_model
from sudoku import SolveSudoku

We start off by importing necessary packages. Here we imported keras.models.load_model to load the trained model from the previous post, therefore we don’t discuss the training of our model here in this post. And then imported sudoku.SolveSudoku which we will see in the next post.

#parse arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m","--model",required=True,help="Path to trained model...")
args = vars(ap.parse_args())

#capture from the webcam
cap = cv2.VideoCapture(0)
poly = None #to capture the sudoku co-ordinates

We then parse the arguments, which takes a single command line argument for our trained model to recognize the digits in the cells. Then capture from the webcam and create a None type variable for poly which holds the captured sudoku co-ordinates from the original image.

Preprocess the image

while True:
    ret,image =
    #resize image
    image = imutils.resize(image,width=800)
    #to gray
    gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
    #blur the image
    blurred = cv2.GaussianBlur(gray,(5,5),0)
    #apply adaptive thresholding
    thresh = threshold_adaptive(blurred,block_size=5,offset=1).astype("uint8")*255

We loop over the captured frames and convert the image to grayscale. Then we apply gaussian blur to the gray scaled image and then we apply adaptive thresholding to the blurred image. Here we used threshold_adaptive from the skimage.filters library rather than from cv2 since it is more pythonic. Also we need to convert the thresh to uint8 as threshold_adaptive returns the bool type and then finally multiply with 255. Let’s assume the below shown image as the original image (for example).


Extract/Perspective transform the puzzle region

Now that we are ready to with the most crucial step i.e., extracting the puzzle region from the original image.

    key = cv2.waitKey(0) & 0xFF
    if key == ord("q"):
    elif key == ord("c"):
        #find countours
        cnts,_ = cv2.findContours(thresh.copy(),cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

        #sort the contours with highest area first
        cnts = sorted(cnts,key=cv2.contourArea,reverse=True)
        mask = np.zeros(thresh.shape,dtype="uint8")
        c = cnts[1]
        clone = image.copy()

        #approximate the contours
        peri = cv2.arcLength(c,closed=True)
        poly = cv2.approxPolyDP(c,epsilon=0.02*peri,closed=True)

        #only if a valid region
        if len(poly) == 4:
            #apply perspective transform
            warped = four_point_transform(image,poly.reshape(-1,2))

Before actually extracting the region we create a hook to break from the streaming loop using cv2.waitKey(0) & 0XFF and quit if the key pressed is q and continue further if it is c otherwise can be treated as capture.

We then find the contours for the thresholded image, remember that cv2.findContours is a destructive method and therefore we use thresh.copy(). And sort the contours in the decreasing order of their area, then create a mask (empty) same as shape of the thresh. Then we take the second contour as the puzzle contour since the first one indeed will be the actual image (complete) itself (most of the times), therefore the second contour (large in area) is considered as the region of the puzzle.

To further verify whether the captured contours are the puzzle itself we approximate our contours with 2% of its perimeter using cv2.approxPolyDP. If the no.of.points is 4 then it is a valid region. And draw the contours accordingly on the puzzle region.



Now that we have found the actual region of the puzzle. Then what we need to do is to apply perspective transform on the extracted region. If you are not familiar with what exactly a perspective transform is you better go to this link. Here we use four_point_transform from imutils.perspective library which can be installed through pip install imutils. This is just a wrapper function that makes the 4 point transform in an easy way. Then we warp our puzzle region and the warped puzzle can be seen below.




Slide through the each cell of the puzzle and recognize the digits

key = cv2.waitKey(0)

#just in case if the captured one is not the actual puzzle
if key&0XFF == ord("q"):

#convert to gray and find the sliding window width & height
warped = cv2.cvtColor(warped,cv2.COLOR_BGR2GRAY)
winX = int(warped.shape[1]/9.0)
winY = int(warped.shape[0]/9.0)

#load the trained model
model = load_model(args["model"])

#empty lists to hold recognized digits and center co-ordinates of the cells
labels = []
centers = []

Now that we have the warped puzzle. Next we need to slide through each cell of the puzzle and recognize the digits. Before going through that, we create a hook to quit in case the captured region is not a puzzle. And then convert the warped to grayscaled image.

Then find the approximate width and height of each cell in the puzzle so that we can slide our window through each of the cells. The pre-trained model is loaded from disk which helps us recognizing the digits. And finally initialize two lists namely labels for holding the recognized digits and centers for holding the approximate center co-ordinates of each cell.

#slide the window through the puzzle
for y in xrange(0,warped.shape[0],winY):
    for x in xrange(0,warped.shape[1],winX):
        #slice the cell
        window = warped[y:y+winY,x:x+winX]
        #sanity check
        if window.shape[0] != winY or window.shape[1] != winX:

        clone = warped.copy()
        digit = cv2.resize(window,(28,28))
        _,digit = cv2.threshold(digit,0,255,cv2.THRESH_BINARY_INV|cv2.THRESH_OTSU)
        #clear borders
        digit = clear_border(digit)

        #whether an empty cell or not
        numPixels = cv2.countNonZero(digit)
        if numPixels<5:
            label = 0
            label = model.predict_classes([digit.reshape(1,28,28,1)])[0]
        #draw rectangle for each cell

We slide through each of the cell in the puzzle and slice the each of the cell region and then we resize it to 28x28 since our MNIST training data contains each sample of 28x28 shape. As always we then threshold the digit and clear the borders which makes it more resistant to outliers as the sliding window may have border lines since it may not be accurate. We decide a region as an empty cell if the no.of.non zero pixels in that region are less than a certain threshold. Then the extracted digit is reshaped to (1,28,28,1) since the model we would be using the Convolution Neural Network method which we had trained in the previous post. The predicted digit and the center co-ordinate of the cell are appended to their respective lists. The sliding window can be visualized as below.



Therefore, we now have our recognized digits in the list labels and we are good to go for solving the puzzle. But wait, not so fast. We will discuss how to solve the sudoku puzzle using Backtracking algorithm and then reproject the solved puzzle on to the original image in the next post.

Thank you, Have a nice day…