I was very much excited to solve the Sudoku puzzles that appeared in daily newspapers during childhood. In fact it’s a lot of fun solving them and even we (my friends) challenge ourselves to solve them first. By then we don’t know programming and other stuff, so we were not aware of finding the solution with ease. But when I was learning Data Structures and Algorithms long back I got encountered with the famous Backtracking algorithm and yes, solving a sudoku puzzle is one among the challenges that can be solved easily using that algorithm.
Though we could solve the sudoku by using the algorithm we had to create a list by looking at the original puzzle and then feed that list to the algorithm to solve the sudoku. That was a boring task since we need to note all the 81 elements in case of
9x9 puzzle. Why can’t we simply scan the puzzle and get the solved one instead? Yes! It would be exciting right? This is what we are going to do in this series of blog posts.
The outcome of the project is exactly as shown in the above image. We capture the sudoku puzzle and it returns the solved puzzle as shown. To carry out this objective we exactly need to do the following.
- Preprocess the image
- Extract/Perspective transform the puzzle region
- Slide through the each cell of the puzzle and recognize the digits
- Solve the puzzle
- Reproject the solved puzzle to the original image
In this post, we complete all the steps listed above from preprocessing to recognizing the digits. So in this post we completely deal with the preprocessing, extracting, perspective transform and recognizing the digits in the cells. For this we build a script that captures the puzzle from the live streaming video and then solves it.
Let’s start off by opening the file named
solve.py and write the code in it.
import cv2 import numpy as np import imutils import argparse from imutils.perspective import four_point_transform,order_points from skimage.filters import threshold_adaptive from skimage.segmentation import clear_border from keras.models import load_model from sudoku import SolveSudoku
We start off by importing necessary packages. Here we imported
keras.models.load_model to load the trained model from the previous post, therefore we don’t discuss the training of our model here in this post. And then imported
sudoku.SolveSudoku which we will see in the next post.
#parse arguments ap = argparse.ArgumentParser() ap.add_argument("-m","--model",required=True,help="Path to trained model...") args = vars(ap.parse_args()) #capture from the webcam cap = cv2.VideoCapture(0) poly = None #to capture the sudoku co-ordinates
We then parse the arguments, which takes a single command line argument for our trained model to recognize the digits in the cells. Then capture from the webcam and create a
None type variable for
poly which holds the captured sudoku co-ordinates from the original image.
Preprocess the image
while True: ret,image = cap.read() #resize image image = imutils.resize(image,width=800) cv2.imshow("Original",image) #to gray gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY) #blur the image blurred = cv2.GaussianBlur(gray,(5,5),0) #apply adaptive thresholding thresh = threshold_adaptive(blurred,block_size=5,offset=1).astype("uint8")*255 cv2.imshow("Thresholded",thresh)
We loop over the captured frames and convert the image to grayscale. Then we apply gaussian blur to the gray scaled image and then we apply adaptive thresholding to the blurred image. Here we used
threshold_adaptive from the
skimage.filters library rather than from
cv2 since it is more pythonic. Also we need to convert the
threshold_adaptive returns the
bool type and then finally multiply with
255. Let’s assume the below shown image as the original image (for example).
Extract/Perspective transform the puzzle region
Now that we are ready to with the most crucial step i.e., extracting the puzzle region from the original image.
key = cv2.waitKey(0) & 0xFF if key == ord("q"): break elif key == ord("c"): #find countours cnts,_ = cv2.findContours(thresh.copy(),cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE) #sort the contours with highest area first cnts = sorted(cnts,key=cv2.contourArea,reverse=True) mask = np.zeros(thresh.shape,dtype="uint8") c = cnts clone = image.copy() #approximate the contours peri = cv2.arcLength(c,closed=True) poly = cv2.approxPolyDP(c,epsilon=0.02*peri,closed=True) #only if a valid region if len(poly) == 4: cv2.drawContours(clone,[poly],-1,(0,0,255),2) #apply perspective transform warped = four_point_transform(image,poly.reshape(-1,2)) cv2.imshow("Contours",clone) cv2.imshow("Warped",warped) break
Before actually extracting the region we create a hook to break from the streaming loop using
cv2.waitKey(0) & 0XFF and quit if the key pressed is
q and continue further if it is
c otherwise can be treated as capture.
We then find the contours for the thresholded image, remember that
cv2.findContours is a destructive method and therefore we use
thresh.copy(). And sort the contours in the decreasing order of their area, then create a mask (empty) same as shape of the
thresh. Then we take the second contour as the puzzle contour since the first one indeed will be the actual image (complete) itself (most of the times), therefore the second contour (large in area) is considered as the region of the puzzle.
To further verify whether the captured contours are the puzzle itself we approximate our contours with 2% of its perimeter using
cv2.approxPolyDP. If the
no.of.points is 4 then it is a valid region. And draw the contours accordingly on the puzzle region.
Now that we have found the actual region of the puzzle. Then what we need to do is to apply perspective transform on the extracted region. If you are not familiar with what exactly a perspective transform is you better go to this link. Here we use
imutils.perspective library which can be installed through
pip install imutils. This is just a wrapper function that makes the 4 point transform in an easy way. Then we warp our puzzle region and the warped puzzle can be seen below.
Slide through the each cell of the puzzle and recognize the digits
key = cv2.waitKey(0) #just in case if the captured one is not the actual puzzle if key&0XFF == ord("q"): exit() #convert to gray and find the sliding window width & height warped = cv2.cvtColor(warped,cv2.COLOR_BGR2GRAY) winX = int(warped.shape/9.0) winY = int(warped.shape/9.0) #load the trained model model = load_model(args["model"]) #empty lists to hold recognized digits and center co-ordinates of the cells labels =  centers = 
Now that we have the warped puzzle. Next we need to slide through each cell of the puzzle and recognize the digits. Before going through that, we create a hook to quit in case the captured region is not a puzzle. And then convert the warped to grayscaled image.
Then find the approximate
height of each cell in the puzzle so that we can slide our window through each of the cells. The pre-trained model is loaded from disk which helps us recognizing the digits. And finally initialize two lists namely
labels for holding the recognized digits and
centers for holding the approximate center co-ordinates of each cell.
#slide the window through the puzzle for y in xrange(0,warped.shape,winY): for x in xrange(0,warped.shape,winX): #slice the cell window = warped[y:y+winY,x:x+winX] #sanity check if window.shape != winY or window.shape != winX: continue clone = warped.copy() digit = cv2.resize(window,(28,28)) _,digit = cv2.threshold(digit,0,255,cv2.THRESH_BINARY_INV|cv2.THRESH_OTSU) #clear borders digit = clear_border(digit) cv2.imshow("Digit",digit) #whether an empty cell or not numPixels = cv2.countNonZero(digit) if numPixels<5: label = 0 else: label = model.predict_classes([digit.reshape(1,28,28,1)]) labels.append(label) centers.append(((x+x+winX)//2,(y+y+winY+6)//2)) #draw rectangle for each cell cv2.rectangle(clone,(x,y),(x+winX,y+winY),(0,0,255),2) cv2.imshow("Window",clone) cv2.waitKey(0)
We slide through each of the cell in the puzzle and slice the each of the cell region and then we resize it to
28x28 since our MNIST training data contains each sample of
28x28 shape. As always we then threshold the digit and clear the borders which makes it more resistant to outliers as the sliding window may have border lines since it may not be accurate. We decide a region as an empty cell if the
no.of.non zero pixels in that region are less than a certain threshold. Then the extracted digit is reshaped to
(1,28,28,1) since the model we would be using the Convolution Neural Network method which we had trained in the previous post. The predicted digit and the center co-ordinate of the cell are appended to their respective lists. The sliding window can be visualized as below.
Therefore, we now have our recognized digits in the list
labels and we are good to go for solving the puzzle. But wait, not so fast. We will discuss how to solve the sudoku puzzle using Backtracking algorithm and then reproject the solved puzzle on to the original image in the next post.
Thank you, Have a nice day…