I was very much excited to solve the * Sudoku puzzles* that appeared in daily newspapers during childhood. In fact it’s a lot of fun solving them and even we (my friends) challenge ourselves to solve them first. By then we don’t know programming and other stuff, so we were not aware of finding the solution with ease. But when I was learning Data Structures and Algorithms long back I got encountered with the famous

*algorithm and yes, solving a*

**Backtracking***sudoku*

*puzzle*is one among the challenges that can be solved easily using that algorithm.

Though we could solve the *sudoku* by using the algorithm we had to create a list by looking at the original puzzle and then *feed* that list to the algorithm to solve the *sudoku*. That was a boring task since we need to note all the * 81* elements in case of

`9x9`

puzzle. *Why can’t we simply scan the puzzle and get the solved one instead?*Yes! It would be exciting right? This is what we are going to do in this series of blog posts.

The *outcome* of the project is exactly as shown in the above image. We capture the *sudoku* puzzle and it returns the *solved* puzzle as shown. To carry out this objective we exactly need to do the following.

**Preprocess the image****Extract/Perspective transform the puzzle region****Slide through the each cell of the puzzle and recognize the digits****Solve the puzzle****Reproject the solved puzzle to the original image**

### Task

In this post, we complete all the steps listed above from *preprocessing* to *recognizing the digits*. So in this post we completely deal with the *preprocessing*, *extracting*, *perspective* *transform* and *recognizing* the digits in the cells. For this we build a script that captures the puzzle from the live streaming video and then solves it.

### Let’s go

Let’s start off by opening the file named `solve.py`

and write the code in it.

import cv2 import numpy as np import imutils import argparse from imutils.perspective import four_point_transform,order_points from skimage.filters import threshold_adaptive from skimage.segmentation import clear_border from keras.models import load_model from sudoku import SolveSudoku

We start off by importing necessary packages. Here we imported `keras.models.load_model`

to load the trained model from the previous post, therefore **we*** don’t discuss *the training of our model here in this post. And then imported

`sudoku.SolveSudoku`

which we will see in the next post.#parse arguments ap = argparse.ArgumentParser() ap.add_argument("-m","--model",required=True,help="Path to trained model...") args = vars(ap.parse_args()) #capture from the webcam cap = cv2.VideoCapture(0) poly = None #to capture the sudoku co-ordinates

We then parse the arguments, which takes a single command line argument for our trained model to recognize the digits in the cells. Then capture from the *webcam* and create a `None`

type variable for `poly`

which holds the captured *sudoku* co-ordinates from the original image.

### Preprocess the image

while True: ret,image = cap.read() #resize image image = imutils.resize(image,width=800) cv2.imshow("Original",image) #to gray gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY) #blur the image blurred = cv2.GaussianBlur(gray,(5,5),0) #apply adaptive thresholding thresh = threshold_adaptive(blurred,block_size=5,offset=1).astype("uint8")*255 cv2.imshow("Thresholded",thresh)

We loop over the captured frames and convert the image to *grayscale*. Then we apply gaussian blur to the gray scaled image and then we apply adaptive thresholding to the blurred image. Here we used `threshold_adaptive`

from the `skimage.filters`

library rather than from `cv2`

since it is more *pythonic*. Also we need to convert the `thresh`

to `uint8`

as `threshold_adaptive`

returns the `bool`

type and then finally multiply with `255`

. Let’s assume the below shown image as the original image (for example).

### Extract/Perspective transform the puzzle region

Now that we are ready to with the most * crucial* step i.e.,

*extracting the puzzle region*from the original image.

key = cv2.waitKey(0) & 0xFF if key == ord("q"): break elif key == ord("c"): #find countours cnts,_ = cv2.findContours(thresh.copy(),cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE) #sort the contours with highest area first cnts = sorted(cnts,key=cv2.contourArea,reverse=True) mask = np.zeros(thresh.shape,dtype="uint8") c = cnts[1] clone = image.copy() #approximate the contours peri = cv2.arcLength(c,closed=True) poly = cv2.approxPolyDP(c,epsilon=0.02*peri,closed=True) #only if a valid region if len(poly) == 4: cv2.drawContours(clone,[poly],-1,(0,0,255),2) #apply perspective transform warped = four_point_transform(image,poly.reshape(-1,2)) cv2.imshow("Contours",clone) cv2.imshow("Warped",warped) break

Before actually extracting the region we create a hook to *break* from the streaming loop using `cv2.waitKey(0) & 0XFF`

and quit if the key pressed is `q`

and continue further if it is `c`

otherwise can be treated as ** capture**.

We then find the contours for the thresholded image, remember that `cv2.findContours`

is a *destructive* method and therefore we use `thresh.copy()`

. And ** sort** the contours in the

**order of their area, then create a mask (empty) same as shape of the**

*decreasing*`thresh`

. Then we take the *contour as the puzzle contour since the*

**second***one indeed will be the actual image (complete) itself (most of the times), therefore the second contour (large in area) is considered as the region of the puzzle.*

**first**To further verify whether the captured contours are the puzzle itself we approximate our contours with ** 2%** of its perimeter using

`cv2.approxPolyDP`

. If the `no.of.points is 4`

then it is a valid region. And draw the contours accordingly on the *puzzle*region.

Now that we have found the actual region of the *puzzle*. Then what we need to do is to apply * perspective transform* on the extracted region. If you are not familiar with what exactly a perspective transform is you better go to this link. Here we use

`four_point_transform`

from `imutils.perspective`

library which can be installed through `pip install imutils`

. This is just a wrapper function that makes the *4 point transform*in an easy way. Then we

*our puzzle region and the warped puzzle can be seen below.*

**warp**

**Slide through the each cell of the puzzle and recognize the digits**

key = cv2.waitKey(0) #just in case if the captured one is not the actual puzzle if key&0XFF == ord("q"): exit() #convert to gray and find the sliding window width & height warped = cv2.cvtColor(warped,cv2.COLOR_BGR2GRAY) winX = int(warped.shape[1]/9.0) winY = int(warped.shape[0]/9.0) #load the trained model model = load_model(args["model"]) #empty lists to hold recognized digits and center co-ordinates of the cells labels = [] centers = []

Now that we have the * warped* puzzle. Next we need to

*through each cell of the puzzle and*

**slide***the digits. Before going through that, we create a hook to quit in case the captured region is not a puzzle. And then convert the warped to*

**recognize***grayscaled*image.

Then find the approximate `width`

and `height`

of each cell in the puzzle so that we can slide our window through each of the cells. The *pre-trained* model is loaded from disk which helps us recognizing the digits. And finally initialize two lists namely `labels`

for holding the recognized digits and `centers`

for holding the approximate center co-ordinates of each cell.

#slide the window through the puzzle for y in xrange(0,warped.shape[0],winY): for x in xrange(0,warped.shape[1],winX): #slice the cell window = warped[y:y+winY,x:x+winX] #sanity check if window.shape[0] != winY or window.shape[1] != winX: continue clone = warped.copy() digit = cv2.resize(window,(28,28)) _,digit = cv2.threshold(digit,0,255,cv2.THRESH_BINARY_INV|cv2.THRESH_OTSU) #clear borders digit = clear_border(digit) cv2.imshow("Digit",digit) #whether an empty cell or not numPixels = cv2.countNonZero(digit) if numPixels<5: label = 0 else: label = model.predict_classes([digit.reshape(1,28,28,1)])[0] labels.append(label) centers.append(((x+x+winX)//2,(y+y+winY+6)//2)) #draw rectangle for each cell cv2.rectangle(clone,(x,y),(x+winX,y+winY),(0,0,255),2) cv2.imshow("Window",clone) cv2.waitKey(0)

We slide through each of the cell in the *puzzle* and * slice* the each of the cell region and then we resize it to

`28x28`

since our **MNIST**training data contains each sample of

`28x28`

shape. As always we then threshold the digit and *which makes it more resistant to outliers as the sliding window may have border lines since it may not be accurate. We decide a region as an*

**clear the borders***cell if the*

**empty**`no.of.non zero`

pixels in that region are less than a certain threshold. Then the extracted digit is reshaped to `(1,28,28,1)`

since the model we would be using the Convolution Neural Network method which we had trained in the previous post. The predicted digit and the center co-ordinate of the cell are appended to their respective lists. The sliding window can be visualized as below.Therefore, we now have our *recognized* digits in the list `labels`

and we are good to go for * solving the puzzle*. But wait, not so fast. We will discuss how to solve the

*sudoku*puzzle using

**Backtracking**algorithm and then

**reproject**the solved puzzle on to the original image in the next post.

Thank you, Have a nice day…