Recognize Handwritten Digits – 1

Recognizing handwritten digits is a relatively easy task, and I feel it is a good one to start with. Coming straight to the point, this post is broken into 2 parts to make it easier to follow. The general handwritten digit recognition pipeline consists of three stages:

  • Preprocessing
  • Segmentation
  • Recognition

In this part we will discuss the first two, i.e., preprocessing and segmentation. To give you an idea of where this complete post is heading, the final outcome is shown below.


Hey yeah!! That’s my mobile number…


The task of this blog post is to take an input image like the one shown above (left), preprocess it, and finally segment each of the detected digits. Everything in this post relies on image processing techniques, which serve as a great tool for feature extraction before we recognize the digits in the next post.

Let’s go

Let’s start off by opening an empty file named

import numpy as np
import cv2
import imutils
import argparse

We start by importing the necessary packages: numpy for matrix operations, cv2 (OpenCV) for image processing, imutils, a set of convenience functions for image processing written by Adrian Rosebrock (install it with pip install imutils), and finally argparse for command-line argument parsing.

#parse arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i","--image",required=True,help="Path to image to recognize")
args = vars(ap.parse_args())

Our script takes a single command-line argument, --image, which is the path to the input image to be processed.

image = cv2.imread(args["image"])
image = imutils.resize(image,width=320)
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)

We read the input image from the path given on the command line and resize it to a width of 320 pixels. Working at a fixed, smaller size keeps processing fast and keeps the fixed kernel sizes used below (5x5 for blackhat) roughly proportional to the digits, whatever the original resolution. Note that I've used imutils for resizing rather than cv2, because we need to maintain the aspect ratio of the image and imutils.resize computes the matching height for us. We then convert the image to grayscale, since the morphological operations and thresholding that follow work on a single channel. The grayscale image is shown below.



kernel = cv2.getStructuringElement(cv2.MORPH_RECT,(5,5))
blackhat = cv2.morphologyEx(gray,cv2.MORPH_BLACKHAT,kernel)

We then create a 5x5 rectangular kernel and apply the blackhat morphological operation to the grayscale image. Blackhat is a morphological operation that reveals dark regions on a light background. Since our task is to detect handwritten digits written on paper, we need to reveal the pen strokes (dark regions) while ignoring the paper (light background), which is exactly what blackhat does. The resulting image can be seen below.



Though the handwritten digits are revealed, we can see some noise in the above image, which may be caused by folds in the paper, uneven lighting, etc. This noise can be reduced by thresholding the image.

_,thresh = cv2.threshold(blackhat,0,255,cv2.THRESH_BINARY|cv2.THRESH_OTSU)
thresh = cv2.dilate(thresh,None)

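We apply Otsu's thresholding, which automatically computes the optimal global threshold from the image's histogram instead of relying on a hand-picked value (that is why the threshold argument passed to cv2.threshold is 0; it is ignored when cv2.THRESH_OTSU is set). Because a single threshold is used for the whole image, this works best under controlled lighting conditions.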
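Otsu's method tries every possible threshold t and keeps the one that maximizes the between-class variance w0 * w1 * (mu0 - mu1)^2, where w0 and w1 are the pixel counts at or below and above t, and mu0 and mu1 are their mean intensities. A minimal NumPy sketch of the idea is below; otsu_threshold is a hypothetical helper, and OpenCV's own implementation may break ties between equally good thresholds differently, so the two are not guaranteed to return the identical value.

```python
import numpy as np

def otsu_threshold(gray):
    """Hypothetical helper: threshold t (0-255) maximizing between-class variance.
    Pixels > t are foreground, as with cv2.THRESH_BINARY."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.int64)
    levels = np.arange(256, dtype=np.int64)
    n = hist.sum()
    w0 = np.cumsum(hist)              # pixel count of class 0 (values <= t)
    w1 = n - w0                       # pixel count of class 1 (values > t)
    cum = np.cumsum(hist * levels)    # sum of intensities in class 0
    total = cum[-1]
    # w0*w1*(mu0 - mu1)^2 equals (n*cum - total*w0)^2 / (w0*w1) with integer counts
    num = (n * cum - total * w0).astype(np.float64) ** 2
    denom = (w0 * w1).astype(np.float64)
    sigma_b = np.zeros(256)
    valid = denom > 0                 # skip splits that leave one class empty
    sigma_b[valid] = num[valid] / denom[valid]
    return int(np.argmax(sigma_b))

# usage: a bimodal image, half dark (50) and half bright (200)
img = np.array([[50] * 8 + [200] * 8] * 4, dtype=np.uint8)
t = otsu_threshold(img)
binary = np.where(img > t, 255, 0).astype(np.uint8)
print(t)  # 50
```

Every t from 50 to 199 separates this image equally well; np.argmax returns the first of those maxima.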



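From the above image it is clear that the digit shapes are retained and the noise is largely eliminated. Though we could proceed to the segmentation step right away, it is better to also dilate the image (the cv2.dilate call above, which uses a default 3x3 kernel when None is passed). Dilation grows the foreground pixels, reconnecting small breaks in the pen strokes so that each digit is more likely to form a single contour, which makes the segmentation step easier.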



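cnts = cv2.findContours(thresh.copy(),cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
# grab_contours handles the different return values of findContours across OpenCV versions
cnts = imutils.grab_contours(cnts)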
avgCntArea = np.mean([cv2.contourArea(k) for k in cnts])

digits = []
boxes = []

Now we continue to the segmentation step. We first find the external contours in the above image (passing a copy of the image, since in OpenCV versions before 3.2 cv2.findContours modifies its input) and compute the average contour area. We also initialize the digits and boxes lists, which will hold the segmented digit images and the bounding box coordinates of each digit.

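for c in cnts:
    # skip small contours (leftover noise) below a tenth of the average area
    if cv2.contourArea(c)<avgCntArea/10:
        continue
    mask = np.zeros(gray.shape,dtype="uint8")

    (x,y,w,h) = cv2.boundingRect(c)
    hull = cv2.convexHull(c)
    # draw the filled convex hull on the empty mask, then keep only thresh pixels inside it
    cv2.drawContours(mask,[hull],-1,255,-1)
    mask = cv2.bitwise_and(thresh,thresh,mask=mask)

    # crop with an 8-pixel margin (clamped at the image border) and resize to 28x28
    digit = mask[max(y-8,0):y+h+8,max(x-8,0):x+w+8]
    digit = cv2.resize(digit,(28,28))

    boxes.append((x,y,w,h))
    digits.append(digit)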

We then loop over all the contours and check whether each one is a valid digit: any contour whose area is less than a tenth of the average contour area is treated as leftover noise and skipped. For each remaining contour we find its bounding box and its convex hull, and draw the filled hull on mask, an empty (black) image. Applying cv2.bitwise_and between thresh and mask keeps only the pixels inside the hull, so each digit is segmented cleanly no matter how close the digits are placed. Finally, we crop the digit with an 8-pixel margin, resize it to 28x28, and append the digit and its bounding box coordinates to the digits and boxes lists respectively.

The mask generated for each contour and the bitwise_and between thresh and mask can be visualized as below.



Now that each of the digits is extracted/segmented, we are good to go for the recognition step. Well, wait: not now, but in the next post. The process implemented here applies not only to the image shown here but to any similar image, i.e., one captured under controlled lighting conditions.

In the next post, we will cover how to recognize the segmented digits using deep learning techniques: we will first implement a vanilla Multi-Layer Perceptron and then a Convolutional Neural Network to train our model for the recognition step. That will be a long post, which is why this post is split into 2 parts.

Thank you, have a nice day…