Recognize Handwritten digits – 2

This post is going to be a long one, and you may ask why I don't split it up. There is a reason for that: it is better to work through all the details in one sitting rather than taking a break, since you may otherwise lose track over time. So, getting to the point: in the last post we were able to segment the individual digits from the image. In this post we will develop a model to recognize those individual digits.

Before actually building our model we need data. I used the popular MNIST dataset, which includes 60,000 training images and 10,000 testing images of the digits 0-9. You can download the dataset from this link. The files are in .csv format, and we need to make them available for our training and testing operations. Before continuing any further, let's review our project structure below.

DigitRecognizer
├── dataset/
│   ├── mnist_test.csv
│   ├── mnist_train.csv
├── output/
├── recognize.py
├── train_keras_cnn.py
├── train_keras.py
└── utils/
    ├── dataset.py
    ├── __init__.py

  • dataset/ – Contains the two files mnist_train.csv and mnist_test.csv, downloaded from the link mentioned above.
  • output/ – Used to save the trained models. It's always better to have all the trained models in one place!
  • recognize.py – The driver script which takes an input image and recognizes the digits.
  • train_keras.py – Trains a simple Multi-Layer Perceptron using keras.
  • train_keras_cnn.py – Trains a Convolutional Neural Network model, which we will discuss later in this post, using keras.
  • utils/dataset.py – Contains the utilities needed to load and process the dataset.

Now that we have an idea of what we will be doing, let's open up the utils package and start coding in the dataset.py file.

import numpy as np
import matplotlib.pyplot as plt

def load_dataset(path,delimiter=","):
    arr = np.loadtxt(path,delimiter=delimiter,dtype="uint8")
    labels = arr[:,0]
    data = arr[:,1:]
    return data,labels

We start off by creating a function called load_dataset, which loads our .csv dataset from disk and returns it as numpy arrays. np.loadtxt does the actual loading. We slice off the first column as labels and keep the remaining columns as data, since each row of the dataset starts with the label followed by the 784 pixel values for that digit. We can reshape those 784 pixel values into a 28x28 image, which we will see in a while. Finally the function returns the data and labels.
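
As a quick illustration (a minimal sketch using the function above), one row of the csv can be turned back into an image:

data, labels = load_dataset("dataset/mnist_train.csv")
digit = data[0].reshape(28, 28)   # 784 pixel values -> 28x28 image
print(labels[0], digit.shape)     # prints the first label and (28, 28)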

def plot_dataset(data,labels,predictions=None,annotate=True):
    assert len(data)==len(labels) == 16

    for i in xrange(len(labels)):
        plt.subplot(4,4,i+1)
        plt.axis("off")
        plt.imshow(data[i],cmap="gray")
        title = "True: {}".format(labels[i])
        if predictions is not None:
            title += " Predicted: {}".format(predictions[i])
        if annotate:
            plt.title(title)
    plt.show()

plot_dataset is used to plot the digits with their true labels and, optionally, their predicted labels. It draws a 4x4 grid and therefore expects exactly 16 images. The code itself is self-explanatory.
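
For example, to plot the first 16 training digits (a small usage sketch, assuming the dataset is already in dataset/):

data, labels = load_dataset("dataset/mnist_train.csv")
images = data[:16].reshape(-1, 28, 28)   # rows -> 28x28 images
plot_dataset(images, labels[:16])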

def encode(y):
    Y = np.zeros((y.shape[0],len(np.unique(y))))
    for i in xrange(y.shape[0]):
        Y[i,y[i]] = 1
    return Y

encode converts the numerical labels to categorical (one-hot) form, which is what the network's softmax output expects. Our dataset stores only the true class label, but we need it one-hot encoded: for example, 3 is encoded as 0001000000 and 7 as 0000000100 (remember we start counting from 0).
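
As a side note, the same encoding can be written without the explicit loop (a sketch, assuming all ten classes 0-9 appear in y):

def encode_vectorized(y):
    # row i of the 10x10 identity matrix is the one-hot vector for class i
    return np.eye(10, dtype="float32")[y]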

Now that we have the required utilities to handle the dataset, let's visualize it and see what the training and testing images look like…

[Figure: visualizing the training data]

Multi Layer Perceptron

Now let's get our hands dirty building a simple Multi-Layer Perceptron in keras and training it on the MNIST dataset. Open up train_keras.py and start coding.

from __future__ import print_function
from keras.models import Sequential
from keras.layers import Dense
from utils.dataset import load_dataset,encode

Let's start off by importing the necessary packages. We are using keras to build this simple network; I've also included a tensorflow version of the same MLP in my github repository, linked at the end of this post. To keep things simple, let's go with keras.

trainData,trainLabels = load_dataset("dataset/mnist_train.csv")
trainLabels = encode(trainLabels)

testData,testLabels = load_dataset("dataset/mnist_test.csv")
testLabels = encode(testLabels)

We then load our training and testing datasets using the utils.dataset.load_dataset utility function, and encode the corresponding labels into the one-hot form the network expects.

trainData = trainData.astype("float32")
testData = testData.astype("float32")

trainData /= 255
testData /= 255

It is always better to convert our dataset to float32. Since the data is in uint8 format with values in the range (0,255), we normalize it by dividing by 255 so that it lies in the range (0,1). Always normalize the data before training a model.

model = Sequential()
model.add(Dense(input_dim=784,output_dim=256,activation='relu',init="normal"))
model.add(Dense(output_dim=256,activation='relu',init="normal"))
model.add(Dense(output_dim=10,activation="softmax"))

Let's start building our model as a Sequential model. It consists of four layers: the input layer has 784 nodes, the two hidden layers have 256 nodes each, and the output layer has 10 nodes since we need to classify the digits 0 to 9. I've used relu as the activation function for the hidden layers and softmax at the output layer; with more than 2 classes, softmax is the natural choice for the final layer.
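
One caveat: the snippets in this post use the Keras 1.x API (output_dim, init, nb_epoch). If you are on Keras 2.x, the equivalent model would look roughly like this (a sketch, not the code the results below were produced with):

model = Sequential()
model.add(Dense(256, input_dim=784, activation='relu', kernel_initializer='normal'))
model.add(Dense(256, activation='relu', kernel_initializer='normal'))
model.add(Dense(10, activation='softmax'))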

model.compile("adam","categorical_crossentropy",metrics=["accuracy"])
model.fit(trainData,trainLabels,batch_size=100,nb_epoch=25,verbose=2,validation_data=(testData,testLabels))

Now that we have our model ready, we need to compile it. I've used the Adam optimizer and categorical cross entropy as the cost function. Then we fit the model on trainData and trainLabels with a batch size of 100 for 25 epochs. This step may take a while: with 60,000 training images and a batch size of 100, each epoch performs 600 weight updates, so 25 epochs amount to 600 × 25 = 15,000 updates.

print(model.summary())
score = model.evaluate(testData,testLabels)
print('Test cost:', score[0])
print('Test accuracy:', score[1])

model.save("output/mnist.model")

We then print our model summary to the console, compute the cost and accuracy of the model on the test set, and finally save the trained model to disk. The training output in the console may look like this.

[Figure: console output of training the MLP]

We achieve an accuracy of up to 96%, which seems good.

So we have our model ready. Let's plug this model into the driver script and recognize the digits. For that we need to tweak our recognize.py file a little to actually make it work.

import numpy as np
import cv2
import imutils
import argparse
from keras.models import load_model

#parse arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i","--image",required=True,help="Path to image to recognize")
ap.add_argument("-m","--model",required=True,help="Path to saved classifier")
args = vars(ap.parse_args())

We just added two new lines to recognize.py: the keras.models.load_model import, which we will use to load our trained classifier from disk, and an argument parser entry that takes the path to the saved model.

Let's add these lines at the bottom of recognize.py:

digits = np.array(digits)
model = load_model(args["model"])
labels = model.predict_classes(digits.reshape(-1,784))

We convert the digits list to a numpy array so that the computation runs on all the digits at once rather than looping over each of them. Then we load our saved model using the load_model method, and finally we recognize the digits using model.predict_classes. Remember, we always need to reshape our data to match the training data shape.
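
One caveat worth checking (an assumption on my part about how digits was built in the previous post): the model was trained on float32 values scaled to (0,1), so if your digits array still holds raw uint8 pixels, apply the same preprocessing before predicting:

digits = np.array(digits, dtype="float32") / 255   # match the training preprocessing
labels = model.predict_classes(digits.reshape(-1, 784))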

cv2.imshow("Original",image)
cv2.imshow("Thresh",thresh)

for (x,y,w,h),label in sorted(zip(boxes,labels)):
    cv2.rectangle(image,(x,y),(x+w,y+h),(0,0,255),1)
    cv2.putText(image,str(label),(x+2,y-5),cv2.FONT_HERSHEY_SIMPLEX,1.0,(0,255,0),2)
    cv2.imshow("Recognized",image)
    cv2.waitKey(0)

cv2.destroyAllWindows()

Finally, we loop over the bounding boxes sorted from left to right, paired with their corresponding labels, and draw each box on the image with its predicted label above it.
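
Note that sorted(zip(boxes, labels)) orders the pairs by the box tuples, so boxes are compared on their x coordinate first. If you prefer to make that criterion explicit, you can pass a key (an equivalent sketch):

# sort the (box, label) pairs by the x coordinate of each box
pairs = sorted(zip(boxes, labels), key=lambda pair: pair[0][0])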

[Figure: recognized digits drawn on the input image]

As you can see, the 4th digit was misclassified as 9; that is to be expected since our model is only about 96% accurate. We can increase the accuracy by using a Convolutional Neural Network, which we will discuss later in this post. For another image, where the digits are written much closer together, the output may look something like this.

[Figure: recognition output for an image with closely spaced digits]

And again the 6th digit was misclassified. We can get this recognized correctly by using a Convolutional Neural Network, which increases our model accuracy to around 99%.

Convolutional Neural Network

If you have no idea what exactly a Convolutional Neural Network (CNN) does, please refer to this link and come back.

So let’s start building our CNN model using keras. Open up the file named train_keras_cnn.py and start coding.

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from utils.dataset import load_dataset,encode

Let's start off by importing all the necessary packages required to build our model. As before, we use a Sequential model.

batch_size = 128
nb_classes = 10
nb_epoch = 12

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

We define some parameters: the number of epochs, the batch size, and the number of classes; then the number of rows and columns of the image array, and the number of convolutional filters to use, with a kernel size of (3×3). We also set the max-pooling window to (2×2), which halves each spatial dimension of the feature maps.

trainData,trainLabels = load_dataset("dataset/mnist_train.csv")
trainLabels = encode(trainLabels)

testData,testLabels = load_dataset("dataset/mnist_test.csv")
testLabels = encode(testLabels)

trainData = trainData.astype("float32")
testData = testData.astype("float32")

X_train,Y_train,X_test,Y_test = trainData,trainLabels,testData,testLabels

X_train /= 255
X_test /= 255

We load our dataset from disk, encode the labels, and convert the data to float32. We rename our train and test sets for convenience, and finally normalize the data so that the pixel values lie between (0,1).

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

One thing to note is that keras is built on top of tensorflow and theano; that is, keras uses either tensorflow or theano as its backend. This flexibility and compatibility is a great thing, but it comes with some quirks, and Convolution2D is one of them. Tensorflow and theano expect the input data in different formats: tensorflow expects (n_samples, n_rows, n_cols, n_channels) whereas theano expects (n_samples, n_channels, n_rows, n_cols). The snippet above reshapes the data to match whichever backend is active.
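
As a quick sanity check (assuming the tensorflow ordering is active), the shapes after the reshape should be:

print(X_train.shape)   # (60000, 28, 28, 1)
print(X_test.shape)    # (10000, 28, 28, 1)
print(input_shape)     # (28, 28, 1)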

model = Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

Now we build our model with two convolution layers; we then flatten the output of the max-pooling layer and attach it to a fully connected layer of 128 nodes, which in turn is fully connected to 10 nodes (the number of classes). We use Dropout as a regularizer for the network. Remember that each convolution layer outputs 32 channels.
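
To make the shapes concrete, here is how a 28x28 input flows through the model (a back-of-the-envelope trace, given the 'valid' convolutions configured above):

# 28x28x1   -> Conv 3x3 (valid) -> 26x26x32
# 26x26x32  -> Conv 3x3 (valid) -> 24x24x32
# 24x24x32  -> MaxPool 2x2      -> 12x12x32
# 12x12x32  -> Flatten          -> 4608
# 4608      -> Dense(128) -> Dense(10) -> class probabilities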

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

print("[INFO] saving model to disk...")
model.save("output/mnist_cnn.model")

We have our model ready, so we compile it and fit it on the training data. Finally we save the trained classifier, which is noticeably more accurate than our previous model. Let's have a look at the console output below.

[Figure: console output of training the CNN]

We can see that we achieved 99% accuracy on the testing dataset, which definitely cuts down on misclassified digits. Now let's plug the trained model into the recognize.py script. We only need to change one line: the digits data has to be reshaped to work with the convolutions, based on the backend your keras is running on, as sketched below. Then let's check our model by running recognize.py again.
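
A minimal sketch of that change (assuming the same digits array as before, with the same (0,1) scaling caveat, and the backend check from train_keras_cnn.py):

from keras import backend as K

digits = np.array(digits)
if K.image_dim_ordering() == 'th':
    digits = digits.reshape(-1, 1, 28, 28)   # theano: channels first
else:
    digits = digits.reshape(-1, 28, 28, 1)   # tensorflow: channels last
labels = model.predict_classes(digits)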

[Figure: CNN recognition output]

We can see that our model performed well. Keeping that in mind, let's also test another image and verify its accuracy. We can see the result below.

[Figure: CNN recognition output for the second test image]

And finally we have a model that recognizes handwritten digits almost perfectly. You can pull the code from my github and play around. Note that I've also included train_tf.py and train_tf_cnn.py, which are tensorflow implementations of the same models we have built.

In the next post we extend the same idea and develop a project that solves Sudoku from a live video stream!

Thank you, Have a nice day…
