This post is going to be a **long** one, and you may ask why I don’t split it up. There is a reason: it is better to cover all the details in one post rather than taking a break, as you may *get lost* over time. So, getting to the point: in the last post we segmented the individual digits from the image. In this post we will develop a model to recognize those individual digits.

Before actually building our model we need data. I used the most popular **MNIST** dataset, which includes *60,000* training images and *10,000* testing images of the digits `0-9`. You can download the dataset from this *link*. The files are in `.csv` format, and we need to make them available for our training and testing operations. Before continuing any further, let’s review our project structure below.

```
DigitRecognizer
├── dataset/
│   ├── mnist_test.csv
│   ├── mnist_train.csv
├── output/
├── recognize.py
├── train_keras_cnn.py
├── train_keras.py
└── utils/
    ├── dataset.py
    ├── __init__.py
```

- `dataset/` – contains two files, namely mnist_train.csv and mnist_test.csv, downloaded from the link mentioned above.
- `output/` – this directory is used to save the trained models. It’s always better to have all the trained models in one place!!
- `recognize.py` – the driver script which takes an input image and recognizes the digits.
- `train_keras.py` – trains a simple *Multi-Layer-Perceptron* using keras.
- `train_keras_cnn.py` – trains a **Convolutional Neural Network** model, which we will discuss later in this post, using keras.
- `utils/dataset.py` – contains all the utilities necessary to load and process the dataset.

Now that we have an idea of what we will be doing, let’s first open up the `utils` package and start coding in the `dataset.py` file.

```python
import numpy as np
import matplotlib.pyplot as plt

def load_dataset(path, delimiter=","):
    arr = np.loadtxt(path, delimiter=delimiter, dtype="uint8")
    labels = arr[:, 0]
    data = arr[:, 1:]
    return data, labels
```

We start off by creating a function called `load_dataset`, which loads our dataset (in `.csv` format) from disk and returns it as *numpy* arrays. `np.loadtxt` does the actual loading. We *sliced* out the first column and named it *labels*, and the remaining columns *data*, since each row of our dataset starts with the label followed by the *784* pixel values of that digit. We can reshape those *784* pixel values into a `28x28` array, which represents the image, as we will see in a while. Finally, the function returns the data and labels.

```python
def plot_dataset(data, labels, predictions=None, annotate=True):
    assert len(data) == len(labels) == 16
    for i in range(len(labels)):
        plt.subplot(4, 4, i + 1)
        plt.axis("off")
        plt.imshow(data[i], cmap="gray")
        title = "True: {}".format(labels[i])
        if predictions is not None:
            title += " Predicted: {}".format(predictions[i])
        if annotate:
            plt.title(title)
    plt.show()
```

`plot_dataset` is used to plot the digits with their true labels and, optionally, their predicted labels. It plots a `4x4` grid and therefore needs exactly `16` images. The code itself is self-explanatory.

```python
def encode(y):
    Y = np.zeros((y.shape[0], len(np.unique(y))))
    for i in range(y.shape[0]):
        Y[i, y[i]] = 1
    return Y
```

`encode` converts the numerical labels into **categorical form**, which is well suited for neural networks. Our dataset stores only the true class label, but we need it as a **one-hot encoding**: a vector with a `1` at the position of the class. For example, `3` is encoded as `0001000000` and `7` as `0000000100` (remember we start from 0). Note that this implementation infers the number of classes from `np.unique(y)`, so it assumes every class from 0-9 appears at least once in the labels.
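To make the encoding concrete, here is a quick self-contained sketch of what `encode` produces (using the same logic as `utils/dataset.py`, on one dummy sample per digit):

```python
import numpy as np

def encode(y):
    # one row per sample, one column per class found in the labels
    Y = np.zeros((y.shape[0], len(np.unique(y))))
    for i in range(y.shape[0]):
        Y[i, y[i]] = 1
    return Y

labels = np.arange(10)   # one sample of each digit 0-9
Y = encode(labels)
print(Y[3])              # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(Y[7])              # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
```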

Now that we have the required utilities to handle the dataset, let’s visualize it and see what the training and testing images look like…

### Multi Layer Perceptron

Now let’s get our hands dirty building a simple **Multi-Layer-Perceptron** in **keras** and training it on the *mnist* dataset. Open up `train_keras.py` and start coding.

```python
from __future__ import print_function
from keras.models import Sequential
from keras.layers import Dense
from utils.dataset import load_dataset, encode
```

Let’s start off by importing the necessary packages. We are using keras to build this simple network; I’ve also included a tensorflow version of the same MLP in my github repository, linked below. To keep things simple, let’s go with keras.

```python
trainData, trainLabels = load_dataset("dataset/mnist_train.csv")
trainLabels = encode(trainLabels)
testData, testLabels = load_dataset("dataset/mnist_test.csv")
testLabels = encode(testLabels)
```

We then load our training and testing datasets using the `utils.dataset.load_dataset` utility function, and `encode` the corresponding labels so that they are suitable for the neural network.

```python
trainData = trainData.astype("float32")
testData = testData.astype("float32")
trainData /= 255
testData /= 255
```

It is always better to convert our dataset to `float32`. Since our dataset is in `uint8` format, ranging over `(0,255)`, we normalize it by dividing by 255 so that it ranges over `(0,1)`. Always normalize the data before training a model.
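As a tiny illustration of this scaling (the pixel values here are hypothetical, not taken from the actual dataset):

```python
import numpy as np

pixels = np.array([0, 128, 255], dtype="uint8")  # raw uint8 pixel values
scaled = pixels.astype("float32") / 255           # now in the (0,1) range
print(scaled.min(), scaled.max())                 # 0.0 1.0
```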

```python
model = Sequential()
model.add(Dense(input_dim=784, output_dim=256, activation="relu", init="normal"))
model.add(Dense(output_dim=256, activation="relu", init="normal"))
model.add(Dense(output_dim=10, activation="softmax"))
```

Let’s start building our model as a *sequential* model. Our model consists of *4 layers*: the input layer has `784` nodes, the two hidden layers have `256` nodes each, and the output layer has `10` nodes since we need to classify the digits **0 to 9**. I’ve used `relu` as the activation function, and `softmax` at the output layer; since we have more than *2* classes, softmax is the better choice for the final layer.
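As a quick sanity check on this architecture (plain arithmetic, mirroring what `model.summary()` would report later), each `Dense` layer holds `inputs × outputs` weights plus one bias per output node:

```python
# 784 -> 256 -> 256 -> 10 fully connected network
layer1 = 784 * 256 + 256   # first hidden layer: 200,960 parameters
layer2 = 256 * 256 + 256   # second hidden layer: 65,792 parameters
layer3 = 256 * 10 + 10     # output layer: 2,570 parameters
total = layer1 + layer2 + layer3
print(total)               # 269322 trainable parameters
```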

```python
model.compile("adam", "categorical_crossentropy", metrics=["accuracy"])
model.fit(trainData, trainLabels, batch_size=100, nb_epoch=25, verbose=2,
          validation_data=(testData, testLabels))
```

Now that we have our model ready, we need to compile it. I’ve used the Adam optimizer and categorical cross entropy as the cost function. Then we fit our model on *trainData* and *trainLabels* with a `batch_size` of 100 for *25* epochs. This step may take a long time: since our training set consists of 60,000 images, each epoch runs 600 batches of size 100, i.e. 600*25 = 15,000 iterations in total.

```python
print(model.summary())
score = model.evaluate(testData, testLabels)
print('Test cost:', score[0])
print('Test accuracy:', score[1])
model.save("output/mnist.model")
```

We then print our model summary to the console, find the cost and accuracy of the model on the test set, and finally save the trained model to disk. The training output in the console may look like this.

We can achieve an accuracy of up to `96%`, which seems to be good.

So we have our model ready. Let’s plug this model into the driver script and recognize the digits. For that, we need to tweak our `recognize.py` file a little to actually make it work.

```python
import numpy as np
import cv2
import imutils
import argparse
from keras.models import load_model

# parse arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to image to recognize")
ap.add_argument("-m", "--model", required=True, help="Path to saved classifier")
args = vars(ap.parse_args())
```

We just added the two highlighted lines to `recognize.py`: we will use `keras.models.load_model` to load our trained classifier from disk, and we added an argument to the parser to take in the path to the saved model.

Let’s add these lines to the bottom of `recognize.py`:

```python
digits = np.array(digits)
model = load_model(args["model"])
labels = model.predict_classes(digits.reshape(-1, 784))
```

We need to convert the digits list to a *numpy* array so that the computation runs on all the digits at once rather than looping over each of them. Then we load our saved model using the `load_model` method and finally attempt to recognize the digits using the `model.predict_classes` method. Remember, we always need to *reshape* our data to match the training data shape.
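For instance, if the segmentation step produced three `28x28` digit crops, the reshape flattens each one into a 784-vector to match the MLP’s input (the zero arrays here are just placeholders for real crops):

```python
import numpy as np

# hypothetical: three segmented digit crops, each resized to 28x28
digits = [np.zeros((28, 28), dtype="float32") for _ in range(3)]
batch = np.array(digits).reshape(-1, 784)
print(batch.shape)   # (3, 784) -- one flattened row per digit
```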

```python
cv2.imshow("Original", image)
cv2.imshow("Thresh", thresh)
for (x, y, w, h), label in sorted(zip(boxes, labels)):
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 1)
    cv2.putText(image, str(label), (x + 2, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
                1.0, (0, 255, 0), 2)
cv2.imshow("Recognized", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

Finally we are good to go: we loop over the bounding boxes and their corresponding labels, sorted from left to right, and draw each bounding box on the image with its label.

As you can see, the 4th digit was misclassified as 9, and yes, that is possible since we only reached *96%* accuracy. We can increase the accuracy by using *Convolutional neural networks*, which we will discuss later in this post. The output may look something like this for another image where the digits are written much **closer** together.

And again the 6th digit was misclassified. We can get these recognized correctly by using Convolutional Neural Networks, which increase our model’s accuracy up to **99%**.

### Convolutional Neural Network

If you have no idea what exactly a Convolutional Neural Network (CNN) does, please refer to this *link* and come back.

So let’s start building our *CNN* model using keras. Open up the file named `train_keras_cnn.py` and start coding.

```python
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from utils.dataset import load_dataset, encode
```

Let’s start off by importing all the packages required to build our model. We again use the *Sequential* model.

```python
batch_size = 128
nb_classes = 10
nb_epoch = 12

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)
```

We define some parameters: the number of epochs, the `batch_size`, the number of classes, the number of rows and columns of the image array, and the number of filters for the convolutional layers with a `kernel_size` of **(3×3)**. We then set the **maxpooling** size to **(2×2)**, which halves each spatial dimension of the feature map.
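To see how these choices play out, we can trace the spatial size of the feature maps through the network: each 3×3 `'valid'` (no-padding) convolution shrinks the map by 2 in each dimension, and the 2×2 max-pool halves it.

```python
size = 28            # input image is 28x28
size = size - 3 + 1  # after first 3x3 valid conv: 26x26
size = size - 3 + 1  # after second 3x3 valid conv: 24x24
size = size // 2     # after 2x2 max-pooling: 12x12
flattened = size * size * 32  # 32 filters -> 4608 values going into Flatten()
print(size, flattened)        # 12 4608
```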

```python
trainData, trainLabels = load_dataset("dataset/mnist_train.csv")
trainLabels = encode(trainLabels)
testData, testLabels = load_dataset("dataset/mnist_test.csv")
testLabels = encode(testLabels)
trainData = trainData.astype("float32")
testData = testData.astype("float32")
X_train, Y_train, X_test, Y_test = trainData, trainLabels, testData, testLabels
X_train /= 255
X_test /= 255
```

We then load our dataset from disk, encode the labels, and convert the data to `float32`. We *rename* our train and test sets, and finally normalize the train and test data so the values lie in `(0,1)`.

```python
if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
```

One thing to note is that keras is built upon *tensorflow* and *theano*; that is, keras uses either tensorflow or theano as its *backend*. This is a great way to provide flexibility and compatibility with both packages, but unfortunately it comes with some quirks, and using `Convolution2D` is one of them: tensorflow and theano expect the input data in different formats. Tensorflow expects the form `(n_samples,n_rows,n_cols,n_channels)`, whereas theano expects `(n_samples,n_channels,n_rows,n_cols)`. So to stay compatible with both, we use the above snippet.
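Here is a small sketch of the two layouts with dummy data (assuming 28×28 grayscale images, so one channel):

```python
import numpy as np

X = np.zeros((5, 784), dtype="float32")      # five flattened 28x28 images

# tensorflow ("tf") ordering: channels last
tf_batch = X.reshape(X.shape[0], 28, 28, 1)  # (5, 28, 28, 1)
# theano ("th") ordering: channels first
th_batch = X.reshape(X.shape[0], 1, 28, 28)  # (5, 1, 28, 28)
print(tf_batch.shape, th_batch.shape)
```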

```python
model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
```

Now we build our model with two convolution layers, flatten the output of the second convolution layer, and attach it to a fully connected layer of **128** nodes, which is in turn fully connected to **10** nodes `(no.of.classes)`. We use **Dropout** as a regularizer for the network. Remember that we have used **32** channels for each convolution layer’s output.

```python
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
print("[INFO] saving model to disk...")
model.save("output/mnist_cnn.model")
```

We have our model ready, so we *compile* it and *fit* it on the training data, and finally save the trained classifier, which is noticeably more **accurate** than our previous model. Let’s have a look at the console output below.

We can see that we achieved **99%** accuracy on the testing dataset, which definitely reduces the number of misclassified digits. Now let’s plug our trained model into the `recognize.py` script. We only need to change one line: we have to **reshape** our digits data to work with the convolutions, so reshape the data based on the *backend* your *keras* is running on. Let’s check our model by running `recognize.py` again.

We can see that our model performed well. Keeping that in mind, let’s also test another image and verify its *accuracy*. We can see the result below.

And finally we have a model that recognizes handwritten digits very well. You can pull the code from my *github* and play around. Note that I’ve also included `train_tf.py` and `train_tf_cnn.py`, which are *tensorflow* implementations of the same models we have built.

In the next post we extend the same idea and develop a project that solves a **Sudoku** puzzle from a live video stream!!

Thank you, Have a nice day…