Code a Neural Network

Though the underlying concepts required to build and train a neural network are difficult, implementing one in code is surprisingly easy. So in this post, let's

  • Code a neural network by hand
  • Use keras to build a neural network

First, let's see how we can build our own neural network with just raw Python code. For this, let's assume our task is to build a model that simply XORs its inputs. It might seem very easy but believe me, it is the first difficult step in training any neural network, because XOR itself is not a linear function of its inputs, i.e. the two classes are not linearly separable.

(Figure: xor-graph, the four XOR points plotted in the input plane; the two classes cannot be separated by a single straight line.)

So our model should be able to draw a non-linear decision boundary, which is not possible with a simple perceptron. This is where the Multi-Layer Perceptron comes in handy; it is especially well suited for exactly this kind of problem, as the small sketch below illustrates.
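To make that limitation concrete, here is a minimal, purely illustrative sketch of a single sigmoid unit (i.e. logistic regression) trained on XOR with the same squared-error update we will use later. It is not part of the model we build below; it just shows that no amount of training lets a linear model get all four cases right.

import numpy as np

X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T

w = np.random.uniform(size=(3,1))           # a single layer: one weight per input
for i in range(5000):
    a = 1.0/(1+np.exp(-X.dot(w)))           # sigmoid output, shape (4,1)
    grad = X.T.dot((a-y)*a*(1-a))           # gradient of the squared error
    w = w - 1.0*grad
print(a)    # the four outputs never all reach their 0/1 targets: XOR is not linearly separable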

Neural network by hand

import numpy as np 
import matplotlib.pyplot as plt

X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T

Before starting off, we import the necessary libraries. $X$ is our input with 4 samples and 3 features (the third column is a constant 1 and effectively acts as a bias input), and $y$ is the corresponding output. We need to build a model capable of predicting the output given the input.
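As a quick, purely illustrative sanity check, we can confirm the shapes NumPy has inferred for our arrays:

print(X.shape)    # (4, 3): 4 samples, 3 input features
print(y.shape)    # (4, 1): one target value per sample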

#input features --> 3, hidden_units --> 5, output_units -->1
(D,M,K) = (3,5,1)

#initialize weights
w1 = np.random.uniform(size=(D,M))
w2 = np.random.uniform(size=(M,K))

learning_rate = 1
cost = []

We then initialize our weights $w1$ and $w2$ randomly, set learning_rate=1, and create an empty cost list to collect the cost at each iteration, which will help us visualize the cost function later.

Let's create an activation function to carry out the operations, here the sigmoid; the deriv flag returns its derivative, which we will need during backpropagation.

def sigmoid(z,deriv=False):
    if deriv:
        return sigmoid(z)*(1-sigmoid(z))
    return 1.0/(1+np.exp(-z))

Note that we don't use softmax, as our task is just binary classification, where a sigmoid output is enough to characterize the output. We then iterate 5000 times to train our model, with each iteration following the steps shown below.
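As a quick, purely illustrative check of the activation, the sigmoid of 0 should be 0.5 and its derivative there should be 0.25:

print(sigmoid(0))              # 0.5
print(sigmoid(0, deriv=True))  # 0.25 = 0.5 * (1 - 0.5)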

Feedforward

$z1 = X \cdot w1$

$a1 = g(z1)$

$z2 = a1 \cdot w2$

$a2 = g(z2)$

$\text{cost} = \frac{1}{2N} \sum_{i=1}^{N}\left(a_2^{(i)}-y^{(i)}\right)^2$

Backpropagation

$\delta_3 = (a_2 - y)$

$\delta_2 = \delta_3 \odot g'(z_2)$

$\delta_1 = (\delta_2 \cdot w_2^T) \odot g'(z_1)$

$\Delta w_2 = a_1^T \cdot \delta_2$

$\Delta w_1 = X^T \cdot \delta_1$

$w_1 = w_1 - \alpha \, \Delta w_1$

$w_2 = w_2 - \alpha \, \Delta w_2$

Here $g'$ is the derivative of the sigmoid and $\odot$ denotes element-wise multiplication.

Though the equations are available in the previous post, I have repeated them here for completeness.

for i in range(5000):
    #feedforward
    z1 = X.dot(w1)
    a1 = sigmoid(z1)
    z2 = a1.dot(w2)
    a2 = sigmoid(z2)

    #append cost
    cost.append(0.5*np.mean((a2-y)**2))

    #backpropagate
    delta_3 = (a2-y)	#shape: (N,K)
    delta_2 = delta_3*sigmoid(z2,deriv=True)	#shape: (N,K)
    w2_delta = a1.T.dot(delta_2)	#shape: (M,K)
    delta_1 = delta_2.dot(w2.T)*sigmoid(z1,deriv=True)		#shape: (N,M)
    w1_delta = X.T.dot(delta_1)		#shape: (D,M)

    w1 = w1 - learning_rate*w1_delta
    w2 = w2 - learning_rate*w2_delta

The code follows the equations mentioned above, so there isn't a great deal to explain line by line.

plt.plot(cost)
plt.show()

print(a2)

Finally, let's plot the cost curve and also print the predicted outputs for our inputs to the console.

(Figure: mlp_cost, the cost plotted against training iterations.)

We can see that the cost is driven down to nearly $0$, which means our model is fitting the data very accurately. Let's peek at the console output.

$ python mlp.py

[[ 0.01839197]
 [ 0.98414062]
 [ 0.98608678]
 [ 0.01521415]]

We can see that the predictions are close to the targets 0, 1, 1, 0, so our model is performing well. Let's do the same using Keras in the next section.
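If we want hard 0/1 labels rather than probabilities, we can simply threshold the sigmoid outputs at 0.5 (a small illustrative addition, analogous to predict_classes in the Keras version below):

print((a2 > 0.5).astype(int))    # [[0], [1], [1], [0]]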

Code with keras

Keras is one of the best deep learning libraries out there; it makes building even the most complex neural networks easy and intuitive. Keras primarily relies on either Theano or TensorFlow as its computational backend. We will go into that in upcoming posts. For now, let's see how we can build the same neural network as above.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

Let's start off by importing the necessary classes. Keras has awesome modularity, which makes the imports straightforward. models.Sequential is the container, a kind of rack, that our network layers are fitted into. layers.Dense allows us to construct fully connected layers. And let's choose Stochastic Gradient Descent for our training, which is available through optimizers.SGD.

import numpy as np 

X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T

Then we import numpy and initialize our inputs exactly as before. Let's proceed to build the neural network using Keras.

model = Sequential()

model.add(Dense(input_dim=3,output_dim=5,activation="sigmoid"))
model.add(Dense(output_dim=1,activation="sigmoid"))

sgd = SGD(lr=1.0)

We create a Sequential model object and then add layers as shown above. input_dim needs to be defined only for the first layer, where it is the number of input features; from there on we only need to supply output_dim, the number of units in each successive layer, and Keras infers the rest of the shapes for us.

Also, the activation function is provided as a keyword argument, and other activations are available through keras.activations. SGD is initialized with a learning rate of 1.0, matching the hand-coded version.
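If you want to double-check the shapes and parameter counts Keras has inferred, an optional and purely illustrative step is to print the model summary:

model.summary()    # hidden layer: 3*5 + 5 = 20 parameters; output layer: 5*1 + 1 = 6 parameters

Note that, unlike the hand-coded version, Dense layers add a bias term by default, which is why the parameter counts include the extra +5 and +1.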

model.compile(optimizer=sgd,loss="mean_squared_error")
model.fit(X,y,nb_epoch=5000,batch_size=32)

print(model.predict_classes(X))

The next step is compiling the model: we choose SGD as the optimizer and loss="mean_squared_error" to mirror the squared-error cost of the hand-coded version (binary cross-entropy would work just as well for this binary classification task). We then fit the model for 5000 epochs and finally print the predicted classes to the console.

Epoch 4998/5000
4/4 [==============================] - 0s - loss: 5.6662e-04
Epoch 4999/5000
4/4 [==============================] - 0s - loss: 5.6647e-04
Epoch 5000/5000
4/4 [==============================] - 0s - loss: 5.6633e-04
4/4 [==============================] - 0s

[[0]
 [1]
 [1]
 [0]]

From the above we can see that the Keras model is super easy to build, and Keras really comes in handy for very complex networks with many deep layers, where building a neural network by hand would be very difficult.
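One practical note: the code above uses an older Keras API. In Keras 2 and later, a few argument names changed (output_dim became units, nb_epoch became epochs, and predict_classes was eventually removed), so here is a minimal sketch of the same model against that newer API, under those assumptions:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
import numpy as np

X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T

model = Sequential()
model.add(Dense(units=5, input_dim=3, activation="sigmoid"))   # hidden layer
model.add(Dense(units=1, activation="sigmoid"))                # output layer

model.compile(optimizer=SGD(lr=1.0), loss="mean_squared_error")
model.fit(X, y, epochs=5000, batch_size=4, verbose=0)

# threshold the sigmoid outputs ourselves instead of calling predict_classes
print((model.predict(X) > 0.5).astype(int))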

In the next post, let's discuss the various optimization techniques that help us minimize the cost effectively, get our hands dirty with Theano and TensorFlow, and then head on to recognizing handwritten digits.