Perceptron

The perceptron is the most basic computational model of a biological neuron in machine learning. Moreover, the concept of the perceptron can be leveraged to build more complex neural networks, which we will see later. Perceptrons are used mainly for supervised learning, though they can be adapted for unsupervised learning as well. The perceptron is inspired by the actual neuron, which can be seen below.

Biological Neuron

The way it works is that the neuron receives inputs through its dendrites, with each input scaled by the strength of the corresponding synapse. The necessary activation is then produced in the cell body (nucleus) of the neuron, and the final output is passed along the axon.

Similarly, we can build our perceptron from a set of inputs, each with an associated weight, and pass their weighted sum through an activation function to obtain the output. The catch is that we don't know in advance what the weights associated with the inputs should be, and this is where the real idea behind the perceptron lies.

Perceptron

Let's say we have our inputs as $X_1, X_2, \ldots, X_m$, each with $n$ features, i.e. the dimensions of our input matrix are $m \times n$, where $m$ is the number of samples and $n$ is the number of features. Therefore the weight vector is an array of $n$ elements. We then take the weighted sum of the inputs, $${Z = W^T \cdot X}$$ and pass $Z$ through an activation function, which in the case of a perceptron is the Unit Step function.

Unit Step function

We can change the threshold and the output range of the step function; all we need is a Unit Step function as our activation function.

So combining everything,

$${Z = W^T \cdot X}$$

$${\bar Y = g(Z)}$$

$${g}$$ is the activation function.

Therefore $${\bar Y}$$ gives the corresponding output.
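To make this concrete, here is a minimal NumPy sketch of the forward pass; the input matrix X, the weight vector W, and the 0.5 threshold below are just assumed example values:

import numpy as np

def step(z, threshold=0.5):
    # Unit Step activation: 1 if z >= threshold, else 0
    return np.where(z >= threshold, 1, 0)

X = np.array([[5.1, 3.5, 1.4, 0.2],
              [7.0, 3.2, 4.7, 1.4]])   # hypothetical (m x n) input matrix
W = np.zeros(X.shape[1])               # weight vector with n elements

Z = X.dot(W)        # Z = W^T . X for every sample
Y_hat = step(Z)     # predicted output for every sample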

How do we know the correct weights?

Yeah! That's a very interesting question. We don't actually know the right weights for a given problem in advance, and it is very difficult to make an educated guess. Instead, we learn suitable weights by training our perceptron, which we will see shortly. For now we initialize the weights to all zeros or to uniformly distributed random values. It's worth noting that, for linearly separable data, training drives the perceptron to weights that correctly classify the training set no matter how the weights were initialized.

What is training?

Every neural network needs a set of weights to predict/classify the output, but we start with weights that are all zeros or uniformly distributed at random. The outputs produced with these weights may not be correct, hence the need for training. During training we propagate the inputs and weights through the activation function, measure the error (or Cost) between the predicted and actual outputs, and use it to update the weights.

How is Cost calculated?

The cost is nothing but a measure of how far our predicted output $\bar Y$ is from the actual output $Y$. It can be defined using the Ordinary Least Squares method or sometimes using Cross Entropy. For now, we look at how the Ordinary Least Squares cost is implemented.

$${Cost=\frac{1}{2m}\sum_{i=1}^{m}(Y_i-\bar Y_i)^2}$$
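As a quick sketch, the same cost can be computed in NumPy roughly as follows (Y and Y_hat here are just hypothetical label and prediction arrays):

Y = np.array([0, 1])        # actual labels (hypothetical)
Y_hat = np.array([0, 0])    # predicted labels (hypothetical)

cost = 0.5 * np.mean((Y - Y_hat) ** 2)   # (1/2m) * sum of squared errors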

How do we update our weights?

I'm glad you asked. Basically, the weights are updated by computing the partial derivative of the Cost with respect to the weights, i.e.,

$${\Delta W = \frac{\partial}{\partial W}Cost}$$

Treating the step function's derivative as 1 (the standard simplification that gives the perceptron learning rule) and folding the constant $\frac{1}{m}$ into the learning rate, this works out to

$${\Delta W = (\bar Y - Y) \cdot X}$$

Finally,

$${W = W - \alpha \Delta W}$$

We iterate through this process until a stopping criterion is reached or for a specified number of iterations.

Here ${\alpha}$ is the learning rate, which controls how fast or slow our perceptron learns. It is a hyperparameter that needs to be set manually: if it is too large, our model may overshoot and fail to converge; if it is too small, the model may converge, but slowly.
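Putting the pieces together, a single training step might look like the following sketch, continuing with the hypothetical X, Y, W, and step function from the earlier snippets and an assumed learning rate of 0.1:

alpha = 0.1                      # learning rate (assumed value)

Z = X.dot(W)                     # forward pass
Y_hat = step(Z)

delta_W = (Y_hat - Y).dot(X)     # weight update term
W = W - alpha * delta_W          # move the weights against the error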

Can you guide me through an example?

Okay, let's code our Perceptron by hand and see how we can classify two classes of the famous Iris dataset, since we have only one output neuron.

from sklearn.datasets import load_iris
import numpy as np 
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

We start off by importing the necessary packages: load_iris from sklearn.datasets for the Iris dataset, numpy for numerical computations, train_test_split for splitting our data into training and testing sets, and matplotlib for plotting the errors.

iris = load_iris()

idxs = np.where(iris.target!=2)[0]

X = iris.data[idxs]
Y = iris.target[idxs]

We are only interested in two of the three targets, as we have only one output neuron for now, so we select the indexes where the target is either 0 or 1 but not 2.

We then take the subset of the data and labels at those indexes, i.e. the samples with labels 0 and 1.

X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.1,random_state=1)

plt.scatter(X_train[Y_train==1][:,1],X_train[Y_train==1][:,2])
plt.scatter(X_train[Y_train==0][:,1],X_train[Y_train==0][:,2],color='red')
plt.show()

We then split our dataset into training and testing sets and draw a scatter plot of two of the features, colored by target, to visualize the two classes.

Scatter plot of two different targets.

W = np.zeros((X_train.shape[1]))

learning_rate = 0.1
errors = []

We then initialize our weights to all zeros, with length equal to the number of features, and an errors list to record the error at every iteration.

for i in range(100):
    Z = X_train.dot(W)
    a = np.where(Z>=0.5,1,0)

    error = 0.5*np.mean((Y_train-a)**2)

    W = W - learning_rate*(a-Y_train).dot(X_train)
    errors.append(error)

We first compute the value of $Z$, which is then passed to the activation function. Here the activation is a step function thresholded at 0.5. We then calculate the error/cost as discussed before and update our weights.

plt.plot(errors)
plt.show()


Z = X_test.dot(W)
a = np.where(Z>=0.5,1,0)

#accuracy
print(np.mean(np.where(Y_test==a,1,0)))

We then plot the cost against the number of iterations and measure the accuracy of our model using the updated weights. The accuracy measured on our test set is 100%, which is expected since these two classes are linearly separable.

Cost vs. number of iterations.

From the above plot we can say that our perceptron converges in under 20 iterations. You can also observe that the cost fluctuates for a while before converging; this can be managed with momentum, which we will see later.

Let’s see the Class based implementation of our Perceptron algorithm.

import numpy as np

class Perceptron(object):
    def __init__(self,learning_rate=0.1,n_iter=100,step_threshold=0.5):
        self.learning_rate = learning_rate
        self.n_iter = n_iter
        self._costs = []
        self._step_threshold = step_threshold


    def fit(self,X,y):
        # initialize the weights uniformly at random, one per feature
        self._W = np.random.uniform(size=(X.shape[1],))

        for i in range(self.n_iter):

            # forward pass: weighted sum followed by the step activation
            z = X.dot(self._W)
            a = self._activation(z)
            self._costs.append(self._cost(y,a))

            # perceptron weight update
            w_update = (y-a).dot(X)
            self._W = self._W + self.learning_rate*w_update

    def _activation(self,z):
        # Unit Step function thresholded at step_threshold
        return np.where(z>=self._step_threshold,1,0)

    def _cost(self,y_true,y_pred):
        # Ordinary Least Squares cost
        return 0.5*np.mean((y_pred-y_true)**2)

    def predict(self,X):
        return self._activation(X.dot(self._W))

    def score(self,y_true,y_pred):
        # fraction of correctly classified samples
        return np.mean(y_true==y_pred)


This is our Perceptron class which should be familiar to you if you have followed along. Therefore we can simply rewrite our above script as below.

from perceptron import Perceptron 
from sklearn.datasets import load_iris
import numpy as np 
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

iris = load_iris()

idxs = np.where(iris.target!=2)[0]

X = iris.data[idxs]
Y = iris.target[idxs]

X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.3,random_state=1)

pcpt = Perceptron(n_iter=20)
pcpt.fit(X_train,Y_train)

y_pred = pcpt.predict(X_test)
print "Accuracy:",pcpt.score(y_pred,Y_test)
plt.plot(pcpt._costs)
plt.show()

Therefore, our class-based perceptron is much more Pythonic and can be reused anywhere in our scripts.

So far we have seen a simple perceptron algorithm. In upcoming posts we will see how this simple perceptron can be extended to the Multi-Layer Perceptron, which is where our journey into actual neural networks starts. We will then use that neural network to recognize handwritten digits in later posts.

 
