The **Perceptron** is the most basic implementation of a **biological neuron** in machine intelligence. Moreover, the concept of the perceptron can be leveraged to build more complex neural networks, which we will see later. Perceptrons are used mainly for **supervised learning** and can be modified to work with **unsupervised learning** as well. The implementation of the perceptron is inspired by the actual neuron, which can be seen below.

The way it works is that the neuron receives inputs through its **Dendrites**, along with the corresponding **Synapses** that weight the stimulus of those inputs. The necessary **activation** is then produced in the **nucleus** of the neuron, and the final output is passed through the **Axon**.

Similarly, we can build our perceptron from certain inputs with some associated weights, and passing them through an activation function gives the output. But the catch here is that **we don’t know what the weights** associated with the inputs should be, and this is where the actual concept of the perceptron lies.

Let’s say we have our inputs $X_0, X_1, X_2, \ldots, X_m$ with $n$ features each, i.e. the dimensions of our input matrix would be $m \times n$, where $m$ represents the **number of samples** and $n$ the **number of features**. Therefore the weight vector contains $n$ elements. We then find the sum of their products, $${Z = W^T \cdot X}$$ and $Z$ is then passed through an **activation** function, which is a **Unit Step function** in the case of a perceptron.

We can, however, change the **threshold** and the **range** of the step function; all we need is a Unit Step function as our activation function.
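As a minimal sketch, such a step function with an adjustable threshold can be written in NumPy (the name `unit_step` and the sample values are illustrative):

```python
import numpy as np

def unit_step(z, threshold=0.0):
    """Unit step activation: 1 where z >= threshold, else 0."""
    return np.where(z >= threshold, 1, 0)

z = np.array([-1.0, 0.2, 0.7, 1.5])
unit_step(z)        # default threshold 0.0 -> [0, 1, 1, 1]
unit_step(z, 0.5)   # raising the threshold -> [0, 0, 1, 1]
```

Shifting `threshold` moves the decision point without changing the 0/1 range of the output.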

So combining everything,

$${Z = W^T \cdot X}$$

$${\bar Y = g(Z)}$$

$${g}$$ is the activation function.

Therefore $${\bar Y}$$ gives the corresponding output.
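The forward pass above can be sketched in NumPy with made-up numbers (the data in `X` and `W` is purely illustrative):

```python
import numpy as np

# Hypothetical mini-batch: 4 samples, 3 features each.
X = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.2],
              [0.3, 0.9, 0.8],
              [0.1, 0.1, 0.0]])
W = np.array([0.4, -0.2, 0.6])    # one weight per feature

Z = X.dot(W)                      # Z = W^T . X for every sample at once
Y_bar = np.where(Z >= 0.5, 1, 0)  # g(Z): unit step thresholded at 0.5
# Z     -> [0.1, 0.52, 0.42, 0.02]
# Y_bar -> [0, 1, 0, 0]
```

Note that the dot product computes $Z$ for all $m$ samples in one shot, which is why no explicit loop over samples is needed.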

### How do we know the correct weights ?

That’s a very interesting question. We actually don’t know the exact weights suitable for any experiment beforehand, as it is very difficult to make an educated **guess**. Instead, we learn the correct weights by **training** our perceptron, which we will see shortly. For now we initialize the weights to all **zeros** or to **uniformly distributed** random values. It’s worth noting that, whatever the initial weights might be, after sufficient training we arrive at weights that classify the data correctly.

### What is training?

Every neural network needs a set of weights to predict/classify the output, but we start with weights that are all zeros or uniformly distributed, and the output produced with these may not be correct; hence the need for training. In training we propagate our input and weights through the activation function, and the output of the activation function is used to update the weights by measuring the error, or **Cost**.

### How is Cost calculated?

The cost is nothing but how **far** our predicted output $\bar Y$ is from the actual output $Y$. It can be defined using the **Ordinary Least Squares** method or sometimes using **Cross Entropy**. For now, we look at how the Ordinary Least Squares method is implemented.

$${Cost=\frac{1}{2m}\sum_{i=1}^{m}(Y_i-\bar Y_i)^2}$$
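As a quick sketch with made-up labels and predictions, the cost above can be computed directly in NumPy:

```python
import numpy as np

Y = np.array([1, 0, 1, 1])      # actual labels (illustrative)
Y_bar = np.array([1, 1, 0, 1])  # hypothetical predictions

# Ordinary least squares cost averaged over the m samples
m = len(Y)
cost = (1.0 / (2 * m)) * np.sum((Y - Y_bar) ** 2)
# two of four predictions are wrong -> cost = 2 / (2 * 4) = 0.25
```

With 0/1 labels each wrong prediction contributes exactly 1 to the sum, so the cost here is simply half the error rate.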

### How do we update our weights?

I’m glad you asked. Basically, the weights are updated by computing the **partial derivative** of the **Cost** with respect to the **weights**, i.e.,

$${\Delta W = \frac{\partial}{\partial W}Cost}$$

Therefore,

$${\Delta W = (\bar Y - Y) \cdot X}$$

Finally,

$${W = W - \alpha \Delta W}$$

We iterate through this process until a **stop criterion** is reached or for a specified number of **iterations**.

Here ${\alpha}$ is the **learning rate**, which controls how fast or slow our perceptron learns. It is a **hyperparameter** that needs to be set manually: if it is too large then our model may skip over good solutions and fail to converge; if it is too small then it may converge, but slowly.
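A single training step under this update rule can be sketched with made-up numbers (the tiny dataset here is purely illustrative):

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])  # 2 samples, 2 features
Y = np.array([1, 0])                    # actual labels
W = np.zeros(2)                         # weights initialized to zeros
alpha = 0.1                             # learning rate

Z = X.dot(W)
Y_bar = np.where(Z >= 0.5, 1, 0)        # step activation -> [0, 0]
delta_W = (Y_bar - Y).dot(X)            # gradient direction -> [0, -1]
W = W - alpha * delta_W                 # W becomes [0.0, 0.1]
```

Both predictions that were wrong pull the corresponding feature weights toward the correct label; repeating this step is exactly the loop the training code below performs.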

### Can you guide me through an example?

Okay, let’s code our Perceptron by hand and see how we can classify the famous **Iris** dataset, restricted to two classes since we have only one output neuron.

```python
from sklearn.datasets import load_iris
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
```

We first start off by importing the necessary packages. We load our iris dataset from `sklearn.datasets`, use `numpy` for numerical computations, `train_test_split` for splitting our data into training and testing sets, and `matplotlib` for plotting the errors.

```python
iris = load_iris()
idxs = np.where(iris.target != 2)[0]
X = iris.data[idxs]
Y = iris.target[idxs]
```

We are only interested in **two** of the targets, since we have only one output neuron for now, so we select the indices where the target is either **0** or **1** but not **2**, and then load the data and labels at those indices.

```python
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=1)
plt.scatter(X_train[Y_train == 1][:, 1], X_train[Y_train == 1][:, 2])
plt.scatter(X_train[Y_train == 0][:, 1], X_train[Y_train == 0][:, 2], color='red')
plt.show()
```

We then split our dataset into training and testing sets, and plot a scatter plot of two of the features, colored by class, to visualize them.

```python
W = np.zeros(X_train.shape[1])
learning_rate = 0.1
errors = []
```

We then initialize our weights to all zeros, with dimension equal to the **number of features**, and an `errors` list to save the errors across all the iterations.

```python
for i in range(100):
    Z = X_train.dot(W)
    a = np.where(Z >= 0.5, 1, 0)
    error = 0.5 * np.mean((Y_train - a) ** 2)
    W = W - learning_rate * (a - Y_train).dot(X_train)
    errors.append(error)
```

We first find the value of $Z$, which is then passed to the activation function. Here the activation is a step function thresholded at **0.5**. Then we calculate the error/cost as we discussed before and update our weights.

```python
plt.plot(errors)
plt.show()
Z = X_test.dot(W)
a = np.where(Z >= 0.5, 1, 0)
# accuracy
print(np.mean(np.where(Y_test == a, 1, 0)))
```

Then we plot our cost against the number of iterations and measure the accuracy of our model using the updated weights. The **accuracy** measured on our test set is **100%**, which is expected since these two Iris classes are linearly separable.

From the above plot we can see that our perceptron converges in under **20** iterations. You can also observe that the cost fluctuates for a while before converging; that can be managed with **momentum**, which we will see later.

Let’s see the **Class based implementation** of our **Perceptron** algorithm.

```python
import numpy as np

class Perceptron(object):
    def __init__(self, learning_rate=0.1, n_iter=100, step_threshold=0.5):
        self.learning_rate = learning_rate
        self.n_iter = n_iter
        self._costs = []
        self._step_threshold = step_threshold

    def fit(self, X, y):
        self._W = np.random.uniform(size=(X.shape[1],))
        for i in range(self.n_iter):
            z = X.dot(self._W)
            a = self._activation(z)
            self._costs.append(self._cost(y, a))
            w_update = (y - a).dot(X)
            self._W = self._W + self.learning_rate * w_update

    def _activation(self, z):
        return np.where(z >= self._step_threshold, 1, 0)

    def _cost(self, y_true, y_pred):
        return 0.5 * np.mean((y_pred - y_true) ** 2)

    def predict(self, X):
        return self._activation(X.dot(self._W))

    def score(self, y_true, y_pred):
        return np.mean(y_true == y_pred)
```

This is our Perceptron class which should be familiar to you if you have followed along. Therefore we can simply rewrite our above script as below.

```python
from perceptron import Perceptron
from sklearn.datasets import load_iris
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

iris = load_iris()
idxs = np.where(iris.target != 2)[0]
X = iris.data[idxs]
Y = iris.target[idxs]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
pcpt = Perceptron(n_iter=20)
pcpt.fit(X_train, Y_train)
y_pred = pcpt.predict(X_test)
print("Accuracy:", pcpt.score(Y_test, y_pred))
plt.plot(pcpt._costs)
plt.show()
```

Therefore, we can say our class-based perceptron is much more **pythonic** and can be reused anywhere in our scripts.

Till now we have seen the simple perceptron algorithm; in the upcoming posts we will see how this simple perceptron can be extended to the **Multi-Layer Perceptron**, which is where our journey into actual neural networks starts. We will then use that neural network to recognize **handwritten digits** in later posts.