Intro to Neural nets

I know there are plenty of machine learning algorithms out there for prediction and classification tasks, so why do we need neural networks at all?

Why Neural nets?

The basic idea behind neural networks is to mimic the human brain, or simply to develop a model that works somewhat like one. Classical machine learning techniques rely on high-level statistics and do not try to model how a human brain actually works on a problem. Many problems that are difficult for those models can be handled much more easily by deep learning, which is built on neural networks.

How do they actually work?

I know the term neural networks sounds strange, but the idea is easier to grasp than many of the typical machine learning methods we follow. You don’t believe me?

Okay, then consider a situation where you are driving to a movie with your friends. How would you control the speed of the car? It depends on various factors such as temperature, traffic density, and remaining time. This task can be modeled as shown below.


Let’s say temperature, traffic density, and time remaining act like neurons in the brain that decide what speed you should maintain.

Seems easy, right? Okay, not so fast, there is more. From the figure above we can say that the speed to maintain depends more on time remaining and traffic density, and much less on temperature. So how would we show that behavior? We assign a weight to each neuron, and thereby express how much each neuron contributes to the output neuron.

If $X_1, X_2, X_3$ are the inputs, then $W_1, W_2, W_3$ are the weights associated with each of them respectively. We also need to make sure that all the inputs are feature-scaled for better performance, since time and temperature are measured in different units. Two common ways to feature-scale the data are as follows.


Standardization:

$$ X' = \frac{X-\mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the feature.

Min-Max Scaling:

$$X' = \frac{X-X_{min}}{X_{max}-X_{min}}$$
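To make the two scaling formulas concrete, here is a minimal sketch in plain Python. The function names and the sample temperature values are made up for illustration, and the population standard deviation is assumed for $\sigma$.

```python
from statistics import mean, pstdev


def standardize(values):
    """Standardization: subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    # Note: this divides by zero if every value is identical (sigma == 0).
    return [(x - mu) / sigma for x in values]


def min_max_scale(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    # Note: this divides by zero if every value is identical (hi == lo).
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical temperature readings, only for demonstration.
temperatures = [20.0, 25.0, 30.0, 35.0]
print(standardize(temperatures))
print(min_max_scale(temperatures))
```

After standardization the values have mean 0 and standard deviation 1, while min-max scaling squeezes them into the range 0 to 1.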

How to predict/classify?

Now that we have our data ready, we simply multiply each input feature by its associated weight, sum the results, and pass the sum to an activation function ($g$).


$$Z = X_1W_1+X_2W_2+X_3W_3$$


$$Z = \sum_{i=1}^{n}X_iW_i$$

Applying the activation function then gives the output:

$$ O = g(Z) $$
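The weighted sum and activation step can be sketched in a few lines of Python. This is only an illustration: the input and weight values are invented, and the sigmoid is assumed here as one possible choice for $g$.

```python
import math


def sigmoid(z):
    """One possible activation g: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, activation=sigmoid):
    """Compute O = g(Z), where Z is the sum of X_i * W_i."""
    z = sum(x * w for x, w in zip(inputs, weights))
    return activation(z)


# Hypothetical scaled inputs: temperature, traffic density, time remaining.
x = [0.2, 0.7, 0.9]
# Hypothetical weights: temperature matters least, time remaining most.
w = [0.1, 0.8, 1.0]
print(neuron_output(x, w))
```

Passing a different function as `activation` swaps out $g$ without touching the weighted sum.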

What is an activation function?

Yeah! I know the term activation function may be new to you, so let me explain it in an intuitive way. The value of $Z$ can be anything between $-\infty$ and $+\infty$, which is inconvenient because we don’t know what range the resulting output values will fall in. It would be better if the output were scaled between known limits. This is where the activation function comes in.

The activation function simply takes a value and outputs a value within a predetermined range. Several activation functions are in use, and there is no fixed rule for selecting a particular one for a particular task.


The sigmoid non-linearity ranges between 0 and +1, whereas the hyperbolic tangent ranges between -1 and +1.

The above table is enough to understand the different types of activation functions. It is also good to remember another function called ReLU, short for Rectified Linear Unit, which performs very well in image classification and processing tasks. It is simply defined as $\max(0,x)$.
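The three activation functions mentioned here can be written directly from their definitions. The snippet below is a small sketch using only Python's standard library; the sample inputs are arbitrary.

```python
import math


def sigmoid(x):
    """Output lies between 0 and +1."""
    # Note: math.exp can overflow for very large negative x; fine for a sketch.
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Output lies between -1 and +1."""
    return math.tanh(x)


def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)


# Compare the three functions on a few arbitrary inputs.
for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```

Notice that ReLU simply zeroes out negative inputs and passes positive inputs through unchanged.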

Which activation function to use?

It depends purely on your application and the purpose of using neural networks. For classification tasks we mainly use sigmoid, hyperbolic tangent, and ReLU, whereas for regression analysis we use a linear activation.

Is that all about neural networks?

No, there is more. In a later post we will first dive into the basic perceptron algorithm, which mimics a single-layer neuron like the one shown above. Over this series of posts we will also discuss best practices and even feature extraction, with some examples.