Artificial Neural Networks and their contribution to Machine Learning — a beginner’s handbook

Ronik Basak · Published in Good Audience · 8 min read · Sep 15, 2018


An Artificial Neural Network (ANN) is a computational nonlinear model which is widely used in Machine Learning and is considered a prominent component of futuristic Artificial Intelligence.

The “neural” part of the name aptly suggests that these are brain-inspired systems, intended to replicate the way humans learn through their biological nervous system.

Thus, such systems “learn” to perform tasks by considering examples (training data), generally without being programmed with any task-specific rules.

An ANN is composed of a large number of highly interconnected processing elements (neurons) working in unison. This gives it a remarkable ability to derive meaning from complicated or imprecise data and to solve a variety of problems in pattern recognition, trend detection, classification, prediction, optimisation, visualisation, associative memory and many more.

A trained neural network can be thought of as an “expert” in that specific category of information it has been given to analyse. This expert can then be used to provide projections given new situations of interest and answer “what if” questions.

ANN Architecture Overview:

[Figure: Sample ANN architecture]

An ANN consists of multiple layers and is often called an MLP (Multi-Layer Perceptron). This network architecture includes an input layer, one or more hidden layers, and an output layer.

The hidden layers are also known as “distillation layers”: each extracts (distills) the important features / patterns from the previous layer’s output, removes the redundant information, and forwards those features to the next layer for further derivation.

Working Principles:

In a typical ANN, the input layer contains input nodes (neurons) that transmit information to the hidden layer(s). If there are multiple hidden layers, every layer accepts the pre-processed information from the previous layer, processes it further with specified logic, generates an intermediate output and forwards it to the corresponding neurons of the next hidden layer. The final hidden layer transmits the data to the output layer.

Every input to a neuron has an amplitude (weight), and each neuron has an activation function (to convert the input signal of a node in an ANN to an output signal) and one output. The weights of the inputs are also known as synapses. These are the adjustable parameters that turn a neural network into a parameterised system.

The weighted sum of the inputs produces the activation signal, which is passed to the activation function to obtain the neuron’s single output.
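As a concrete illustration, here is a minimal sketch of that computation in Python with NumPy; the input values, weights, bias and the choice of sigmoid are made-up assumptions for illustration only:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: squashes the activation signal into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical input signals and synapse weights for one neuron
x = np.array([0.5, 0.8, 0.2])   # inputs to the neuron
w = np.array([0.4, -0.6, 0.9])  # weights (synapses), adjusted during training
b = 0.1                         # bias term

z = np.dot(w, x) + b            # weighted sum of the inputs (activation signal)
output = sigmoid(z)             # activation function produces the neuron's output
print(output)
```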

The commonly used activation functions are the linear, step, sigmoid, tanh, and rectified linear unit (ReLU) functions.

Linear function: f(x) = x

Step function: f(x) = 1 if x >= 0, else 0

Logistic (Sigmoid) function: f(x) = 1 / (1 + e^(-x))

Tanh function — hyperbolic tangent: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Rectified linear unit (ReLU) function: f(x) = max(0, x)

Leaky ReLU function: f(x) = x if x > 0, else αx (for a small slope α, e.g. 0.01)

ReLU has become the most popular activation function among them, because it mitigates the vanishing gradient problem that was a challenge with the sigmoid and hyperbolic tangent functions. However, ReLU should only be applied to the hidden layers, and if the model suffers from dead neurons during training with ReLU, we may use the leaky ReLU function instead.
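For reference, the functions listed above can be sketched in a few lines of NumPy (the 0.01 slope in the leaky ReLU is just a common default, not part of the definition):

```python
import numpy as np

def linear(z):
    return z                              # identity: output equals input

def step(z):
    return np.where(z >= 0, 1.0, 0.0)     # binary threshold at zero

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # squashes into (0, 1)

def tanh(z):
    return np.tanh(z)                     # squashes into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)             # zero for negatives, identity otherwise

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope keeps negative units alive
```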

The weights(synapses) are optimized during the training process.

Below is a flow diagram which simplifies the entire workflow of an Artificial Neural Network, where we try to build a model that derives the probability of rainfall at a particular geographic location based upon its humidity, atmospheric pressure and temperature.

[Flow diagram: ANN workflow for rainfall prediction]

It is possible to make an ANN system more flexible and more powerful by using additional hidden layers. Artificial neural networks with multiple hidden layers between the input and output layers are called deep neural networks (DNNs), and they can model complex nonlinear relationships.
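To make the rainfall example concrete, here is a hedged Keras sketch of such a deep network. The layer sizes, optimizer and loss are illustrative assumptions, not prescriptions from this post; binary cross-entropy stands in for the simple absolute-error loss discussed below, since the output is a probability:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Inputs: humidity, atmospheric pressure, temperature -> probability of rainfall
model = keras.Sequential([
    layers.Dense(8, activation="relu", input_shape=(3,)),  # hidden layer 1
    layers.Dense(8, activation="relu"),                    # hidden layer 2
    layers.Dense(1, activation="sigmoid"),                 # output: P(rainfall)
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=50)  # X_train: (n, 3) features, y_train: 0/1 labels
```

Adding or widening the hidden Dense layers is exactly the “additional hidden layers” lever described above.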

How an ANN learns:

The learning process of a typical Artificial Neural Network consists of the below steps:

1. Random initialization of the weights / model.

2. Picking an appropriate activation function.

3. Feed forward: we pass the input data through the network layers (input -> hidden -> output) and calculate the actual output of the model.

4. Calculate the loss function: the loss function here is simply the absolute error, i.e. the difference between the desired output and the actual output of our model:

loss = |desired - actual|

5. Our ultimate objective is to minimize the loss function by optimizing the weights. Generally, we use the derivative of the loss function (the gradient) to decide on each weight update. In mathematics, the derivative of a function at a certain point gives the rate at which the function is changing its value at that point. Here, we check the derivative of the loss with respect to each weight:
- If it is positive, meaning the error increases if we increase the weight, we should decrease the weight.
- If it is negative, meaning the error decreases if we increase the weight, we should increase the weight.
- If it is 0, we do nothing; we have reached a stable point.

6. Back-propagation: the error contribution of each neuron is calculated starting from the gradient of the loss function, and it is propagated back to the previous layer(s) step by step, based upon the derivative of the activation function used in each layer. The technique of automatic differentiation can be used here.

7. Update and optimize the weights based upon the gradient calculations so far, scaling each update by a small constant called the “learning rate”.

8. Iterate steps 3–7 till we achieve the specified accuracy, i.e. till the model converges (the loss function is minimized). The sketch below ties these steps together.
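Here is a minimal NumPy sketch of that loop for a tiny one-hidden-layer network. All sizes, data values and the learning rate are illustrative assumptions, and mean squared error is used in place of the absolute error above so that the gradient is smooth:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 samples with 3 features each, and binary targets (values illustrative)
X = rng.random((4, 3))
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Step 1: random initialization of the weights (one hidden layer of 4 neurons)
W1, b1 = rng.normal(0.0, 0.5, (3, 4)), np.zeros((1, 4))  # input -> hidden
W2, b2 = rng.normal(0.0, 0.5, (4, 1)), np.zeros((1, 1))  # hidden -> output

# Step 2: pick an activation function
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.5  # small constant used while optimizing the weights

for epoch in range(2000):
    # Step 3: feed forward (input -> hidden -> output)
    h = sigmoid(X @ W1 + b1)      # hidden layer output
    out = sigmoid(h @ W2 + b2)    # actual output of the model

    # Step 4: loss (mean squared error, a smooth stand-in for absolute error)
    loss = np.mean((y - out) ** 2)

    # Steps 5-6: gradient of the loss, back-propagated layer by layer
    d_out = 2.0 * (out - y) / len(X) * out * (1.0 - out)  # output-layer gradient
    d_h = (d_out @ W2.T) * h * (1.0 - h)                  # hidden-layer gradient

    # Step 7: move each weight against its gradient, scaled by the learning rate
    W2 -= learning_rate * (h.T @ d_out)
    b2 -= learning_rate * d_out.sum(axis=0)
    W1 -= learning_rate * (X.T @ d_h)
    b1 -= learning_rate * d_h.sum(axis=0)

print(f"final loss: {loss:.4f}")
```

On this toy data the loss should shrink steadily; in practice, libraries compute these gradients for you via automatic differentiation, as noted in step 6.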

Benefits of putting an ANN into action:

A few advantages of choosing ANN over other ML paradigms are:

  1. Non-linearity: ANNs have the ability to learn and model non-linear and complex relationships, which is really important because in real life many of the relationships between inputs and outputs are non-linear as well as complex.
  2. Adaptive learning: an ability to learn how to do tasks based on the data given for training or initial experience.
  3. Self-organisation: an ANN can create its own organisation or representation of the information it receives during learning. It can also infer relationships it was never explicitly shown, making the model “generalize” and predict on unseen data.
  4. Having a distributed memory: the information used by the ANN is distributed throughout the network.
  5. Interpolation: because of the continuous nature of the ANN, unknown input patterns are processed according to the curve fitted through the trained examples.
  6. Real-time operation: an ANN has a massively parallel distributed structure, and computations can be carried out in parallel.
  7. Fault tolerance: a small amount of signal corruption or a partial destruction of the network has only a limited influence on the quality of the output, because of the distributed storage and processing of information.
  8. Support for heteroskedasticity: studies have shown that ANNs can better model heteroskedasticity, i.e. data with high volatility and non-constant variance, given their ability to learn hidden relationships in the data without imposing fixed relationships on it. This is very useful in regression analyses such as financial time series forecasting (e.g. stock prices), where data volatility is very high.

When an ANN can be suitable:

Wherever an ANN is used, we try to achieve the specified level of accuracy by back-propagating and mimicking the known results over and over, adjusting the weights until the model’s outputs come close to those known results.

  1. An ANN is best fitted if you have a large set (tens of thousands of cases, or more) of data for training and testing, including all of the possible inputs along with the corresponding correct (desired) outputs.
  2. If the input data correlates to the output data in a way that is somewhat linearly separable (directly or indirectly related to its impact on the output), or that can be made linearly separable (e.g. via the kernel trick), then an ANN may be called for.

The trick is to know the domain of competence (or sweet spot) of every family of machine learning algorithms. It is not mandatory to build a model using only one ML paradigm; we may combine multiple models to design the final ML solution for a particular problem.

Conclusion:

In this blog, I tried to introduce readers to the high-level architecture and workflow of a basic artificial neural network. There are already plenty of study materials and research papers available online on this topic. I just tried to jot down the fundamental theoretical aspects which can help a beginner understand how an ANN works and when to choose one in the field of machine learning.

In my next blogs, I will try to illustrate ANNs with further examples and to discuss the classification and implementation of multiple ANN algorithms.

