Introduction To Deep Learning and Neural Nets
If you are into technology, or have been following it for the latest buzzwords, then surely you must have heard about Artificial Intelligence (AI), and you might also have come across "Deep Learning", which is the focus of this post!
Deep Learning is here to stay for at least another couple of years, if not the decade. In this world of fast-changing technologies, Machine Learning and AI form a newly prominent domain, and Deep Learning contributes to it extensively. So let us get a little more familiar with what it is and how it has gained so much attention.
Table of Contents
- What is Deep Learning?
- How does it work?
- Terminology associated with DL, explained with an example
- Why is Deep Learning better than "Traditional Learning Algorithms"?
- Introduction to Neural Networks
What is Deep Learning?
The domain of Deep Learning was actually derived from Machine Learning in the year 1986, and its core ideas were conceived in the 1980s. Yes, you read that right!
Before leaping to Deep Learning itself, let us first look at a definition of Machine Learning, just so you get the gist if you don't know it already:
"Machine learning is an application of Artificial Intelligence (AI) that gives systems the ability to learn and improve from experience without being explicitly programmed to do so."
And how will the systems learn? From the data that is supplied to them as input, of course.
Now coming to Deep Learning: it is a subfield of ML (as stated already), and it concerns algorithms inspired by the structure and function of the brain. Deep Learning extensively uses Neural Networks, which imitate the human brain to process input data and learn patterns for future decision-making.
Some key points to remember about Deep Learning are:
- It learns by looking at examples, just like Machine Learning.
- It can operate directly on raw images, text, or even sound!
- It typically uses various architectures of Neural Networks (if you are wondering what a Neural Net is, we will come to that soon).
Workflow: How does Deep Learning work?
As already discussed, a Deep Learning algorithm or model processes pictures, videos, text, or sound directly. The programmer need not manually extract features and supply them to the model; instead, the algorithm learns patterns directly from the input data.
A DL model consists of several neurons, where each neuron (or perceptron) takes some input and, on that basis, generates an output. The interconnection of all these neurons forms a network known as a Neural Network, and this interconnection enables the neurons to pass data to each other.
The figure shows the similarity between a Biological and an Artificial Neuron (Perceptron).
Neural Networks use neurons that are artificially constructed with the biological ones in mind, hence the network is also called an Artificial Neural Network (or simply ANN).
Understanding terms related to Deep Learning
When an input is given to a neuron, the neuron computes its output using a mapping function known as an Activation Function. The goal of this activation function is to map the relationship between the input and output values. In addition, each input parameter has an associated parameter known as a Weight. The model learns the final, optimized value of each weight by iterating several times over the input data. A weight is just a real number that is multiplied by the scalar input, and the result is passed on to the next node via the connection.
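The idea above can be sketched in a few lines of code. This is a minimal single-neuron example: the sigmoid activation and the specific input and weight values are illustrative choices, not part of the original text.

```python
import numpy as np

def sigmoid(z):
    # A common activation function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights):
    # Multiply each input by its weight, sum up, then apply the activation
    z = np.dot(inputs, weights)
    return sigmoid(z)

inputs = np.array([0.5, 0.8, 0.2])    # example input parameters
weights = np.array([0.4, 0.9, -0.3])  # weights (hand-picked here; learned in practice)
output = neuron(inputs, weights)      # a value between 0 and 1
```

In a real model these weights would not be hand-picked; training adjusts them iteratively, as described above.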
A weight tells how much impact that input parameter has on the output. For instance, take the problem of predicting whether a newly launched app will be a success or a failure!
The adjacent figure shows possible parameters of concern, i.e., the problem statement and aim of the application, its UI, and lastly, the background of the crowd engaged with it.
So each parameter will impact the output with a different intensity. Suppose the problem statement is not excellent, but the UI is great and the app is marketed to just the right crowd; then the app will obviously be a success.
Sometimes another parameter comes into play: a Bias Unit. It is denoted by the letter b, and it shifts the activation function left or right (much like the intercept term in a linear equation), making the model more flexible, adaptive to different situations, and hence robust! (The non-linearity itself comes from the activation function.)
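Putting the weights and bias together with the app example: the scores and weights below are purely hypothetical numbers, chosen so that UI quality and audience fit carry more weight than the problem statement, as suggested above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scores for the three parameters of the app example,
# each in [0, 1]: problem statement, UI quality, audience fit
x = np.array([0.4, 0.9, 0.8])

# Hypothetical weights: UI and audience matter more than the problem statement
w = np.array([0.5, 1.2, 1.0])
b = -1.0  # bias shifts the decision threshold

score = sigmoid(np.dot(x, w) + b)  # predicted probability of success
success = score > 0.5
```

Even with a mediocre problem-statement score, the strong UI and audience-fit inputs push the weighted sum past the threshold, so the prediction comes out as a success.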
Why is Deep Learning better than "Traditional Learning Algorithms"?
OK, so Deep Learning is another technology in the market, but why is the world pivoting from "traditional" learning algorithms to Deep Learning?
The answer to this question will be so much clearer by just looking at the adjacent figure.
The graph was provided by one of the leading scientists in the field of AI, and the founder of the Google Brain project, Andrew Yan-Tak Ng.
In this era of Big Data, scientists have realized that data plays a key role in this data-driven world. When a huge amount of data is provided to a Deep Learning algorithm, it not only keeps improving its performance but also enhances the accuracy of the system as a whole.
This wasn't seen with the older learning algorithms (as the graph shows), whose performance saturated after reaching a point. Because of this continued improvement with more data, Deep Learning is preferred over the older learning algorithms.
Moreover, the Deep Learning approach can be applied to Classification as well as Regression tasks, which proves its versatility and makes it handy across the field of AI.
Introduction to Neural Networks
What you have seen until now is just a single neuron and how it works, but Deep Learning uses not a single neuron but a network of neurons. This network of neurons, when plotted, looks something like the figure below.
In the figure above, each node depicts a neuron, and several neurons stacked vertically make up a layer. A neural network with more than one hidden layer is what is typically used in deep learning architectures.
The very first layer, which passes on the input variables/parameters, is known as the Input Layer, after which come a number of Hidden Layers. Hidden layers got their name because they are generally not exposed when presenting the architecture to a client. The number of hidden layers is entirely up to the programmer, based on the requirements.
The number of nodes (or neurons) in each hidden layer also depends on the programmer and the use case of the model. Lastly, there is the Output Layer, which serves the output of the model. The number of nodes in the input and output layers is determined by the problem statement: for example, a classifier distinguishing three classes (say cat, fish, and dog) will have 3 nodes in the output layer, one per class. And the number of nodes in the input layer is the same as the number of input parameters (refer to the success/failure example of the new application).
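These sizing rules can be made concrete by writing down the weight shapes. The architecture below (4 input parameters, one hidden layer of 5 neurons, 3 output classes) is an arbitrary illustration; only the rule that shapes follow layer sizes comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical architecture: 4 input parameters, one hidden layer of
# 5 neurons (chosen by the programmer), 3 output nodes (one per class)
n_input, n_hidden, n_output = 4, 5, 3

# Each weight matrix connects one layer to the next, so its shape is
# (size of current layer, size of next layer)
W1 = rng.standard_normal((n_input, n_hidden))   # input layer  -> hidden layer
W2 = rng.standard_normal((n_hidden, n_output))  # hidden layer -> output layer
```

Changing the problem (more input features, more classes) only changes the first or last dimension; the hidden sizes in between remain the programmer's design choice.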
What happens in a neural network is this: the input is fed to the input layer of the network. This layer multiplies the scalar input values by the weights (which are randomly initialized at first), applies the activation function, and passes the result on to the 1st hidden layer via the connections.
Each subsequent layer does the same thing until the result reaches the output layer. There, the output is compared against the desired result using a cost function, and the whole process is repeated again and again, adjusting the weights, until the cost function is minimized.
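The steps above can be sketched as a single forward pass through a tiny network. The 3-4-2 layer sizes, sigmoid activations, squared-error cost, and all numeric values are illustrative assumptions; the weight updates themselves (backpropagation) are beyond this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One forward pass through a tiny 3-4-2 network (sizes are illustrative)
x = np.array([0.2, 0.7, 0.1])     # values at the input layer
W1 = rng.standard_normal((3, 4))  # randomly initialized weights
W2 = rng.standard_normal((4, 2))

hidden = sigmoid(x @ W1)          # input layer  -> hidden layer
output = sigmoid(hidden @ W2)     # hidden layer -> output layer

target = np.array([1.0, 0.0])               # the desired output
cost = np.mean((output - target) ** 2)      # cost to be minimized by training
```

Training would repeat this pass many times, nudging `W1` and `W2` after each pass to drive `cost` down.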
The reason why Deep Learning is so widely used should also be quite clear from this post: it can be customized heavily based on the programmer's needs. It is one of the state-of-the-art techniques in use today, and it gives rise to CNNs, RNNs, LSTMs, etc., which are used almost everywhere.
If you wish to know more about the mathematical side of neural networks or how they work, leave us a comment. Happy Learning!