What is Machine Learning?

Introduction to Machine Learning

Researchers have long dreamed of building intelligent machines. When programmable computers were first invented, people wondered whether such machines might one day become intelligent and perform tasks the way humans do. Today, Artificial Intelligence is a thriving technology with a wide variety of applications across many fields. The idea behind AI is to simulate human intelligence in machines so that they can think and perform tasks like humans.

Why do we need technology that works like humans at all?

Humans can do work with very good accuracy, but their efficiency is limited and there is always a ceiling on how much human work can be sped up. Machines have no such limit, and the work they do is precise, uniform, and scalable.

In the twentieth century, the software revolution took place to address these problems, but software alone is not sufficient. Software can perform a task only when the task is formally defined by a set of rules, so that a programmer can write a program implementing those rules.

Consider calculating the sum of two given numbers. In terms of speed and accuracy, computers can beat any human at this task. But problems that have no set of formal rules and instead require human intuition are very hard for computers to solve.

For example, humans recognize faces effortlessly, but it is very hard for computers to do the same, because it is extremely difficult to write down formal rules describing a face. So the true challenge of artificial intelligence is to solve tasks that are easy for people to perform but hard for people to describe formally.

Consider the Deep Blue chess-playing system developed by IBM. Chess can be completely described by a set of formal rules, so those rules were easily converted into a program and provided ahead of time by the programmers.

Artificial intelligence tries to tackle this challenge by transferring human intelligence to machines that have incomparable computational capabilities.

In everyday life, humans rely on knowledge about the world to solve their tasks. Such knowledge is subjective and intuitive, and therefore difficult for a programmer to articulate as a set of rules.

From this we can see that, to behave like humans, in other words to behave intelligently, computers require similar knowledge. So the key challenge in AI is to get this informal, subjective knowledge into the computer, and researchers in the field are essentially trying to achieve this objective.

Researchers first tried a very basic way to achieve this objective: the knowledge-based approach, in which knowledge about the world is hard-coded in formal languages.

Computers can then reason automatically about statements in these formal languages using logical inference rules. Because this approach is so simple and naive, projects that used it were not successful: researchers struggled to devise formal rules complex enough to accurately describe the world. One example of such a project is Cyc, which is built around an inference engine.

The difficulty faced by such knowledge-based projects is their reliance on hard-coded knowledge. To overcome it, AI systems need the ability to acquire their own knowledge from the world by extracting patterns from raw data. This capability is known as Machine Learning.

Machine Learning

The introduction of Machine Learning gives computers the ability to acquire knowledge of the real world and make decisions that appear subjective. In this way, Machine Learning overcomes the limitations of the knowledge-based approach.

According to Wikipedia:

Machine Learning is the study of computer algorithms that improve automatically through experience.

According to Tom Mitchell:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Many types of Machine Learning algorithms exist in the literature. Here the algorithms are grouped on the basis of learning style. The broad grouping of Machine Learning algorithms is shown in Figure 1. Let's look at them one by one.

Figure 1 : Grouping of Machine Learning algorithms on the basis of learning style

Supervised Learning

Supervised Learning, as the name suggests, involves a supervisor acting as a teacher. In supervised learning we train our machine using labeled data, meaning that for every input there is a well-labeled output.

During training, the machine acquires knowledge of the world from the labeled data. After training, the machine is given a new set of data and asked to predict the outcome. The objective is for the machine to learn patterns from the training dataset and apply that knowledge to the test dataset to predict the output.

Let's take the Iris dataset as an example. The Iris dataset is a collection of measurements of 150 iris plants. Each example in the dataset consists of measurements of the parts of a plant: sepal length, sepal width, petal length, and petal width. The dataset also records which species each plant belongs to; three different species are present. So in the Iris dataset, each iris plant is labeled with its species.


Supervised learning algorithms can study this dataset and learn to classify iris plants into the three species based on their measurements.

The term supervised learning reflects that the target y is provided by a teacher who shows the machine what to do.
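As a concrete sketch of this workflow, here is a minimal supervised-learning example on the Iris dataset, assuming scikit-learn is installed; the k-nearest-neighbors classifier used here is just one of many possible choices:

```python
# A minimal supervised-learning sketch on the Iris dataset,
# assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# Hold out 30% of the labeled plants to act as "new" unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)             # learn from labeled measurements
accuracy = clf.score(X_test, y_test)  # evaluate on unseen plants
print(f"Test accuracy: {accuracy:.2f}")
```

The call to `fit` is the training phase described above; `score` checks whether the learned patterns generalize to plants the model has never seen.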

Supervised learning is classified into two categories of algorithms, as shown in Figure 2.

Figure 2 : Supervised Learning


Regression algorithms predict a continuous outcome (target) based on one or more input (predictor) values. In simple words, the output is a real value, such as a weight.

There are various kinds of regression algorithms. They differ in the number of independent variables, the shape of the regression line, and the type of dependent variable. Let's look at one regression technique.

Linear regression is one of the most basic and popular regression algorithms for predicting a continuous value. It assumes a linear relationship between the input (predictor) and the output.

Linear Regression Algorithm

As the name suggests, linear regression solves regression problems. The objective is to build a system that can take a vector x and predict a scalar value y as its output. In simple words, the algorithm establishes the relationship between input and output using a best-fit straight line:

    ŷ = wᵀx

Here w is the vector of parameters. Parameters are values that control the behavior of the system.

We can think of w as a set of weights that determine how each feature affects the output. A feature is simply a characteristic of the input.

For example

Suppose we want a system that can predict the price of used cars. The features are the car attributes we believe affect its worth: brand, year, engine efficiency, capacity, mileage, and so on.

    y = w0 * capacity + w1 * mileage + w2 * engine efficiency

If a feature's weight wi is positive, then increasing that feature increases the value of our prediction, and vice versa. If a weight wi is large in magnitude, it has a large effect on the prediction; if a weight wi is 0, it has no effect on the prediction.
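The car-price model above can be sketched with NumPy's least-squares solver. All numbers below are made up for illustration; the prices are generated exactly from the rule price = 1 + 4 × capacity − 0.4 × mileage, so the fit recovers those weights:

```python
import numpy as np

# Made-up used-car data: columns are capacity (litres) and mileage
# (in 10,000 km units); prices are illustrative values generated
# from price = 1 + 4*capacity - 0.4*mileage.
X = np.array([[1.2, 8.0],
              [1.6, 5.0],
              [2.0, 3.0],
              [1.4, 9.0],
              [1.8, 2.0]])
y = np.array([2.6, 5.4, 7.8, 3.0, 7.4])

# Prepend a column of ones so the first weight acts as an intercept.
Xb = np.c_[np.ones(len(X)), X]

# Least squares: find w minimizing ||Xb @ w - y||^2.
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print("weights:", w)        # recovers [1.0, 4.0, -0.4]
print("predictions:", Xb @ w)
```

Note the recovered mileage weight is negative: more mileage lowers the predicted price, exactly the "weights determine how each feature affects the output" idea from above.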


Classification is a supervised learning task that predicts the category an input belongs to. To solve a classification problem, the learning algorithm is asked to produce a function f : ℝⁿ → {1, 2, …, k}. In simple words, the output variable is a category, such as disease or non-disease; the output is discrete. For example, in the Iris dataset, we have to predict one of three species given four features in an input: sepal length, sepal width, petal length, and petal width.

Let's take another example, object recognition, to make this clear.

Here the input is an image and the output is a numeric code identifying the object in the image. 

There are a number of classification algorithms, including support vector machines, logistic regression, decision trees, and random forests. Let's look at one in detail.

Support Vector Machine

A support vector machine (SVM) is a supervised learning algorithm that can be used for both classification and regression problems, but it is mostly used for classification.

Given a training dataset in which each example is labeled as belonging to one of two classes, an SVM training algorithm builds a model that assigns new examples to one class or the other, making it a non-probabilistic binary linear classifier.

Basically, the algorithm tries to find the optimal hyperplane in n-dimensional space that classifies new examples. In two-dimensional space (when the number of input features is two), this hyperplane is simply a line dividing the plane into two parts, as shown in Figure 3.

According to Wikipedia:

“An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on the side of the gap on which they fall.”

Figure 3 : SVM model

An SVM tries to maximize the margin between the two classes. The maximum margin is achieved by the hyperplane that has the largest distance to the nearest training data point of either class.

This is very intuitive. As the figure shows, all data points falling on one side of the line are labeled as one class, and points falling on the other side are labeled as the second class. But as Figure 3 suggests, there are infinitely many lines passing between the two classes.

So how do we know which line performs best? The algorithm selects the line that not only separates the two classes but also stays as far away from the closest samples as possible, as shown in Figure 3.
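A minimal sketch of a linear SVM on made-up two-dimensional points, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters of made-up 2-D points.
X = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
              [4.0, 4.0], [4.5, 4.2], [3.8, 4.4]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear-kernel SVM finds the maximum-margin separating line.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The support vectors are the training points closest to the boundary.
print("support vectors:\n", clf.support_vectors_)
print("prediction for (1.0, 1.1):", clf.predict([[1.0, 1.1]]))
print("prediction for (4.0, 4.1):", clf.predict([[4.0, 4.1]]))
```

Only the support vectors (the closest samples) determine where the line goes, which is exactly the margin-maximizing behavior described above.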

Unsupervised Learning

In supervised learning, the objective is to learn a mapping from input to output whose correct values are provided by a supervisor. In unsupervised learning, only input data is given; there is no supervisor. The objective is to find regularities in the input.


There is a structure to the input space such that some patterns occur more often than others.

The two main methods used in unsupervised learning are cluster analysis and principal component analysis.

In cluster analysis, the objective is to find groupings of the input.

Let's take an example to make this clear.

Companies hold a lot of customer data, containing demographic information as well as past transactions with the company. A company may want to see the distribution of its customer profiles to learn what types of customers occur frequently. In such scenarios, clustering allocates customers with similar attributes to the same group. These groups can then help shape company strategy, for example services and products tailored to different groups.

A popular algorithm to do this clustering analysis is K-means clustering. Let’s discuss K-means in more detail.

K-means Clustering

K-means clustering is one of the most popular and simplest unsupervised learning algorithms.

K-means is a centroid-based algorithm: we calculate the distance of each point from the centroids to decide which cluster the point belongs to. In K-means, each cluster is associated with a centroid.

This algorithm works as follows:

  1. First, initialize k points randomly; these are called means.
  2. Then assign each item to its closest mean, and update each mean's coordinates to the average of the items assigned to it so far.
  3. Repeat these steps for a given number of iterations; after the final iteration, we have our clusters.
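The steps above can be sketched in plain NumPy (a toy implementation on made-up data; libraries such as scikit-learn provide an optimized KMeans):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize k means by picking k random data points.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each item to its closest mean...
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # ...and update each mean to the average of its assigned items.
        for j in range(k):
            if np.any(labels == j):
                means[j] = X[labels == j].mean(axis=0)
    # Step 3: after the iterations, the labels define the clusters.
    return means, labels

# Two made-up, well-separated clusters of 2-D "customer" data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
means, labels = kmeans(X, k=2)
print("cluster means:\n", means)
```

Each cluster mean ends up near the center of one blob, and every point is labeled by the blob it belongs to.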

Semi-Supervised Algorithm

As we have seen, in supervised learning the dataset has to be labeled manually by humans. This process is very costly when the dataset is large. In unsupervised learning, a labeled dataset is not required, but its range of applications is limited.

To tackle these limitations, semi-supervised learning was introduced. In this style of learning, the algorithm is trained on a combination of a small amount of labeled data and a large amount of unlabelled data. Semi-supervised learning falls between supervised and unsupervised learning.

In order to make any use of unlabelled data, semi-supervised algorithms make at least one of the following assumptions about the data:

  1. Continuity: points that are close to each other are more likely to share the same output label.
  2. Cluster: if the data can be divided into discrete clusters, points in the same cluster are more likely to share a label.
  3. Manifold: the data lie approximately on a manifold of much lower dimension than the input space. This assumption allows the use of distances and densities defined on the manifold.
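A minimal self-training sketch illustrating the continuity and cluster assumptions; everything here, including the data and the simple nearest-centroid labeling rule, is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two made-up clusters; only ONE labeled example per class.
X = np.vstack([rng.normal(0, 0.6, (30, 2)), rng.normal(4, 0.6, (30, 2))])
labels = np.full(len(X), -1)      # -1 marks "unlabeled"
labels[0], labels[30] = 0, 1      # the two known examples

# Self-training: repeatedly give the unlabeled point closest to a
# current class centroid that class's label (the continuity and
# cluster assumptions at work).
for _ in range(len(X) - 2):
    centroids = np.array([X[labels == c].mean(axis=0) for c in (0, 1)])
    unl = np.where(labels == -1)[0]
    d = np.linalg.norm(X[unl, None, :] - centroids[None, :, :], axis=2)
    i = d.min(axis=1).argmin()            # most confident unlabeled point
    labels[unl[i]] = d[i].argmin()        # adopt the nearer centroid's label

print("labels for first cluster:", labels[:30])
print("labels for second cluster:", labels[30:])
```

Starting from just two labeled points, the labels spread outward through each cluster, which is exactly why these assumptions make unlabeled data useful.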

We can relate these three learning styles, supervised, unsupervised, and semi-supervised, to the real world.

Supervised learning is a student under the supervision of a teacher. Unsupervised learning is a student who has to figure out a concept on their own. Semi-supervised learning is a teacher who covers a few concepts in class and assigns homework questions based on similar concepts.

Reinforcement Learning

Reinforcement learning is learning by interacting with an environment. The learning process involves an actor, an environment, and a reward signal. The actor chooses actions in the environment and is rewarded accordingly. Here the output of the system is a sequence of actions.

In such a case, no single action matters on its own; what matters is the sequence of correct actions that reaches the goal. Such a sequence is called a policy. The actor wants to maximize the reward it receives, so it must learn a good policy for interacting with the environment. Games are a good example: a single move by itself is not important; what matters is a sequence of good moves that leads to winning.
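A minimal sketch of this idea is tabular Q-learning on a made-up one-dimensional corridor: the actor only gets a reward at the far end, so it must learn the policy "always move right" even though no single step is rewarded along the way:

```python
import numpy as np

# Made-up environment: states 0..4 in a corridor, reward only at state 4.
# Actions: 0 = move left, 1 = move right. Episodes start at state 0.
n_states, n_actions, goal = 5, 2, 4
Q = np.zeros((n_states, n_actions))       # value of each (state, action)
alpha, gamma, eps = 0.5, 0.9, 0.2
rng = np.random.default_rng(0)

for _ in range(200):                      # episodes
    s = 0
    while s != goal:
        # epsilon-greedy: explore when unsure, otherwise exploit.
        if rng.random() < eps or Q[s].max() == 0:
            a = rng.integers(n_actions)
        else:
            a = Q[s].argmax()
        s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
        r = 1.0 if s2 == goal else 0.0
        # Q-learning update: move toward reward + discounted future value.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

# The learned greedy policy should be "move right" in every state.
print("greedy action per state:", Q.argmax(axis=1)[:goal])
```

Notice that the reward for the whole sequence propagates backward through the Q-values, so early states learn to prefer "right" even though their immediate reward is zero; this is the sense in which the policy, not any single action, is what gets learned.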

Reinforcement learning is very different from the other types of learning we have covered so far. In supervised learning, we are given data and labels and are tasked with predicting the output for new data. In unsupervised learning, we are given only data and tasked with finding its underlying structure. In reinforcement learning, we are given neither data nor labels.

Applications of reinforcement learning include:

  1. Self-driving car
  2. Robotic motor control
  3. Air conditioning control
  4. Ad-placement optimization
  5. Stock market trading strategies
  6. Game playing

Deep Learning

When we analyze the image of a car, the individual pixels of a red car are very close to black at night. This example gives a sense of the difficulty faced by many Artificial Intelligence applications: extracting such high-level, abstract features is very hard because it requires human-level understanding.

Deep learning tackles this problem by building complex features out of simple ones. The most basic example of a deep learning model is the multilayer perceptron, which is just a mathematical function mapping input values to output values, composed of many simpler functions.
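A sketch of a multilayer perceptron as a composition of simple functions, using made-up random weights (no training here, just the forward computation):

```python
import numpy as np

def relu(z):
    """A simple elementwise nonlinearity: max(0, z)."""
    return np.maximum(0.0, z)

# A tiny MLP as a composition of simple functions:
#   f(x) = W2 @ relu(W1 @ x + b1) + b2
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # layer 1: 3 inputs -> 4 hidden
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # layer 2: 4 hidden -> 1 output

def mlp(x):
    h = relu(W1 @ x + b1)   # simple function 1: affine map + nonlinearity
    return W2 @ h + b2      # simple function 2: another affine map

x = np.array([0.5, -1.0, 2.0])
print("output:", mlp(x))
```

Each layer is individually trivial; the power of a deep model comes from stacking many such simple functions, so that later layers compute features in terms of the features before them.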

Deep learning is a particular kind of machine learning that achieves great power and flexibility by representing the world as a nested hierarchy of concepts. Each concept is defined in relation to simpler concepts, and more abstract representations are computed in terms of less abstract ones.

Deep learning architectures such as deep neural networks, deep belief networks, convolutional neural networks, and recurrent neural networks have been applied in computer vision, speech recognition, natural language processing, and many other fields.


Deep Neural Network

Deep neural networks are inspired by the structure and function of the human brain. Their basic building blocks are nodes, which play the role of the brain's neurons: when a stimulus reaches a node, a computation takes place in it. Nodes are generally grouped into layers, as shown in Figure 6.

Deep Neural Network
Figure 6 : Deep Neural Network

There are different types of deep neural networks; the differences between them lie in their working principles and their areas of application.

  1. Convolutional Neural Networks (CNN): CNNs are mostly used for image recognition because there is no need to examine every pixel independently. A CNN consists of an input layer, an output layer, and hidden layers; the hidden layers usually comprise convolutional layers, pooling layers, and fully connected layers. The convolutional and pooling layers act as a feature extractor, while the fully connected layers perform non-linear transformations of the extracted features and act as the classifier. Convolutional layers apply a convolution operation to the input. A pooling layer is used immediately after a convolutional layer to reduce the spatial size (only width and height, not depth). This reduces the number of parameters, and hence the computation, and helps make feature detectors more invariant to the position of a feature in the input.
  2. Recurrent Neural Networks (RNN): Recurrent neural networks, first introduced in the 1980s, are a class of neural networks that allow previous outputs to be used as inputs, via a special kind of layer known as a recurrent layer. The main idea behind RNNs is to make use of sequential information. In a traditional feed-forward network, we assume that all inputs and outputs are independent of each other, but for many tasks that is not a good assumption: if we want to predict the next word in a sentence, it helps to know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. In simple words, RNNs have a memory that captures information about what has been computed so far. In theory, RNNs can use information from arbitrarily long sequences, but in practice they are limited to looking back only a few steps.
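The convolution and max-pooling operations described for CNNs above can be sketched in plain NumPy (a toy single-channel version; real layers also handle channels, batches, strides, and padding):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation (what deep-learning 'conv' layers compute)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: halves width and height."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)  # a made-up 6x6 "image"
edge = np.array([[1.0, -1.0]])                  # a tiny horizontal-edge filter
features = conv2d(img, edge)                    # shape (6, 5)
pooled = max_pool(features)                     # shape (3, 2)
print("pooled feature map shape:", pooled.shape)
```

The filter slides over the image producing a feature map, and pooling then shrinks that map, which is exactly the parameter-and-computation reduction described above.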

Let's see the relationship between AI, Machine Learning, and Deep Learning using a Venn diagram.

Figure 7 : The relationship between Deep Learning, Machine Learning, and Artificial Intelligence.

Applications of Artificial intelligence

AI is used in many different fields, including marketing, banking, finance, agriculture, healthcare, gaming, space exploration, autonomous vehicles, chatbots, artificial creativity, and more.

Let's explore the marketing and banking fields.


In the early days, before AI was in practical use, if we wanted to buy a product from an online store, we had to search for it by its exact name. It was very difficult to find a product without knowing its exact name.

Nowadays, when we search for an item on an e-commerce store, we get all the results related to it; we don't have to worry about the exact spelling or product name. Another example is finding the right movie on Netflix.

The application is not limited to finding the right product. AI can recommend products based on your interests by analyzing your past transactions and buying habits. From this data, it learns what type of product is relevant to you, filters products accordingly, and recommends them to you.

In this way, AI plays a major role in marketing and in increasing online sales, and e-commerce companies such as Flipkart, Amazon, and Netflix leverage the power of AI to sell their products with ease and make a profit.


In banking, AI adoption is growing fast. Many banks have already adopted AI systems to provide services such as customer support, anomaly detection, and credit card fraud detection.

Take HDFC Bank as an example. They developed an AI-based chatbot called Electronic Virtual Assistant (EVA), which has already addressed over 3 million customer queries and can provide simple answers in less than 0.4 seconds. Bank of America has its own chatbot, named Erica, and American Express uses its AmEx chatbots to serve its customers.

MasterCard and RBS WorldPay have used AI and deep learning to detect fraudulent transactions and prevent card fraud, saving millions of dollars. AI-based fraud detection algorithms detect fraud with an accuracy of more than 95%, and they can adapt quickly to new fraud attempts in real time.

The most important application of AI in banking is risk management: estimates show that, on average, merchants lose 1.5% of their annual revenue to fraud attacks. JPMorgan has also started using AI techniques to develop an "early warning" system that detects malware, Trojans, and viruses. This system allegedly identifies suspicious behavior long before fraudulent emails are actually sent to employees.