While shopping online, have you noticed the site recommending products similar to the one you’re looking for? Or, while watching YouTube videos, recommendations based on the channels you visit? Or Facebook suggesting pages to like based on the pages you have already liked? What mechanism picks up these patterns and turns them into meaningful insights? Yes, it is machine learning.
Machine learning is the science of getting computers to learn and act as humans do, and to improve their learning autonomously over time, by feeding them data and information in the form of observations. It is used in many domains, ranging from predicting whether the next movie will be a box-office success to the nuances of the stock market, such as predicting stock prices.
In the actuarial profession, machine learning has applications in pricing, reserving, product design and capital modeling, to name a few. It has also been introduced into CS2, which covers applications of concepts such as time series, the Lee-Carter model and p-spline regression models using R.
Machine learning can be broadly classified into supervised, unsupervised and reinforcement learning. Curriculum 2019 mainly focuses on supervised and unsupervised machine learning.
Let’s have a look at what these are:
Supervised Machine Learning
Supervised learning, as the name suggests, implies the presence of a supervisor acting as a teacher. We teach, or train, the algorithm using well-labelled data, meaning that the data is already tagged with the correct answer (the training data). The data contains a target or outcome variable (the dependent variable), which is to be predicted from a given set of predictors (the independent variables). Using these variables, we generate a function that maps inputs to the desired outputs. The trained algorithm is then given a new set of data, and it uses what it learned from the labelled examples to produce an outcome for each new observation.
Supervised learning problems are broadly classified into regression and classification problems. Both aim to construct a good model that can predict the value of the dependent variable from the independent variables. The difference is that the dependent variable is numerical for regression and categorical for classification.
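The train-then-predict workflow described above can be sketched in a few lines of Python. The sketch below uses a 1-nearest-neighbour rule, which this article does not mention and which is chosen only because its "training" step is trivially simple. The motor-insurance features, the labels and the `train` helper are all hypothetical, made up for illustration.

```python
import math

# Hypothetical labelled training data for a motor portfolio:
# (age in years, annual mileage in thousands) -> observed outcome
training_data = [
    ((22, 15.0), "claim"),
    ((25, 18.0), "claim"),
    ((45, 8.0), "no claim"),
    ((52, 6.0), "no claim"),
    ((38, 10.0), "no claim"),
]


def train(labelled_examples):
    """Hypothetical helper: 'train' a 1-nearest-neighbour model.

    For 1-NN, training just stores the labelled examples; the learned
    function maps a new input to the label of its closest stored example.
    """
    stored = list(labelled_examples)

    def predict(features):
        # Euclidean distance on the raw (unscaled) features
        _, label = min(stored, key=lambda example: math.dist(example[0], features))
        return label

    return predict


model = train(training_data)

# New, unlabelled observations
print(model((24, 16.0)))  # claim
print(model((50, 7.0)))   # no claim
```

In practice the features would usually be standardised first, since age and mileage are on different scales and the raw distance lets the larger-valued feature dominate.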
- Regression: A regression problem is when the output variable is a real or continuous value, such as salary or weight. Regression predictive modeling is the task of approximating a mapping function (f) from input variables (X) to a continuous output variable (Y). For example, predicting the annual expenditure (dependent variable) of a person by using his annual income as the independent variable.
- Classification: A classification problem is when the output variable is a category, such as yes or no, or black or white. A classification model attempts to draw conclusions from observed values: given one or more inputs, it tries to predict the value of one or more outcomes. Classification models include logistic regression, decision trees, random forests and naive Bayes, to name a few. For example, predicting whether a person will default on his next loan payment on the basis of his income.
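The two examples above (expenditure from income, and loan default from income) can be sketched side by side in plain Python to show the numerical-versus-categorical distinction. This is a minimal sketch under stated assumptions: the regression case is a simple least-squares line, the classification case is a logistic regression fitted by plain gradient ascent, all data values are made up, and the function names are hypothetical. A real analysis would use a statistics package (for example `lm` and `glm` in R) rather than hand-written fitting code.

```python
import math

# ---- Regression: numerical dependent variable ---------------------------

def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Made-up data (in thousands): annual income -> annual expenditure
reg_income = [20, 30, 40, 50, 60]
expenditure = [17, 23, 29, 35, 41]

a, b = fit_simple_linear_regression(reg_income, expenditure)
predicted_spend = a + b * 45  # a number: expenditure for an income of 45
print(f"intercept={a:.2f}, slope={b:.2f}, prediction={predicted_spend:.2f}")


# ---- Classification: categorical dependent variable ---------------------

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic_regression(x, y, learning_rate=0.1, n_iter=5000):
    """Fit P(y = 1 | x) = sigmoid(a + b * z), where z is x standardised,
    by gradient ascent on the average log-likelihood."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    z = [(xi - mean) / sd for xi in x]
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        p = [sigmoid(a + b * zi) for zi in z]
        # Gradient of the average log-likelihood with respect to a and b
        grad_a = sum(yi - pi for yi, pi in zip(y, p)) / n
        grad_b = sum((yi - pi) * zi for yi, pi, zi in zip(y, p, z)) / n
        a += learning_rate * grad_a
        b += learning_rate * grad_b

    def predict_proba(new_x):
        return sigmoid(a + b * (new_x - mean) / sd)

    return predict_proba


# Made-up data: annual income (thousands) -> defaulted (1) or not (0)
loan_income = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
defaulted = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

predict_default = fit_logistic_regression(loan_income, defaulted)
for income in (18, 58):
    prob = predict_default(income)
    label = "default" if prob >= 0.5 else "no default"  # a category
    print(f"income={income}: P(default)={prob:.2f} -> {label}")
```

The regression returns a number (an expenditure), while the classification returns a probability that is turned into a category with a 0.5 threshold; that threshold is an assumption here and in practice would be chosen to suit the business problem.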
Many actuarial modeling projects, such as insurance contract pricing and pension scheme valuation, fall into the category of supervised learning.
Unsupervised Machine Learning
Unlike supervised learning, unsupervised learning has no teacher, which means no training is given to the machine. The information is neither classified nor labelled, and the machine's task is to group unsorted information according to similarities, patterns and differences without any prior training on the data.
Here there is no target or outcome variable to predict. Unsupervised learning is widely used for clustering a population into different groups, for example to segment the population under study. Unsupervised machine learning algorithms can be classified into two categories:
- Clustering: This is the task of grouping a set of objects so that objects in the same group (a cluster) are more similar to each other than to those in other groups (clusters). A clustering problem is one where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior. Some examples are:
  - given a set of buyers, cluster them based on their attributes
  - given a set of tweets, cluster them based on their content
- Association: An association rule learning problem is one where you want to discover rules that describe large portions of your data, such as "people who buy X also tend to buy Y". For example, a rule found in a supermarket's sales data might indicate that customers who buy bread and butter together are likely to also buy tomato ketchup. Such information can be used as the basis for marketing decisions such as promotional pricing or product placement.
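Both unsupervised tasks above can be sketched in plain Python. The clustering part implements k-means (Lloyd's algorithm), a common clustering method that this article does not itself name. The association part only scores one given rule using the standard support, confidence and lift measures; it does not search for all rules the way an algorithm such as Apriori would. The customer and basket data and all function names are hypothetical.

```python
import math

# ---- Clustering: k-means (Lloyd's algorithm) -----------------------------

def k_means(points, k, max_iter=100):
    """Group points (tuples of numbers) into k clusters.

    Centroids start at the first k points so the result is reproducible;
    library implementations typically use random or k-means++ starts.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Made-up customers: (annual spend, number of orders per year)
customers = [(200, 2), (220, 3), (250, 2), (1000, 10), (1100, 12), (950, 11)]
labels, centroids = k_means(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]


# ---- Association: scoring one candidate rule ----------------------------

def support(itemset, baskets):
    """Fraction of baskets containing every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    rule_support = support(antecedent | consequent, baskets)
    confidence = rule_support / support(antecedent, baskets)
    lift = confidence / support(consequent, baskets)
    return rule_support, confidence, lift


# Made-up supermarket baskets
baskets = [
    {"bread", "butter", "ketchup"},
    {"bread", "butter", "ketchup", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "ketchup"},
    {"bread", "butter", "ketchup"},
]

s, c, lift = rule_metrics({"bread", "butter"}, {"ketchup"}, baskets)
print(f"support={s:.3f}, confidence={c:.3f}, lift={lift:.3f}")
```

Neither part uses a target variable: k-means groups customers without being told which group is "right", and the rule scores come purely from co-occurrence counts. A lift above 1 suggests that buying bread and butter raises the chance of buying ketchup relative to the overall rate. In practice, features on very different scales (spend versus order count) would usually be standardised before clustering.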
Machine learning is a vast topic, but the key is to identify the right technique for your problem and data, and then proceed systematically.
By now you should have a brief idea of what machine learning is. Here is a list of a few resources that can help you understand the concept in detail from an actuarial perspective.