Customers Segmentation Using RFM Model

Customer Lifetime Value

Customers Segmentation a phenomenon to determine Customer Lifetime Value (CLV), also known as Lifetime Value (LTV), is the present value of the future cash flows from the customer during his or her entire relationship with the company. In other words, the dollar value of a customer relationship which is based on the present value of the estimated future cash flows from that particular customer’s relationship with the company.

Why Customer Lifetime Value?

CLV is mainly used by companies for customer segmentation. Because each and every customer is not important for a company, hence CLV-based segmentation model allows the company to predict its most profitable group of customers, understand their nature, and focus more on them rather than wasting time, money and resource on other less profitable customers.

Introducing RFM-model

Recency-Frequency-Monetary (RFM) model is one of the predictive models to calculate CLV. By segmenting customers using RFM model, we can analyze each group of customers individually and determine which set of customers has the highest CLV, hence contributing to the profitability of the company.
RFM analysis determines quantitatively which customers are the best ones by examining the following factors :

  1. How recently a customer has purchased, also known as Recency,
  2. How often a customer purchases, also known as Frequency, and
  3. How much a customer spends during his/her purchase, also known as Monetary value.

It follows the axiom that 80% of business comes from 20% of customers (set that are most important).

RFM Model

To start with the RFM model, install the following packages first.

Now loading these above packages,

Now, importing the dataset in R,

The link to view the whole dataset “Online Retail” in CSV format is,
Workable Data: Download
To have a look at the preview of the dataset and see its structure,

It is an invoice dataset of online retail. It has 541909 customer records taking into account 8 variables.
We see there are few misinterpreted datatypes of the variables. Also, we see in the data that there are negative values for the variables, Quantity and Unit Price. This logically cannot happen. So we need to do treatment for all these.
Firstly, correcting the datatypes of required variables,

Also, to treat the illogically placed negative values in Quantity and Unit Price,

What we do is, we replace the negative values with NA, and them omitting the NA’s.
Checking the structure again,

Now the records decreased to 397884 observations, explained by the same 8 variables. And this data is a cleaned data, including no illogical records or datatypes in it.
Let us now create a calculated column showing total price, so that it is easy for us to calculate the monetary value of customers.

Now firstly, we calculate Recency of a customer in the number of days,

Here we take the difference in days between customer’s purchase and today, and finally rounding the figure. This shows the recency of the purchase done by the customer.
Next calculating Frequency of a customer,

To find how frequent a customer is, we create a dummy column ‘Count’ giving a fixed value of 1. So we group by customer ID and sum the number of ‘Count’(s) in each group. This will give us the total number of occurrences of purchase for a particular customer. Hence, keeping this as Frequency.
Next calculating Monetary Value of a customer,

To find the monetary value of a customer, we group by customer ID and sum the Total Price in each group. This will give us the total money spent on the purchase, for a particular customer. Hence, keeping this as Monetary Value.
Now viewing the preview of the data,

So we get 397884 records with its 13 variables. But we just need only three variables to do the process of segmentation with respect to each customer, Recency, Frequency and Monetary value. So separating these three variables and creating another dataset with these.

Extracting the Customer ID, Recency, Frequency and Monetary Value in data2.
Checking its structure now,
We find that it is in table form. Also, the variable datatypes are not as is desired to be. So do the necessary treatment on it,

Checking the structure again

So now we get a data frame structure consisting of 397884 records taking into consideration 4 variables only, that we require to make segmentation.

Clustering Model

To start with segmentation modelling, we have to remove the duplicate combinations of recency, frequency and monetary value, so that we get a unique set of records to do our segmentation. We keep these unique records in another new data set.

Now we have to create a new dataset with only the scaled-down values of Recency, Frequency and Monetary values so that there exists no bias in the unit of measurement. So three columns are present here storing it in another new dataset,

To convert the dataset into a data frame structure, for our ease in work, we convert it,

Checking the structure of the dataset now,

Now we get the required structure on which we can now segment our customers, using the algorithm of clustering.
Firstly, we install and run a few libraries to work on clustering,

To start with segmentation, we use the very famous K-means clustering. We divide our customers into three groups, based on their three features- recency, frequency and monetary value.

Running this code we get to see that the data is being divided into three groups, each size of 8704, 623 and 7436, respectively. Their individual cluster means are also calculated with respect to each feature. Also, a vector containing the cluster number for each record is provided. The accuracy comes about 64%, which is good.
To see the distance within points in a cluster and between two clusters are,

We see that between cluster distance is much more than within points distance in each cluster.
We can plot the clusters,


Now, we export out outputs in CSV files, to read it better,

My output file containing the customers segmented in three clusters is attached below. Click to the link to view the output,
Output file: Download
Customer ID in pink coloured cells are from Cluster 1, with blue cells are in Cluster 2 and in yellow cells are in Cluster 3.

Insights

For Cluster 1 – Recency is observed to be 2294 days to 2463 days. Variability of the customers is very very less.
The number of times the purchase is done ranges from 1 to 2700 times. Variability captured is very less.
The monetary value of the purchases made ranges from Rs.6.2 to Rs.91062. The variability observed is very less.

For Cluster 2 – Recency is observed to be 2294 days to 2667 days. Hence it takes into account a little older customers. Variability of the customers is very very less.
The number of times the purchase is done ranges from 3 to 7847 times. Hence it takes into account customers who purchase more number of times. Variability captured is moderately higher than the previous cluster.
The monetary value of the purchases made ranges from Rs.33720 to Rs.280206. Hence it takes into account customers who add to lot more valuation to their purchase. The variability observed is moderately higher than the previous cluster.

For Cluster 3 – Recency is observed to be 2458 days to 2667 days. Hence it takes into account a little older customers. Variability of the customers is very very less.

The number of times the purchase is done ranges from 1 to 2700 times. Hence it takes into account customers who purchase the moderate number of times. Variability captured is moderately higher than the previous cluster. Few outliers are observed in the higher end.

The monetary value of the purchases made ranges from Rs.3.75 to Rs.91062. Hence it takes into account customers who add to moderate valuation to their purchase. The variability observed is moderately higher than the previous cluster. High variability at the upper end. In fact, lots of outliers could be identified in higher purchase amounts.
So we can conclude that:

  1. The customers who fall under Cluster 2, have a high contribution to the profitability of the company, as they have the tendency of purchasing more and purchasing items with high amounts. These are the potential good customers.
  2. The customers who fall under Cluster 1, has a moderate contribution to the profitability of the company, as there is no high variability among the customer’s purchases. All customers purchase frequently, but the total amount is not much high. So if they are more motivated or advertised from the company’s end, then they could increase their number of purchase and be potential customers.
  3. The customers who fall under Cluster 3, are less attractive customers as they have high variability in their number of purchases, plus they contribute much of an amount. These customers can be least expected to contribute to profit.
  4. Apart from this, we can see that frequency mainly influence the monetary value contributed by the customers, hence determining their CLV. Recency is more or less same for all clusters and so cant is identified as an influential attribute to CLV.

You might also like More from author