Information Value (IV) and Weight of Evidence (WOE)

In this article, we will learn more about Information Value (IV) and Weight of Evidence (WOE). Logistic regression is one of the most commonly used statistical techniques for solving binary classification problems, and it is accepted in almost all domains. The two concepts, weight of evidence (WOE) and information value (IV), evolved from the same logistic regression technique.

Information Value (IV) and Weight of Evidence (WOE) were developed primarily for the credit and financial industries, to build better models for predicting the risk of loan default (credit risk models). Many factors, such as age, education, income, previous credit history and loan amount, determine the probability (risk) of loan default as expressed in credit scores. With the help of WOE and IV, we can measure the predictive power of these predictor variables.

So, we can say that WOE and IV play two distinct roles when analyzing data:

  • WOE describes the relationship between a predictive variable and a binary target variable.
  • IV measures the strength of that relationship.

They are also used in marketing analytics projects such as customer attrition models and campaign response models. Here, however, I have used a credit risk dataset to explain the importance of both concepts.

The Weight Of Evidence (WOE)

The Weight of Evidence measures how well a group (bin) of a variable separates events from non-events. This is done by computing a simple ratio:

(Distribution of Goods) / (Distribution of Bads)

Bad customers are the customers who defaulted on their loans, and good customers are the customers who paid their loans back. The distribution of goods is the percentage of all good customers that fall in a group, and the distribution of bads is the percentage of all bad customers that fall in that group.

If Distribution of Bads > Distribution of Goods in a group, the ratio will be less than 1, and if Distribution of Bads < Distribution of Goods in a group, the ratio will be more than 1.

WOE is then calculated by taking the natural logarithm of this ratio, that is, of the percentage of non-events (goods) divided by the percentage of events (bads).
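In symbols, for a single group this can be written as follows (the index i for the group is introduced here only for illustration):

$$\mathrm{WOE}_i = \ln\left(\frac{\text{Distribution of Goods}_i}{\text{Distribution of Bads}_i}\right)$$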

So, the WOE will be a negative number if the ratio is less than 1, and it will be positive if the ratio is more than 1.
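As a small worked example, the calculation can be sketched in base R. The counts below are made up purely for illustration, and the column names are assumptions rather than part of any package.

```r
# Hypothetical counts of good and bad customers in three groups of one variable.
counts <- data.frame(
  group = c("A", "B", "C"),
  good  = c(400, 350, 250),
  bad   = c(20, 30, 50)
)

# Share of all goods / all bads that falls into each group.
counts$dist_good <- counts$good / sum(counts$good)
counts$dist_bad  <- counts$bad / sum(counts$bad)

# WOE is the natural log of the ratio of the two distributions.
counts$woe <- log(counts$dist_good / counts$dist_bad)

print(counts)
```

Group A holds 40% of the goods but only 20% of the bads, so its WOE is positive. Group C holds 25% of the goods and 50% of the bads, so its WOE is negative.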

For a continuous independent variable, we first create bins (categories/groups), then combine the bins with similar WOE values, and finally replace each bin with its WOE value.
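The binning and replacement steps might look like this in base R. This is only a sketch on simulated data: the variable `age`, the target `bad` (1 = bad customer, 0 = good customer) and the choice of five quantile bins are all assumptions made for the example, and the step of merging bins with similar WOE is left out.

```r
set.seed(42)  # simulated data, for illustration only
n   <- 1000
age <- runif(n, min = 20, max = 70)
bad <- rbinom(n, size = 1, prob = plogis(-1 - 0.03 * (age - 45)))

# Cut the continuous variable into 5 bins of roughly equal size (quintiles).
breaks  <- quantile(age, probs = seq(0, 1, by = 0.2))
age_bin <- cut(age, breaks = breaks, include.lowest = TRUE)

# Goods and bads per bin, then WOE = ln(dist_good / dist_bad).
good_n <- tapply(1 - bad, age_bin, sum)
bad_n  <- tapply(bad, age_bin, sum)
woe    <- log((good_n / sum(good_n)) / (bad_n / sum(bad_n)))

# Replace each observation's bin with the WOE of that bin.
age_woe <- as.numeric(woe[as.character(age_bin)])

print(round(woe, 3))
```

The vector `age_woe` is what would go into the logistic regression in place of the raw `age` values.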

For categorical independent variables, we directly combine categories with similar WOE and then replace the categories with continuous WOE values. This is done because the categories with similar WOE have almost the same proportion of events and non-events. In other words, the behavior of both the categories is the same.

In general, 10 or 20 bins are used. Each bin should contain at least 5% of the observations and a non-zero number of both good and bad customers. The number of bins determines the amount of smoothing: the fewer the bins, the more smoothing. With fewer bins, the important patterns in the data are captured while the noise is left out. The WOE should be distinct for each category, and it should be monotonic, either increasing or decreasing across the bins. If the WOE of a group cannot be computed because the group has no events or no non-events, we add 0.5 to the number of events and non-events in the group.
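The 0.5 adjustment can be illustrated with a few made-up counts. The sketch below adds 0.5 to the counts of every bin, which is one possible reading of the rule. Applying it only to the affected bin is another.

```r
# Hypothetical bin counts; the last bin has no bad customers at all.
good_n <- c(120, 200, 80)
bad_n  <- c(15, 10, 0)

# Unadjusted WOE is infinite for the bin with a zero count.
woe_raw <- log((good_n / sum(good_n)) / (bad_n / sum(bad_n)))

# Add 0.5 to the counts of every bin before computing the distributions.
good_adj <- good_n + 0.5
bad_adj  <- bad_n + 0.5
woe_adj  <- log((good_adj / sum(good_adj)) / (bad_adj / sum(bad_adj)))

print(woe_raw)
print(woe_adj)
```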

The Advantages of WOE

  • Handles missing values
  • Handles outliers
  • The transformation is based on the logarithmic value of distributions. This is well suited for Logistic Regression.
  • There is no need for dummy variables
  • By using the proper binning technique, it can establish a monotonic relationship (either increase or decrease) between the independent and dependent variable
  • The IV can be used to select variables quickly.

Information Value (IV)

The Information Value (IV) of a predictor is a weighted sum of the WOE values over all groups, where each group's weight is the difference between its distribution of goods and its distribution of bads. Thus, it expresses the amount of information a predictor variable carries for separating the Goods from the Bads, and it lets us rank variables by their importance. Note that the information value tends to increase as the number of bins/groups for an independent variable increases.

Moreover, information value should not be used with classification models other than logistic regression (e.g. random forest or SVM), as it is designed for the binary logistic regression model only. Within that setting, it is one of the most useful techniques for selecting important variables in a predictive model.

The formula for information value is shown below.
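Using the distributions defined in the WOE section, and summing over the groups i of the variable (the index is introduced here for illustration), the standard form is:

$$\mathrm{IV} = \sum_{i} \left(\text{Distribution of Goods}_i - \text{Distribution of Bads}_i\right) \times \mathrm{WOE}_i$$

Each term is non-negative, because the difference and the WOE of a group always share the same sign.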

Information Value (IV) and Weight of Evidence (WOE) in R

library(Information)

The dataset is available on Kaggle.

data <- read.csv("creditcard.csv")

We should make sure that all categorical independent variables are stored as factors and that the binary dependent variable is numeric before running IV and WOE.
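A minimal sketch of this preparation step in base R is shown below. The small data frame and its column names (`education`, `income`, `default`) are hypothetical and only stand in for a real dataset.

```r
# Small made-up data frame standing in for a real credit dataset.
df <- data.frame(
  education = c("graduate", "school", "graduate", "school"),
  income    = c(52000, 31000, 47000, 28000),
  default   = c("no", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Store every character column except the target as a factor.
is_chr <- sapply(df, is.character) & names(df) != "default"
df[is_chr] <- lapply(df[is_chr], factor)

# Recode the binary target as numeric 0/1.
df$default <- as.numeric(df$default == "yes")

str(df)
```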

summary(data)

This creates WOE tables and IVs for all variables in the input data frame:
IV <- create_infotables(data = data, y = "Class", bins = 10, parallel = FALSE)

We can then extract the IV values of the independent variables and save them in a data frame:
IV_Value <- data.frame(IV$Summary)
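Once the values are in a data frame, ranking the variables is a one-line sort. The sketch below uses a made-up stand-in for the summary so that it runs without the dataset. The column names `Variable` and `IV` are assumed to match those of the summary table, and the variable names and numbers are invented.

```r
# Made-up stand-in for the IV summary; column names Variable and IV are assumed.
IV_Value <- data.frame(
  Variable = c("var_a", "var_b", "var_c"),
  IV       = c(0.35, 1.20, 0.08)
)

# Sort so that the variable with the highest IV comes first.
IV_Ranked <- IV_Value[order(IV_Value$IV, decreasing = TRUE), ]

print(IV_Ranked)
```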

We can also plot the WOE for various variables to see their trend. For example:
plot_infotables(IV, "Amount")

There are some other packages which can also be used to get the information values of the variables, such as the InformationValue package.


The functions used here are:

  1. WOE(X,Y)
  2. WOETable(X,Y)
  3. IV(X,Y)

Where X is the categorical variable for which the IV is computed, and Y is the binary response variable which represents Good or Bad customers.
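To make clear what such functions compute, here is a by-hand sketch in base R for a categorical X and a binary Y. The helper `woe_table_by_hand` is hypothetical and is not part of any package. It assumes that Y = 1 marks a bad customer and Y = 0 a good one, so a package's own output may use different labels or a different sign convention.

```r
# Hypothetical helper: WOE table for a categorical x and a 0/1 y,
# where y == 1 marks a bad customer (event) and y == 0 a good one.
woe_table_by_hand <- function(x, y) {
  good_n    <- tapply(y == 0, x, sum)
  bad_n     <- tapply(y == 1, x, sum)
  dist_good <- good_n / sum(good_n)
  dist_bad  <- bad_n / sum(bad_n)
  woe       <- log(dist_good / dist_bad)
  data.frame(
    category = names(good_n),
    good     = as.vector(good_n),
    bad      = as.vector(bad_n),
    woe      = as.vector(woe),
    iv_part  = as.vector((dist_good - dist_bad) * woe)
  )
}

# Made-up example: housing status of ten customers and whether they defaulted.
x <- factor(c("own", "own", "own", "own",
              "rent", "rent", "rent", "rent", "rent", "rent"))
y <- c(0, 0, 0, 1, 0, 1, 1, 0, 1, 0)

tab <- woe_table_by_hand(x, y)
iv  <- sum(tab$iv_part)  # IV is the sum of the per-category contributions

print(tab)
print(iv)
```

Every category here has at least one good and one bad customer. With an empty cell, the WOE would be infinite and the counts would first need an adjustment such as adding 0.5.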

Advantages of Information Value (IV)

  • Consider each variable’s independent contribution to the outcome.
  • Detect linear and non-linear relationships.
  • Rank variables in terms of “univariate” predictive strength.
  • Visualize the correlations between the predictive variables and the binary outcome.
  • Seamlessly compare the strength of continuous and categorical variables without creating dummy variables.
  • Seamlessly handle missing values without imputation.
  • Assess the predictive power of missing values.
