# Actuarial Science: Graduation and Statistical Tests

In this article, we will talk about **Graduation and Statistical Tests.** Every business is run with the motive of earning a profit, and the prices of products or services are set accordingly. Likewise, for insurance companies the premium is the main inflow that runs the business, so companies try to calculate a premium that will give the desired level of profitability.

The premium calculation requires different inputs, such as mortality rates and interest rates, and all of these values are available in actuarial tables. But have you ever wondered how these values are estimated before being printed in an actuarial table?

The answer is that a lot of data is collected and processed to calculate these values (mortality rates, probabilities, etc.). The estimated values should be justifiable; if they are not, they are processed further to make them suitable for use. Once the processing is done and the best estimates have been calculated, the values can be printed in the actuarial table and used for calculations.

Now, let's take an example to see whether an estimated value is a good estimate or not, and if not, what technique should be used to obtain a better one. In the example below we have estimated mortality rates, and we want to check whether these rates are justifiable. We will do this by plotting a scatterplot.

Let’s see the R code to plot a scatter plot:

*Store the values in variables and then create a data frame*

```r
age <- c(30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49)

force_of_mortality <- c(0.000557,.000645,.000497,.000474,.000372,.000757,
                        .000701,.000618,.000738,.000709,.000921,.000964,
                        .001285,.001367,.001804,.001942,.001963,.002537,
                        .002471,.003011)

Graduation_A <- data.frame(age, force_of_mortality)
```

*Now plot a scatterplot to see the mortality rate movement*

```r
# install.packages("ggplot2")  # run once if ggplot2 is not installed
library(ggplot2)

ggplot(Graduation_A, aes(y = force_of_mortality, x = age)) +
  geom_point() +
  theme_classic()
```

The above trend of mortality rates is not really justifiable. The rate keeps decreasing from age 35 to age 37, then rises and drops again. We know that mortality rates should increase gradually with age, but here they do not. We need to process these rates further to make them suitable for our purpose.

The technique used to do this is called **graduation**. Graduation is a technique used to obtain a smooth and justifiable set of mortality rates (also called graduated mortality rates).

Three methods are commonly used to carry out graduation: –

- Graduation by parametric formula.
- Graduation by reference to a standard table.
- Graduation using spline function.
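As an illustration of the first method (a sketch only, not the fit used in this article), a parametric formula such as the Gompertz law μ_x = B·c^x can be fitted to the crude rates by least squares on the log scale. The rates below are the crude forces of mortality from the example above:

```r
# Crude force of mortality at ages 30-49 (from the example in this article)
age <- 30:49
mu  <- c(0.000557,.000645,.000497,.000474,.000372,.000757,.000701,.000618,
         .000738,.000709,.000921,.000964,.001285,.001367,.001804,.001942,
         .001963,.002537,.002471,.003011)

# Gompertz law: mu_x = B * c^x, i.e. log(mu_x) = log(B) + x * log(c)
fit <- lm(log(mu) ~ age)
B   <- exp(coef(fit)[[1]])
cc  <- exp(coef(fit)[[2]])   # named cc to avoid clashing with base::c

# Graduated (fitted) rates under the Gompertz law
graduated_mu <- B * cc^age
```

Because mortality broadly grows exponentially over these ages, the fitted c comes out above 1, so the graduated curve increases smoothly with age.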

The aims of graduation are: –

- To produce a smooth set of rates that are suitable for a particular purpose.
- To remove random sampling error.
- To use information available from adjacent ages.

**Why do we need smoothness?**

Let's take an example to understand the importance of smoothness. Suppose we calculate premiums using our crude estimated mortality rates. Because the crude rates sometimes fall as age increases, the resulting premiums would also sometimes fall with age.

This does not sound good for the business, and a drop in premium with an increase in age is not justifiable. So, using graduation, we calculate smooth rates that are justifiable and appropriate to use. For smoothing, we can make use of the data at adjacent ages to reduce the sampling error at each age. For example, if the force of mortality is smooth and not changing too rapidly, then our estimate of µ_x should not be too far away from our estimates of µ_{x-1} and µ_{x+1}, as well as being the 'best' estimate, in some sense, of µ_x.

Three desirable features of a graduation are: –

- Smoothness
- Adherence to data
- Suitability for the purpose to hand

We need to strike a balance between smoothness and adherence to data. At one extreme, we could trivially achieve smoothness by ignoring the data altogether; we want to avoid such extremes, since we want the graduation to be representative of the experience.
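Smoothness is commonly judged by examining the third differences of the graduated rates, which should be small in magnitude and progress regularly. A minimal sketch, using the graduated rates that appear in this article's first example:

```r
# Graduated force of mortality at ages 30-49 (first example in this article)
graduated_mu <- c(.000388,.000429,.000474,.000524,.000579,.000640,.000708,
                  .000782,.000865,.000956,.001056,.001168,.001291,.001427,
                  .001577,.001743,.001962,.002129,.002353,.002601)

# Third differences: for a smooth graduation these should be small
# relative to the rates themselves and progress regularly
third_diff <- diff(graduated_mu, differences = 3)
round(third_diff, 6)
```

For a vector of 20 rates this produces 17 third differences; here they are all tiny compared with the rates, which is what a smooth graduation should show.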

If the graduation process results in rates that are smooth but show little adherence to the data, we say that the rates are **over-graduated**. If insufficient smoothing has been carried out, we call it **under-graduation**.

Now let’s begin with the practical part.

We will be using the same example that we used above to plot a scatterplot.

We will perform a chi-square test to check the suitability of our estimates. The chi-square test assesses whether the observed numbers of individuals falling into specified categories are consistent with a model that predicts the expected number in each category; it is a test of the overall *goodness of fit*. First, we will calculate the standardised deviations and apply the chi-square test to them. If the data pass the chi-square test, we will then check for the defects that the chi-square test fails to detect. These defects are: –

- There could be a few large deviations offset by a lot of very small deviations. In other words, the chi-square test could be satisfied even though the data do not satisfy the distributional assumptions that underlie it. This is, in essence, because the chi-square statistic summarises a lot of information in a single figure. For this, we will perform **the ISD (Individual Standardised Deviations) test**.
- The graduation might be biased above or below the data by a small amount. The chi-square statistic can often fail to detect consistent bias if it is small. For this, we will perform **the Sign Test**.
- Even if the graduation is not biased as a whole, there could be significant groups of consecutive ages (called *runs* or *clumps*) over which it is biased up or down. For this, we will perform **the Serial Correlation Test**.

Now let's understand what the standardised deviation is and how it is calculated.

Z_x is our standardised deviation. The theoretical relationship between Z_x and the chi-square distribution is: –
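In symbols (a standard statement of the result, with D_x the observed deaths, E^c_x the central exposed to risk and μ_x the graduated force of mortality):

```latex
Z_x = \frac{D_x - E^c_x\,\mu_x}{\sqrt{E^c_x\,\mu_x}},
\qquad
\sum_x Z_x^2 \sim \chi^2_m ,
```

where m is the appropriate number of degrees of freedom for the graduation.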

The hypotheses for the tests used during graduation will be: –

**H_0:** There is no significant difference between the two sets of rates.

**H_1:** There is a significant difference between the two sets of rates.

Now, let's calculate the standardised deviations (Z_x) and their squares (Z_x^2). The R code for that is: –

*First, add all the remaining entries to the Graduation_A data frame before calculating the standardised deviations*

```r
Central_expose_risk <- c(70000,66672,68375,65420,61779,66091,68514,69560,
                         65000,66279,67300,65368,65391,62917,66537,62302,
                         62145,63856,61097,61110)

death <- c(39,43,34,31,23,50,48,43,48,47,62,63,84,86,120,121,122,162,151,184)

graduated_force_of_mortality <- c(.000388,.000429,.000474,.000524,.000579,
                                  .000640,.000708,.000782,.000865,.000956,
                                  .001056,.001168,.001291,.001427,.001577,
                                  .001743,.001962,.002129,.002353,.002601)

Graduation_A <- data.frame(age, Central_expose_risk, death,
                           force_of_mortality, graduated_force_of_mortality)
```

*Now calculate the standardised deviations (Z_x) and their squares (Z_x^2)*

```r
# Expected deaths: central exposed to risk x graduated force of mortality
product <- round(Graduation_A$Central_expose_risk *
                 Graduation_A$graduated_force_of_mortality, 2)
Graduation_A <- data.frame(Graduation_A, product)

# Standardised deviations Z_x
standard_deviation <- round((death - product) / sqrt(product), 2)
Graduation_A <- data.frame(Graduation_A, standard_deviation)

# Squared standardised deviations Z_x^2
square_standardised_deviation <- round(standard_deviation^2, 2)
Graduation_A <- data.frame(Graduation_A, square_standardised_deviation)

View(Graduation_A)
```

The final step is to perform the chi-square test.

*R code to calculate the p-value of the chi-square test is: –*

```r
1 - pchisq(sum(square_standardised_deviation), df = 18)
## [1] 0.0007645671
```

Here we took degrees of freedom = 18 because there are 20 age groups and it was stated in the question that 2 parameters were estimated in the graduation. Each estimated parameter reduces the degrees of freedom by one, so df = 20 − 2 = 18.

The **p-value** came out to be less than 0.05, so we have sufficient evidence to reject the null hypothesis. Since the data fail the chi-square test, the graduation is rejected and we do not go on to check for the defects.

Now, let's take another example: –

**Perform the same steps that we performed above.**

```r
age <- c(30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49)

Central_expose_risk <- c(70000,66672,68375,65420,61779,66091,68514,69560,
                         65000,66279,67300,65368,65391,62917,66537,62302,
                         62145,63856,61097,61110)

death <- c(39,43,34,31,23,50,48,43,48,47,62,63,84,86,120,121,122,162,151,184)

force_of_mortality <- c(0.000557,.000645,.000497,.000474,.000372,.000757,
                        .000701,.000618,.000738,.000709,.000921,.000964,
                        .001285,.001367,.001804,.001942,.001963,.002537,
                        .002471,.003011)

graduated_force_of_mortality <- c(0.000555,.000658,.000488,.000432,.000486,
                                  .000596,.000685,.000713,.000709,.000733,
                                  .000831,.001015,.001259,.001494,.001679,
                                  .001866,.002134,.002423,.002498,.003008)

Graduation_B <- data.frame(age, Central_expose_risk, death,
                           force_of_mortality, graduated_force_of_mortality)
```

*All the values have been entered. Now calculate the required values*

```r
product <- round(Graduation_B$Central_expose_risk *
                 Graduation_B$graduated_force_of_mortality, 2)
Graduation_B <- data.frame(Graduation_B, product)

standard_deviation <- round((death - product) / sqrt(product), 2)
Graduation_B <- data.frame(Graduation_B, standard_deviation)

square_standardised_deviation <- round(standard_deviation^2, 2)
Graduation_B <- data.frame(Graduation_B, square_standardised_deviation)

View(Graduation_B)
```

*Performing the chi-square test*

```r
1 - pchisq(sum(square_standardised_deviation), df = 10)
## [1] 0.4936845
```

The p-value comes out to be greater than 0.05, so we have insufficient evidence to reject the null hypothesis; the graduated rates pass the chi-square test.

Now we will check for the defects, starting with **the ISD test**. For the ISD test we check whether the Z_x values are normally distributed. The hypothesis for the test is: – H_0: Z_x ~ N(0, 1)

Note that the Shapiro–Wilk test must be applied to the standardised deviations themselves, not to their squares:

```r
shapiro.test(Graduation_B$standard_deviation)
```

If the resulting p-value is greater than 0.05, we have insufficient evidence to reject the null hypothesis and can conclude that the distribution of the Z_x values is not significantly different from a normal distribution.

Next, we perform **the Sign Test** to check for overall bias. The hypothesis for the test is: –

H_0: P ~ Binomial(m, 0.5), where P denotes the number of positive deviations and m is the number of ages (here m = 20).

```r
# Count the number of positive standardised deviations
a <- length(Graduation_B$standard_deviation)
i <- 1
c <- 0
while (i <= a) {
  if (Graduation_B$standard_deviation[i] > 0) {
    c <- c + 1
  }
  i <- i + 1
}
print(c)
## [1] 12

binom.test(x = c, n = 20, p = 0.5, alternative = "two.sided")
```

**Output**

```r
## Exact binomial test
##
## data: c and 20
## number of successes = 12, number of trials = 20, p-value = 0.5034
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.3605426 0.8088099
## sample estimates:
## probability of success
## 0.6
```

As the p-value is greater than 0.05, we have insufficient evidence to reject the null hypothesis. We can conclude that the graduation is not biased overall.

**Serial Correlation Test**

```r
cor.test(y = standard_deviation, x = age)

## Pearson's product-moment correlation
##
## data: age and standard_deviation
## t = -0.089788, df = 18, p-value = 0.9294
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4593781 0.4253448
## sample estimates:
## cor
## -0.02115851
```

The p-value is greater than 0.05, so we have insufficient evidence to reject the null hypothesis. We can conclude that the deviations do not cluster in runs of consecutive ages.
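The call above correlates the deviations with age; the textbook serial correlation test instead looks at the lag-1 correlation between deviations at consecutive ages. A minimal sketch of that version, recomputing the Z_x values from the Graduation_B example data:

```r
# Example data (same as Graduation_B above)
death  <- c(39,43,34,31,23,50,48,43,48,47,62,63,84,86,120,121,122,162,151,184)
expose <- c(70000,66672,68375,65420,61779,66091,68514,69560,65000,66279,
            67300,65368,65391,62917,66537,62302,62145,63856,61097,61110)
grad_mu <- c(0.000555,.000658,.000488,.000432,.000486,.000596,.000685,.000713,
             .000709,.000733,.000831,.001015,.001259,.001494,.001679,.001866,
             .002134,.002423,.002498,.003008)

# Standardised deviations Z_x
z <- (death - expose * grad_mu) / sqrt(expose * grad_mu)
n <- length(z)

# Lag-1 pairs (z_x, z_{x+1}); a significantly positive correlation
# would indicate runs or clumps of same-sign deviations
cor.test(z[-n], z[-1])
```

A significantly positive lag-1 correlation would point to grouping of deviations of the same sign, even when the correlation with age itself is negligible.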

I hope you have learned something new through this article. That's all from my side. In case of any doubt, please comment below; we will get back to you soon. Thank you!