Testing Of Hypothesis: Parametric Tests

Hypothesis testing uses sample data to assess an assumption about a population. Hypothesis tests are broadly classified as parametric or nonparametric. A parametric hypothesis test makes assumptions about the underlying distribution of the population from which the sample is drawn. In this article, we’ll study how to apply the different parametric hypothesis tests covered in CS1: Actuarial Statistics using R.

Assumptions for t-tests:

Note: In this article, we assume that the normality and no-outliers assumptions hold where required and do not test them separately.

Testing the value of a population mean

Q. Check if the mean birth weight of newborn babies is 3 kg.

A. We have the birth weights of 25 newborn babies. To check if the mean birth weight of newborn babies is 3 kg, we use a one-sample t-test. As no level of significance is specified, we use the conventional value of 5%.

To test,

H0: Mean birth weight of newborn babies is 3 kg
v/s
H1: Mean birth weight of newborn babies is not equal to 3 kg

As the p-value (0.3195) > 0.05, we do not reject H0: there is insufficient evidence that the mean birth weight of newborn babies differs from 3 kg.
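The test above could be run in R along the following lines. This is only a sketch: the article's data file is not reproduced here, so `birth_weight` is simulated placeholder data and the resulting p-value will not match the 0.3195 quoted above.

```r
# Placeholder data: 25 simulated birth weights in kg (NOT the article's data set).
set.seed(1)
birth_weight <- rnorm(25, mean = 3.1, sd = 0.5)

# Two-sided one-sample t-test of H0: mu = 3 at the 5% level.
res_one_sample <- t.test(birth_weight, mu = 3)
print(res_one_sample)

# Decision rule used in the text: reject H0 only if the p-value is below 0.05.
res_one_sample$p.value < 0.05
```

With the real data, replacing `birth_weight` by the column of observed weights should be the only change needed.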

Alternatively, this question can be solved by calculating the value of the statistic using the formula step by step. We can make a function to give the value of the statistic for any dataset.

First, let us understand how a function works in R.

In maths, if we write a function, y = f(x) = x^2, then it’ll give different values of y for different values of x. If x = 3, then y will be 9.

Similarly, in R, a function of one or more arguments (in this case x is an argument) performs the given statements (in this case y = x^2) to give the required output for the desired value (in this case the output is 9, for the given input of 3). The above-discussed function can be named as try1.
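As a minimal sketch, `try1` can be written as below, followed by a function of the same shape that computes the one-sample t statistic. The name `one_sample_t` and the simulated vector `x` are illustrative assumptions, not taken from the article's code.

```r
# try1 takes one argument x and returns x squared.
try1 <- function(x) {
  y <- x^2
  y
}
try1(3)  # 9

# Same idea for the one-sample t statistic: (xbar - mu0) / (s / sqrt(n)).
one_sample_t <- function(x, mu0) {
  n <- length(x)
  t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
  p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
  c(statistic = t_stat, df = n - 1, p_value = p_value)
}

# Placeholder data, not the article's birth-weight sample.
set.seed(1)
x <- rnorm(25, mean = 3.1, sd = 0.5)
one_sample_t(x, mu0 = 3)
```

The p-value returned by `one_sample_t(x, mu0 = 3)` should agree with `t.test(x, mu = 3)$p.value`, which is a convenient check on the hand-written formula.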

Testing the value of the difference between two population means

Q. Test if there is a difference in time taken (in minutes) by 2 teams in writing an article.

A. We have data for 14 writers per team. We’ll use an independent two-sample t-test for this problem.

First, we’ll check the equality-of-variances assumption for the 2 groups.

To test,

H0: Variances of the two groups are equal
v/s
H1: Variances of the two groups are not equal

As the p-value (0.5799) > 0.05, we do not reject H0. There is no evidence that the variances differ, so we proceed with the equal-variance t-test.
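One common way to carry out this check in R is the F test provided by `var.test`. Whether the article used this function is an assumption, and the vectors `team_a` and `team_b` below are simulated placeholders, so the output will not reproduce the 0.5799 quoted above.

```r
# Placeholder data: 14 simulated writing times (minutes) per team, not the article's data.
set.seed(2)
team_a <- rnorm(14, mean = 60, sd = 8)
team_b <- rnorm(14, mean = 62, sd = 8)

# F test of H0: the ratio of the two population variances is 1.
res_var <- var.test(team_a, team_b)
res_var$statistic  # F = var(team_a) / var(team_b)
res_var$p.value    # compare with 0.05
```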

To test,

H0: No significant difference in mean time taken by 2 teams in writing an article
v/s
H1: Significant difference in mean time taken by 2 teams in writing an article

As the p-value (0.6447) > 0.05, we do not reject H0 and conclude that there is no significant difference in the mean time taken by the 2 teams to complete the article.

Alternatively, this question can be solved by calculating the value of the statistic using the formula step by step. We can make a function to give the value of the statistic for any dataset.
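A sketch of such a function for the pooled (equal-variance) two-sample t statistic is given below, checked against the built-in `t.test`. The function name `two_sample_t` and the simulated team data are assumptions made for illustration.

```r
# Pooled two-sample t statistic, assuming equal population variances.
two_sample_t <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)  # pooled variance
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
  df <- n1 + n2 - 2
  c(statistic = t_stat, df = df, p_value = 2 * pt(-abs(t_stat), df))
}

# Placeholder data: 14 simulated writing times (minutes) per team.
set.seed(2)
team_a <- rnorm(14, mean = 60, sd = 8)
team_b <- rnorm(14, mean = 62, sd = 8)

out_two <- two_sample_t(team_a, team_b)
builtin_two <- t.test(team_a, team_b, var.equal = TRUE)

out_two
all.equal(unname(out_two["p_value"]), builtin_two$p.value)
```

Setting `var.equal = TRUE` matters here: by default `t.test` uses the Welch approximation, which does not pool the variances.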

Paired data: t-test

Q. Test if there is a reduction in weight post diet plan implementation.

A. As the data corresponds to weight values pre and post diet plan implementation, it is a case of paired t-test.

To test,

H0: Mean difference of weights pre and post implementation of diet plan is 0
v/s
H1: Mean difference of weights pre and post implementation of diet plan is greater than 0

As p-value (2.688e-07) < 0.05, reject H0 and conclude that there is a reduction in weight post diet plan implementation.

Alternatively, this question can be solved by calculating the value of the statistic using the formula step by step. We can make a function to give the value of the statistic for any dataset.
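The paired test works on the differences pre minus post, so a hand-written version reduces to a one-sample t statistic on those differences. The sketch below uses simulated placeholder weights (the sample size of 20 is also an assumption) and compares the result with `t.test(..., paired = TRUE)`.

```r
# One-sided paired t statistic for H1: mean(pre - post) > 0.
paired_t <- function(pre, post) {
  d <- pre - post
  n <- length(d)
  t_stat <- mean(d) / (sd(d) / sqrt(n))
  c(statistic = t_stat, df = n - 1,
    p_value = pt(t_stat, df = n - 1, lower.tail = FALSE))  # upper-tail p-value
}

# Placeholder data: simulated weights in kg, not the article's data.
set.seed(3)
pre <- rnorm(20, mean = 80, sd = 10)
post <- pre - rnorm(20, mean = 2, sd = 1.5)

out_paired <- paired_t(pre, post)
builtin_paired <- t.test(pre, post, paired = TRUE, alternative = "greater")

out_paired
all.equal(unname(out_paired["p_value"]), builtin_paired$p.value)
```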

Testing the value of a population variance

Q. Determine whether the standard deviation of the heights of 12 year old children is equal to 4cm, based on a random sample of 5 heights in cm.

A. This question can be solved by calculating the value of the statistic using the formula step by step. We can make a function to give the value of the statistic for any dataset.

To test,

H0: Standard deviation of the heights of 12-year-old children is equal to 4cm
v/s
H1: Standard deviation of the heights of 12-year-old children is not equal to 4cm

As the p-value (0.3694) > 0.05, we do not reject H0: the sample is consistent with a standard deviation of 4cm for the heights of 12-year-old children.
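A function for this test might look like the sketch below, based on the statistic (n - 1)s²/σ0², which has a chi-square distribution with n - 1 degrees of freedom under H0. The five heights are hypothetical, since the article's sample is not shown, and doubling the smaller tail is one common convention for the two-sided p-value rather than necessarily the one used in the article.

```r
# Chi-square test for a population standard deviation.
var_test_chisq <- function(x, sigma0) {
  n <- length(x)
  chi_stat <- (n - 1) * var(x) / sigma0^2
  lower <- pchisq(chi_stat, df = n - 1)
  p_value <- 2 * min(lower, 1 - lower)  # two-sided: double the smaller tail
  c(statistic = chi_stat, df = n - 1, p_value = p_value)
}

# Hypothetical sample of 5 heights in cm (NOT the article's data).
heights <- c(150, 146, 152, 155, 149)
out_var <- var_test_chisq(heights, sigma0 = 4)
out_var
```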

Testing the value of a population proportion

Q. In a one year mortality investigation, 4 of the 25 ninety year olds present at the start of the investigation died before the end of the year. Assuming that the number of deaths has a binomial (25,p) distribution, test whether this result is consistent with a mortality rate of p = 0.2 for this age.

A. To test,

H0: p = 0.2
v/s
H1: p ≠ 0.2

As the p-value (0.804) > 0.05, we do not reject H0: the result is consistent with a true mortality rate of 0.2 for this age.
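Because the question gives all the numbers, this test can be reproduced directly. The sketch below uses the exact binomial test `binom.test`; that the article used this particular function is an assumption, although its two-sided p-value rounds to the 0.804 quoted above.

```r
# Exact binomial test: 4 deaths out of 25 lives, H0: p = 0.2, two-sided.
res_binom <- binom.test(x = 4, n = 25, p = 0.2)
round(res_binom$p.value, 3)  # 0.804

# Observed proportion of deaths.
res_binom$estimate
```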

Alternatively, this question can be solved by calculating the value of the statistic using the formula step by step. We can make a function to give the value of the statistic for any dataset.
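One way to do this by formula is the normal approximation to the binomial with a continuity correction, sketched below. The function name `prop_z` is illustrative, and because this is an approximation its p-value is close to, but not the same as, the exact binomial result.

```r
# Normal approximation to Binomial(n, p0) with continuity correction.
prop_z <- function(x, n, p0) {
  mu <- n * p0
  sigma <- sqrt(n * p0 * (1 - p0))
  # Move x half a unit towards the mean before standardising.
  cc <- if (x < mu) 0.5 else if (x > mu) -0.5 else 0
  z <- (x + cc - mu) / sigma
  c(statistic = z, p_value = 2 * pnorm(-abs(z)))  # two-sided p-value
}

out_prop1 <- prop_z(x = 4, n = 25, p0 = 0.2)
out_prop1  # statistic is (4.5 - 5) / 2 = -0.25
```

With only 25 lives the approximation is rough, which is one reason to prefer an exact test for small samples.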

Testing the value of the difference between two population proportions

Q. In a one-year mortality investigation, 25 of the 100 ninety-year-old males and 20 of the 150 ninety-year-old females present at the start of the investigation died before the end of the year. Assuming that the numbers of deaths follow binomial distributions, test whether there is a difference between male and female mortality rates at this age.

A. This question can be solved by calculating the value of the statistic using the formula step by step. We can make a function to give the value of the statistic for any dataset.

To test,

H0: Male and Female mortality rates are the same
v/s
H1: Male and Female mortality rates are different

As the p-value (0.01866) < 0.05, reject H0 and conclude that male and female mortality rates are different.

Alternatively, this question can be solved by calculating the value of the statistic using the formula step by step. We can make a function to give the value of the statistic for any dataset.
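A sketch of the step-by-step calculation is shown below, using the pooled estimate of the common mortality rate under H0. The function name `two_prop_z` is an assumption; the inputs come straight from the question. The result is compared with `prop.test` run without continuity correction, which performs the equivalent chi-square test.

```r
# Two-sample z test for proportions with a pooled estimate under H0.
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)
  z <- (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  c(statistic = z, p_value = 2 * pnorm(-abs(z)))  # two-sided p-value
}

# Males: 25 deaths out of 100; females: 20 deaths out of 150.
out_prop2 <- two_prop_z(25, 100, 20, 150)
round(out_prop2, 5)

# Built-in equivalent (chi-square statistic equals z squared).
builtin_prop2 <- prop.test(x = c(25, 20), n = c(100, 150), correct = FALSE)
builtin_prop2$p.value
```

Both p-values round to the 0.01866 quoted above; leaving `correct` at its default of TRUE would apply Yates' continuity correction and give a somewhat larger value.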

Testing the value of the mean of a Poisson distribution

Q. In a one-year investigation of claim frequencies for a particular category of motorists, the total number of claims made under 5,000 policies was 800. Assuming that the number of claims made by individual motorists has a Poisson(λ) distribution, test whether the average claim frequency λ is less than 0.175.

A. To test,

H0: Average claim frequency λ is equal to 0.175
v/s
H1: Average claim frequency λ is less than 0.175

As p-value (0.005388) < 0.05, reject H0 and conclude that the true claim frequency is less than 0.175.
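This test can be sketched with the exact Poisson test `poisson.test`, treating the 5,000 policies as the exposure `T` and 0.175 as the hypothesised rate `r`. That the article used this function is an assumption. For a lower-tailed alternative, the p-value is the Poisson probability of 800 or fewer claims when 5,000 × 0.175 = 875 are expected.

```r
# Exact one-sided Poisson test: 800 claims over 5,000 policies, H0: lambda = 0.175.
res_pois <- poisson.test(x = 800, T = 5000, r = 0.175, alternative = "less")
res_pois$p.value  # compare with the 0.005388 reported in the text

# The same tail probability computed directly.
ppois(800, lambda = 5000 * 0.175)
```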

Alternatively, this question can be solved by calculating the value of the statistic using the formula step by step. We can make a function to give the value of the statistic for any dataset.
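With 875 claims expected under H0, a normal approximation is a natural formula-based route. The sketch below standardises the total number of claims; the function name `poisson_z` is illustrative, and no continuity correction is applied, so the p-value differs slightly from the exact one.

```r
# Normal approximation for a total Poisson count: total ~ N(n * lambda0, n * lambda0) under H0.
poisson_z <- function(total_claims, n_policies, lambda0) {
  expected <- n_policies * lambda0
  z <- (total_claims - expected) / sqrt(expected)
  c(statistic = z, p_value = pnorm(z))  # lower-tail p-value for H1: lambda < lambda0
}

out_pois1 <- poisson_z(total_claims = 800, n_policies = 5000, lambda0 = 0.175)
out_pois1  # statistic is (800 - 875) / sqrt(875)
```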

Testing the value of the difference between two Poisson means

Q. In a one year investigation of claim frequencies for a particular category of motorists, there were 150 claims from the 500 policyholders aged under 25 and 650 claims from the 4,500 remaining policyholders. Assuming that the number of claims made by individual motorists in each category has a Poisson distribution, test whether the claim frequency is the same for drivers under age 25 and over age 25.

A. To test,

H0: Claim frequency is the same for drivers under age 25 and over age 25
v/s
H1: Claim frequency is not the same for drivers under age 25 and over age 25

The value of the test statistic is far larger than any critical value in the standard normal tables. Therefore, we reject H0 and conclude that the claim frequencies are different for younger and older drivers.
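The size of the statistic can be seen from a short calculation. The sketch below assumes a normal approximation with the claim rate pooled across both groups under H0; the article's exact formula is not shown, so this is one plausible version, and the function name `two_poisson_z` is illustrative.

```r
# z test for the difference between two Poisson rates, pooled under H0.
two_poisson_z <- function(x1, n1, x2, n2) {
  rate1 <- x1 / n1
  rate2 <- x2 / n2
  pooled <- (x1 + x2) / (n1 + n2)  # common rate estimate under H0
  z <- (rate1 - rate2) / sqrt(pooled * (1 / n1 + 1 / n2))
  c(statistic = z, p_value = 2 * pnorm(-abs(z)))  # two-sided p-value
}

# Under 25: 150 claims from 500 policies; others: 650 claims from 4,500 policies.
out_pois2 <- two_poisson_z(150, 500, 650, 4500)
out_pois2  # statistic is about 8.25
```

A standard normal value above 8 lies far beyond the 1.96 cutoff for a two-sided 5% test, which is why the tables are not needed to reach the conclusion.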

Download data and R codes used in this article.
