Implementation of Statistical Hypothesis Testing in R
What is the statistical hypothesis?
A statistical hypothesis is an assertion or conjecture about the distribution of one or more random variables. If the hypothesis completely specifies the distribution, it is called a simple hypothesis; otherwise it is called a composite hypothesis.
What is testing?
Testing is a procedure or rule for deciding whether or not to reject the hypothesis. We will discuss some tests (z-test, F-test, t-test, chi-squared test) and their implementation in R.
Z-test:
Suppose we have n random samples X_{1}, X_{2}, …, X_{n} from a normal distribution with mean μ and variance σ^{2} (which is specified/known). Here our hypothesis concerns the mean μ of the normal population. We can have 3 types of tests,
(i) H_{0}: μ = μ_{0} vs H_{1}: μ ≠ μ_{0}
(ii) H_{0}: μ = μ_{0} vs H_{1}: μ > μ_{0}
(iii) H_{0}: μ = μ_{0} vs H_{1}: μ < μ_{0}
(i) H_{0}: μ = μ_{0} vs H_{1}: μ ≠ μ_{0 }
In this case, we have a two-tailed test. The test statistic we use here is,

Z = √n (X̄ − μ_{0}) / σ,

which follows N(0,1) under the null hypothesis. We reject the null hypothesis at level α if the absolute value of the observed Z statistic is greater than Z_{1−α/2}, the upper α/2-th point of N(0,1).
Example:
Suppose we have 10 random samples from a normal population having variance 25.
3.27, 2.53, 2.98, 4.11, 3.35, 3.35, 0.38, 4.93, 3.97, 3.17
We want to test whether the mean is 5 or not. Here our hypothesis is H_{0}: μ = 5 vs H_{1}: μ ≠ 5. We have the R code to perform the test as
CODE:
x <- c(3.27,2.53,2.98,4.11,3.35,3.35,0.38,4.93,3.97,3.17)
z = (sqrt(length(x))*(mean(x)-5))/5
if(abs(z) > qnorm(1-0.025,0,1)){
  print("We reject the null hypothesis at 5% level of significance")
}else{
  print("We can't reject the null hypothesis at 5% level of significance")
}
OUTPUT:
> x <- c(3.27,2.53,2.98,4.11,3.35,3.35,0.38,4.93,3.97,3.17)
> z = (sqrt(length(x))*(mean(x)-5))/5
> if(abs(z) > qnorm(1-0.025,0,1)){
+   print("We reject the null hypothesis at 5% level of significance")
+ }else{
+   print("We can't reject the null hypothesis at 5% level of significance")
+ }
[1] "We can't reject the null hypothesis at 5% level of significance"
So, we accept (can’t reject) the statement that the mean of the population is 5 at 5% level of significance.
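Equivalently, instead of comparing the statistic with the critical value, we can compute the two-sided p-value directly with pnorm(). This is just a sketch of the same test in p-value form (same data, known σ = 5):

```r
# Two-sided z-test for H0: mu = 5 vs H1: mu != 5, reported as a p-value
x <- c(3.27, 2.53, 2.98, 4.11, 3.35, 3.35, 0.38, 4.93, 3.97, 3.17)
sigma <- 5                                    # known population sd (variance 25)
z <- sqrt(length(x)) * (mean(x) - 5) / sigma  # observed Z statistic
p_value <- 2 * (1 - pnorm(abs(z)))            # two-sided p-value under N(0,1)
p_value                                       # about 0.26 > 0.05: do not reject H0
```

The p-value agrees with the critical-value decision: we cannot reject H_{0} at the 5% level.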
(ii) H_{0}: μ = μ_{0} vs H_{1}: μ > μ_{0}
[Note: The above hypothesis is equivalent to testing H_{0}: μ ≤ μ_{0} vs H_{1}: μ > μ_{0}.] This is a right-tailed test where the test statistic is the same as above,

Z = √n (X̄ − μ_{0}) / σ,

which follows N(0,1) under the null hypothesis. We reject the null hypothesis at level α if the value of the observed Z statistic is greater than Z_{1−α}, the upper α-th point of N(0,1).
Example:
Consider the same example as before, we have 10 random samples from a normal population having variance 25.
3.27, 2.53, 2.98, 4.11, 3.35, 3.35, 0.38, 4.93, 3.97, 3.17
Our objective is to test whether the mean is greater than 5 or not. Here our hypothesis is H_{0}: μ = 5 vs H_{1}: μ > 5.
We have the R codes to perform the test as,
CODE:
x <- c(3.27,2.53,2.98,4.11,3.35,3.35,0.38,4.93,3.97,3.17)
z = (sqrt(length(x))*(mean(x)-5))/5
if(z > qnorm(1-0.05,0,1)){
  print("We reject the null hypothesis at 5% level of significance")
}else{
  print("We can't reject the null hypothesis at 5% level of significance")
}
OUTPUT:
> x <- c(3.27,2.53,2.98,4.11,3.35,3.35,0.38,4.93,3.97,3.17)
> z = (sqrt(length(x))*(mean(x)-5))/5
> if(z > qnorm(1-0.05,0,1)){
+   print("We reject the null hypothesis at 5% level of significance")
+ }else{
+   print("We can't reject the null hypothesis at 5% level of significance")
+ }
[1] "We can't reject the null hypothesis at 5% level of significance"
So, we can’t agree that the mean of the population is greater than 5 at 5% level of significance.
(iii) H_{0}: μ = μ_{0} vs H_{1}: μ < μ_{0}
[Note: The above hypothesis is equivalent to testing H_{0}: μ ≥ μ_{0} vs H_{1}: μ < μ_{0}.] This is a left-tailed test where the test statistic is the same as above, i.e.

Z = √n (X̄ − μ_{0}) / σ,

which follows N(0,1) under the null hypothesis. We reject the null hypothesis at level α if the value of the observed Z statistic is smaller than Z_{α}, the lower α-th point of N(0,1).
Example:
Consider the same example. We have 10 random samples from a normal population having variance 25.
3.27, 2.53, 2.98, 4.11, 3.35, 3.35, 0.38, 4.93, 3.97, 3.17
We want to test whether the mean is less than 5 or not. Here our hypothesis is H_{0}: μ = 5 vs H_{1}: μ < 5. We have the R codes to perform the test as,
CODE:
x <- c(3.27,2.53,2.98,4.11,3.35,3.35,0.38,4.93,3.97,3.17)
z = (sqrt(length(x))*(mean(x)-5))/5
if(z < qnorm(0.05,0,1)){
  print("We reject the null hypothesis at 5% level of significance")
}else{
  print("We can't reject the null hypothesis at 5% level of significance")
}
OUTPUT:
> x <- c(3.27,2.53,2.98,4.11,3.35,3.35,0.38,4.93,3.97,3.17)
> z = (sqrt(length(x))*(mean(x)-5))/5
> if(z < qnorm(0.05,0,1)){
+   print("We reject the null hypothesis at 5% level of significance")
+ }else{
+   print("We can't reject the null hypothesis at 5% level of significance")
+ }
[1] "We can't reject the null hypothesis at 5% level of significance"
So, we can’t say the population mean is less than 5 at 5% level of significance.
These are the one-sample z-tests. We can also have situations where we need to compare two normal populations with known variances. Let the two populations be N(μ_{1}, σ_{1}^{2}) and N(μ_{2}, σ_{2}^{2}), and let the random samples from these two populations be X_{1}, X_{2}, …, X_{n1} and Y_{1}, Y_{2}, …, Y_{n2}. Here σ_{1}^{2} and σ_{2}^{2} are known quantities.
Our interest is to test, H_{0}: μ_{1 }– μ_{2} = μ_{0} vs H_{1}: μ_{1 }– μ_{2} ≠ μ_{0}, where μ_{0} is a real constant.
This is equivalent to test, H_{0}: μ_{1 }– μ_{2} – μ_{0} = 0 vs H_{1}: μ_{1 }– μ_{2} – μ_{0} ≠ 0
Here the test statistic is,

Z = (X̄ − Ȳ − μ_{0}) / √(σ_{1}^{2}/n_{1} + σ_{2}^{2}/n_{2}),

which follows N(0,1) under the null hypothesis. We reject the null hypothesis at level α if the absolute value of the statistic is greater than Z_{1−α/2}, the upper α/2-th point of N(0,1).
Example:
Suppose we have 8 random samples from N(μ_{1},25) and 10 random samples from N(μ_{2},9) as follows,
X: 14.63, 5.44, 6.08, 7.40, 15.41, -0.82, 7.89, 1.04
Y: 8.26, 7.79, 7.98, 8.91, 14.87, 6.04, 7.82, 4.52, 12.71, 10.51
We want to test whether the means of two populations are equal or not.
Here our hypothesis will be, H_{0}: μ_{1 }– μ_{2} = 0 vs H_{1}: μ_{1 }– μ_{2} ≠ 0
We have the R codes to perform the test as,
CODE:
x <- c(14.63, 5.44, 6.08, 7.40, 15.41, -0.82, 7.89, 1.04)
y <- c(8.26, 7.79, 7.98, 8.91, 14.87, 6.04, 7.82, 4.52, 12.71, 10.51)
z = (mean(x)-mean(y))/sqrt((25/8) + (9/10))
if(abs(z) > qnorm(1-0.025,0,1)){
  print("We reject the null hypothesis at 5% level of significance")
}else{
  print("We can't reject the null hypothesis at 5% level of significance")
}
OUTPUT:
> x <- c(14.63, 5.44, 6.08, 7.40, 15.41, -0.82, 7.89, 1.04)
> y <- c(8.26, 7.79, 7.98, 8.91, 14.87, 6.04, 7.82, 4.52, 12.71, 10.51)
> z = (mean(x)-mean(y))/sqrt((25/8) + (9/10))
> if(abs(z) > qnorm(1-0.025,0,1)){
+   print("We reject the null hypothesis at 5% level of significance")
+ }else{
+   print("We can't reject the null hypothesis at 5% level of significance")
+ }
[1] "We can't reject the null hypothesis at 5% level of significance"
So, here the means do not differ significantly at 5% level of significance.
Similarly, we can test H_{0}: μ_{1 }– μ_{2} = 0 vs H_{1}: μ_{1 }– μ_{2} > 0 or H_{0}: μ_{1 }– μ_{2} = 0 vs H_{1}: μ_{1 }– μ_{2} < 0.
Here the statistic remains the same but the rejection region will be changed as discussed in one-sample z-test.
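Since the three one-sample cases differ only in their rejection regions, they can be wrapped in a single helper. The function below is a hypothetical convenience wrapper (base R has no built-in z-test), written to mirror the decisions made in the examples above:

```r
# Hypothetical one-sample z-test helper; sigma is the known population sd.
z_test <- function(x, mu0, sigma, alternative = "two.sided") {
  z <- sqrt(length(x)) * (mean(x) - mu0) / sigma
  p <- switch(alternative,
              two.sided = 2 * (1 - pnorm(abs(z))),  # reject for large |Z|
              greater   = 1 - pnorm(z),             # reject for large Z
              less      = pnorm(z))                 # reject for small Z
  list(statistic = z, p.value = p)
}

x <- c(3.27, 2.53, 2.98, 4.11, 3.35, 3.35, 0.38, 4.93, 3.97, 3.17)
z_test(x, mu0 = 5, sigma = 5, alternative = "less")
```

Rejecting at level α whenever the returned p-value is below α reproduces the critical-value decisions made above.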
t-test:
We have discussed tests for the mean of normal populations with known variance. But in most real-life situations the variance is unknown. In this case, we use a t-test for the mean of a normal population with unknown variance. Basically, we just plug in the estimate of the variance and carry out the test.
One sample t-test
The test statistic we use is,

t = √n (X̄ − μ_{0}) / s,

which follows a t-distribution with n−1 degrees of freedom under the null hypothesis. Here s is the sample standard deviation with divisor n−1. Let us discuss this with an example. Suppose we have a sample of 30 observations on IQ scores of a class.
114, 104, 89, 118, 105, 90, 113, 90, 108, 116, 116, 106, 92, 105, 94, 100, 95, 97, 89, 97, 90, 124, 100, 98, 76, 106, 113, 86, 75, 102
Assuming a normal distribution, one may want to test if the mean IQ score is 95. Using R we can easily perform the hypothesis testing, H_{0}: μ = 95 vs H_{1}: μ ≠ 95.
CODE:
x <- c(114, 104, 89, 118, 105, 90, 113, 90, 108, 116, 116, 106, 92, 105,
       94, 100, 95, 97, 89, 97, 90, 124, 100, 98, 76, 106, 113, 86, 75, 102)
t.test(x, alternative = "two.sided", mu = 95)
OUTPUT:
> x <- c(114, 104, 89, 118, 105, 90, 113, 90, 108, 116, 116, 106, 92, 105, 94, 100, 95, 97, 89, 97, 90, 124, 100, 98, 76, 106, 113, 86, 75, 102)
> t.test(x , alternative = "two.sided", mu = 95)

	One Sample t-test

data:  x
t = 2.3868, df = 29, p-value = 0.02374
alternative hypothesis: true mean is not equal to 95
95 percent confidence interval:
  95.75379 104.77954
sample estimates:
mean of x 
 100.2667 
Here the p-value is less than 0.05, so we can reject the null hypothesis at the 5% level of significance: the IQ score is not 95 on average. Similarly, we can have the right/left-tailed test by changing the alternative to "greater"/"less".
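For instance, a right-tailed version of the same test (H_{0}: μ = 95 vs H_{1}: μ > 95) needs only the alternative changed:

```r
# Right-tailed one-sample t-test on the same IQ data
x <- c(114, 104, 89, 118, 105, 90, 113, 90, 108, 116, 116, 106, 92, 105,
       94, 100, 95, 97, 89, 97, 90, 124, 100, 98, 76, 106, 113, 86, 75, 102)
res <- t.test(x, alternative = "greater", mu = 95)
res$p.value  # half the two-sided p-value here, since the t statistic is positive
```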
Two sample t-test
Now suppose we have two independent groups and we want to compare their means. Here we will use the two-sample t-test. The test statistic we use here is,

t = (X̄ − Ȳ) / √(s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2}),

where X̄, Ȳ are the sample means and s_{1}^{2}, s_{2}^{2} are the sample variances of the two groups. With equal variances, the pooled version of this statistic follows a t distribution with n_{1}+n_{2}−2 degrees of freedom under the null hypothesis; with unequal variances (Welch's test, used below) the degrees of freedom are approximated from the data. Suppose we have scores for some males and females as follows,
Female: 95, 78, 68, 95, 98, 79, 98, 86, 78, 89, 89, 94
Male: 100, 100, 95, 90, 95, 98, 100, 100
Here someone may test whether the means are the same or not, so it is a two-tailed t-test. The hypothesis is H_{0}: μ_{1} – μ_{2} = 0 vs H_{1}: μ_{1} – μ_{2} ≠ 0, with unequal variances. We can perform the test using R and get the result as follows.
CODE:
x <- c(95,78,68,95,98,79,98,86,78,89,89,94)
y <- c(100,100,95,90,95,98,100,100)
t.test(x, y, alternative = "two.sided", var.equal = FALSE, mu = 0)
OUTPUT:
> x <- c(95,78,68,95,98,79,98,86,78,89,89,94)
> y <- c(100,100,95,90,95,98,100,100)
> t.test(x, y, alternative = "two.sided", var.equal = FALSE, mu = 0)

	Welch Two Sample t-test

data:  x and y
t = -3.2698, df = 15.174, p-value = 0.005104
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -16.512162  -3.487838
sample estimates:
mean of x mean of y 
    87.25     97.25 
Here we have the p-value less than 0.05, so we reject the null hypothesis at 5% level of significance. We can conclude that the means significantly differ.
Similarly, we can test that the difference is c (some constant) by replacing the value of "mu" with c. We can have a left/right-tailed test by changing the "alternative" to "less"/"greater". If the two variables have a common variance, then we set "var.equal" to "TRUE". [NOTE: When the variances are equal, s_{1}^{2} and s_{2}^{2} are replaced by s^{2}, the combined (pooled) sample variance with divisor n_{1}+n_{2}−2.]
Paired t-test
Here we also have two variables, but they are related (i.e. collected from the same group, person, item or thing). We basically take the differences and, treating the differences as a single variable, perform a one-sample t-test.
The hypothesis we test here is H_{0}: μ_{d} = 0 vs H_{1}: μ_{d} ≠ 0, where μ_{d} is the mean of the differences. The test statistic is

t = √n d̄ / s_{d},

which follows under the null hypothesis a t distribution with n−1 degrees of freedom; here d̄ and s_{d} are the mean and standard deviation (divisor n−1) of the differences. Let us consider an example of marks in mathematics and statistics of 10 students.
Mathematics: 97, 86, 100, 73, 79, 93, 80, 96, 82, 65
Statistics: 95, 77, 72, 74, 85, 80, 92, 78, 79, 75
We want to test if there is any significant difference between those marks. Using R:
CODE:
Maths <- c(97, 86, 100, 73, 79, 93, 80, 96, 82, 65)
Stats <- c(95, 77, 72, 74, 85, 80, 92, 78, 79, 75)
t.test(Maths, Stats, alternative = "two.sided", paired = TRUE, mu = 0)
OUTPUT:
> Maths <- c(97, 86, 100, 73, 79, 93, 80, 96, 82, 65)
> Stats <- c(95, 77, 72, 74, 85, 80, 92, 78, 79, 75)
> t.test(Maths, Stats, alternative = "two.sided", paired = TRUE, mu = 0)

	Paired t-test

data:  Maths and Stats
t = 1.093, df = 9, p-value = 0.3028
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.706256 13.506256
sample estimates:
mean of the differences 
                    4.4 
Here the p-value is greater than 0.05, so we accept (can’t reject) the null hypothesis at 5% level of significance. We can conclude that there is no significant difference between the marks in mathematics and statistics. We similarly have the left/right-tailed test by changing the “alternative” as “less”/”greater”. We can have the comparison that the difference is c (some constant). Then we replace the value of “mu” by c.
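Because the paired t-test is just a one-sample t-test on the differences, the same result can be obtained by passing the difference vector to t.test() directly, which is a useful sanity check:

```r
Maths <- c(97, 86, 100, 73, 79, 93, 80, 96, 82, 65)
Stats <- c(95, 77, 72, 74, 85, 80, 92, 78, 79, 75)
# One-sample t-test on the differences: identical to the paired test
res_diff   <- t.test(Maths - Stats, mu = 0)
res_paired <- t.test(Maths, Stats, paired = TRUE, mu = 0)
c(res_diff$p.value, res_paired$p.value)  # the two p-values coincide
```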
Chi-squared test:
First, let me tell you the basics of the chi-squared variable. If X_{1}, X_{2}, …, X_{n} independently follow N(0,1), then each X_{i}^{2} follows the χ^{2} distribution with 1 degree of freedom. The chi-squared distribution has the additive property, so Σ_{i=1}^{n} X_{i}^{2} follows the χ^{2} distribution with n degrees of freedom. Whenever the null distribution of a test statistic is a χ^{2} distribution, the test is called a chi-squared test. This is the basic idea of the chi-squared (χ^{2}) test.
Now, suppose we have n observations from N(μ_{0}, σ^{2}), where μ_{0} is a known constant, and we want to test H_{0}: σ^{2} = σ_{0}^{2} for the variance. Here it will be a chi-squared test, as the test statistic

χ^{2} = Σ_{i=1}^{n} (X_{i} − μ_{0})^{2} / σ_{0}^{2}

follows the χ^{2} distribution with n degrees of freedom under the null hypothesis. If the mean is unknown, the test statistic becomes

χ^{2} = (n−1)S^{2} / σ_{0}^{2},

where S^{2} is the sample variance with the n−1 divisor. Here the test statistic follows the χ^{2} distribution with n−1 degrees of freedom under the null hypothesis.
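Base R has no ready-made function for this variance test, so here is a sketch that carries it out by hand for the sample used in the z-test section, testing H_{0}: σ^{2} = 25 against H_{1}: σ^{2} ≠ 25 with the mean unknown:

```r
x <- c(3.27, 2.53, 2.98, 4.11, 3.35, 3.35, 0.38, 4.93, 3.97, 3.17)
sigma0_sq <- 25                       # hypothesised variance
n <- length(x)
stat <- (n - 1) * var(x) / sigma0_sq  # ~ chi-square with n-1 df under H0
# Two-sided rejection region at the 5% level:
reject <- stat < qchisq(0.025, n - 1) | stat > qchisq(0.975, n - 1)
c(statistic = stat, reject = reject)
```

Here the statistic falls in the lower tail, so H_{0}: σ^{2} = 25 is rejected: the sample is far less variable than a variance of 25 would suggest.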
The main use of the chi-squared test is for checking independence between two categorical variables:-
Suppose we have a 2 × 2 contingency table showing the effects of a new drug,

              | Improved | Not improved
  Not treated |    26    |      29
  Treated     |    35    |      15
We want to check whether the treatment really leads to improvement. Here our null hypothesis is "treatment and effect are independent" and the alternative is "treatment and effect are not independent".
CODE:
x <- data.frame("improve" = c(26,35), "not improve" = c(29,15))
rownames(x) <- c("not treated", "treated")
x
chisq.test(x, correct = FALSE)
OUTPUT:
> x <- data.frame("improve" = c(26,35), "not improve" = c(29,15))
> rownames(x) <- c("not treated", "treated")
> x
            improve not.improve
not treated      26          29
treated          35          15
> chisq.test(x, correct = FALSE)

	Pearson's Chi-squared test

data:  x
X-squared = 5.5569, df = 1, p-value = 0.01841
Here the p-value is less than 0.05, so we reject the null hypothesis at 5% level of significance. Thus, we conclude that there is a significant relationship between the treatment and the effect of the drug.
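The χ^{2} approximation is reliable only when the expected counts are not too small (a common rule of thumb: all at least 5). The object returned by chisq.test() lets us inspect them:

```r
x <- data.frame("improve" = c(26, 35), "not improve" = c(29, 15))
rownames(x) <- c("not treated", "treated")
res <- chisq.test(x, correct = FALSE)
res$expected  # expected counts under independence; all comfortably above 5
```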
F-test:
F-test is used to test the equality of two variances of normal populations. We have a function “var.test()” for comparing two variances. Suppose we have the observations from two groups,
A: 9.83, 9.50, 5.49, 10.45, 7.76, 15.11, -5.30, -2.50, -1.29, 15.11
B: 29.82, -2.65, 14.78, 1.09, 9.86, 8.30, 8.93, 12.04, 31.89, 2.48, 8.38, -2.59
CODE:
A <- c(9.83, 9.50, 5.49, 10.45, 7.76, 15.11, -5.30, -2.50, -1.29, 15.11)
B <- c(29.82, -2.65, 14.78, 1.09, 9.86, 8.30, 8.93, 12.04, 31.89, 2.48, 8.38, -2.59)
var.test(A, B, ratio = 1, alternative = "two.sided")
OUTPUT:
> A <- c(9.83, 9.50, 5.49, 10.45, 7.76, 15.11, -5.30, -2.50, -1.29, 15.11)
> B <- c(29.82, -2.65, 14.78, 1.09, 9.86, 8.30, 8.93, 12.04, 31.89, 2.48, 8.38, -2.59)
> var.test(A, B, ratio = 1, alternative = "two.sided")

	F test to compare two variances

data:  A and B
F = 0.42015, num df = 9, denom df = 11, p-value = 0.2034
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.117103 1.643673
sample estimates:
ratio of variances 
         0.4201537 
Here the p-value of the test is greater than 0.05, so we accept (can't reject) the null hypothesis at the 5% level of significance. Thus, we can say the variances do not differ significantly.
The F-test is also used in the ANOVA technique. In ANOVA, we test the equality of the effects of several treatments. In this case, the test statistic becomes

F = MS_{treatment} / MS_{error},

the ratio of the between-treatment mean square to the within-treatment (error) mean square, which follows an F distribution under the null hypothesis.
We have a function in R named "aov()", whose "summary()" provides the testing details along with the p-value. We can decide to reject or not reject the null hypothesis depending on the p-value.
Similarly, for testing the overall significance of a regression we perform an F-test, just as in ANOVA.
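As a sketch of how this looks in practice, here is a one-way ANOVA on a small made-up data set (the scores and group labels below are hypothetical, purely for illustration):

```r
# Hypothetical scores for three groups, four observations each
score <- c(5.1, 4.9, 6.2, 5.5,  7.0, 7.4, 6.8, 7.1,  4.2, 4.0, 4.5, 4.3)
group <- factor(rep(c("A", "B", "C"), each = 4))
fit <- aov(score ~ group)
summary(fit)  # the F statistic and its p-value appear in this table
```

If the p-value in the summary is below the chosen level, we reject the null hypothesis that all group means are equal.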
This is how all the above tests work, and it is quite easy to perform them in R as specified in this article. All the tests share the assumption that the variable follows a normal distribution. In real life this may not hold exactly, but when the number of observations is large, the central limit theorem usually makes these tests approximately valid.