# T-Test Distribution and Its Application

To understand the t distribution, consider an everyday situation: you want to compare the performance of two workers in your company by checking the average sales made by each of them, or to evaluate a single worker by comparing his average sales with a standard value. In situations like these, the t distribution applies.

The t-test is a testing procedure that deals with the *comparison* of statistics.

The t distribution is a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small (less than 30) and the population standard deviation is unknown. It was developed by William Sealy Gosset under the pseudonym *Student*.

### Important Properties

**Property 1:** The total area under a *t* distribution curve is 1.0; that is, 100%.

**Property 2:** A *t*-curve is symmetric around 0.

**Property 3:** As the degrees of freedom increase, the t-distribution curve looks more and more like the standard normal curve.

**Property 4**: The *t*-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails, meaning that it is more prone to producing values that fall far from its mean.
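Properties 3 and 4 can be checked numerically: for the same tail probability, the t critical value is larger than the normal one, and the gap shrinks as the degrees of freedom grow. A minimal sketch in base R (the 97.5th percentile is chosen here purely for illustration):

```r
# 97.5th percentile (two-tailed 5% critical value) for several distributions
qnorm(0.975)          # standard normal: about 1.96
qt(0.975, df = 5)     # t with 5 df: noticeably larger (heavier tails)
qt(0.975, df = 30)    # t with 30 df: closer to the normal value
qt(0.975, df = 1000)  # t with 1000 df: nearly indistinguishable from normal
```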

**Example:** Suppose your friend is the CEO of a light bulb manufacturing company and he claims that an average light bulb of his company lasts 300 days. As the researcher, you randomly select 15 bulbs for testing. The sampled bulbs last an average of 290 days, with a standard deviation of 50 days. Check whether your friend's claim is true.

**Solution:** The traditional approach requires you to compute the t statistic from the data presented in the problem description.

The first thing we need to do is compute the t statistic from the following equation:

t = (X͞ − μ) / (S / √n), with X͞ = 290, n = 15, S = 50, μ = 300

where X͞ is the sample mean, μ is the population mean, *S* is the standard deviation of the sample, and *n* is the sample size.

The null hypothesis is

H₀: μ = 300, i.e. the CEO's claim is true and the average light bulb lasts 300 days.

The alternative hypothesis is

H₁: μ ≠ 300, i.e. the CEO's claim is not true and the average light bulb does not last 300 days.

Using the formula:

t_cal = (290 − 300) / (50 / √15) = −0.7746

The degrees of freedom are 15 − 1 = 14, and the tabulated value at the 5% significance level is

t_tab = 2.144787

Since |t_cal| < t_tab at the 5% significance level, H₀ may not be rejected; the CEO's claim may be true.

The cumulative probability is P(T ≤ −0.7746) ≈ 0.226. Hence, if the true bulb life were 300 days, there is a 22.6% chance that the average life of 15 randomly selected bulbs would be less than or equal to 290 days.
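The whole calculation above can be reproduced in base R from the summary statistics alone, with no raw data needed; this is a sketch of the same arithmetic:

```r
xbar <- 290   # sample mean
mu0  <- 300   # hypothesized population mean
s    <- 50    # sample standard deviation
n    <- 15    # sample size

# t statistic: (sample mean - hypothesized mean) / standard error
t_cal <- (xbar - mu0) / (s / sqrt(n))   # about -0.7746

# two-tailed critical value at the 5% significance level, df = n - 1
t_tab <- qt(0.975, df = n - 1)          # about 2.1448

# cumulative probability P(T <= t_cal): about 0.226
p_low <- pt(t_cal, df = n - 1)

abs(t_cal) < t_tab   # TRUE: H0 is not rejected
```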

**Applications of T-Test Distribution**

The following are the important applications of the t-distribution:

**Test of Hypothesis of the Population Mean:** In the example given above, we tested the hypothesis that the average life of a bulb is equal to some stated value (300 days). This is the situation of a “*test of the hypothesis of the population mean*”.

*Condition:* When the population is normally distributed and the population standard deviation σ is unknown, the t statistic is calculated as:

t = (X͞ − μ) / (S / √n)

where

X͞ = sample mean

μ = population mean

n = sample size

S = standard deviation of the sample, calculated by applying the following formula:

S = √( Σ(Xᵢ − X͞)² / (n − 1) )

The null hypothesis is tested to check whether there is a significant difference between X͞ and µ. If the calculated value of t exceeds the table value of t at a specified significance level, the null hypothesis is rejected, and the difference between X͞ and µ is considered significant. On the other hand, if the calculated value of t is less than the table value of t, the null hypothesis is not rejected. Note that this test is based on n − 1 degrees of freedom.

**R Code:**

```r
# One-sample t-test
x <- rnorm(13, mean = 2, sd = 3)
t.test(x, mu = 2)
```

**Output:**

```
	One Sample t-test

data:  x
t = 1.8881, df = 12, p-value = 0.08343
alternative hypothesis: true mean is not equal to 2
95 percent confidence interval:
 1.696752 6.241702
sample estimates:
mean of x 
 3.969227 
```

```r
# One-sample t-test with a 90% confidence interval
t.test(x, mu = 2, conf.level = 0.9)
```

**Output:**

```
	One Sample t-test

data:  x
t = 1.8881, df = 12, p-value = 0.08343
alternative hypothesis: true mean is not equal to 2
90 percent confidence interval:
 2.110323 5.828131
sample estimates:
mean of x 
 3.969227 
```

**Test of Hypothesis of the Difference between Two Means:** Suppose that in the above situation there are two bulb manufacturing companies and their respective CEOs each claim their product to be superior to the other. In such a situation, we can compare the average lifetimes of the two companies' light bulbs and test their claims. This is a case of a “test of the hypothesis of the difference between two means”.

*Condition:* When testing a hypothesis about the difference between two means drawn from two normal populations whose variances are unknown, the t-test can be calculated in two ways:

**Variances are equal:** When the population variances, though unknown, are taken as equal, the t statistic to be used is:

t = (X͞₁ − X͞₂) / ( S √(1/n₁ + 1/n₂) )

where X͞₁ and X͞₂ are the means of sample 1 of size n₁ and sample 2 of size n₂, and S is the common standard deviation obtained by pooling the data from both samples, calculated by applying the following formula:

**S² = ( (n₁ − 1)S₁² + (n₂ − 1)S₂² ) / (n₁ + n₂ − 2)**

This statistic follows a t distribution with n₁ + n₂ − 2 degrees of freedom.

The null hypothesis is that there is no difference between the two means; it is not rejected when the calculated value of t at a specified significance level is less than the table value of t, and rejected when the calculated value exceeds the table value.

**R Code:**

```r
# Two-sample t-test assuming equal variances
nyspending <- rnorm(50, mean = 250, sd = 75)
parisspending <- rnorm(50, mean = 300, sd = 80)
t.test(nyspending, parisspending, var.equal = TRUE)
```

**Output:**

```
	Two Sample t-test

data:  nyspending and parisspending
t = -5.6938, df = 98, p-value = 1.299e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -108.60149  -52.46465
sample estimates:
mean of x mean of y 
 233.4548  313.9878 
```

**Variances are Unequal:** When the population variances are not equal, we use the unbiased estimators S₁² and S₂² separately, without pooling, and the statistic to be used is:

t = ( (X͞₁ − X͞₂) − (μ₁ − μ₂) ) / √( S₁²/n₁ + S₂²/n₂ )

where µ₁ and µ₂ are the two population means.

**R Code:**

```r
# Welch's two-sample t-test (unequal variances)
t.test(nyspending, parisspending, var.equal = FALSE)
```

**Output:**

```
	Welch Two Sample t-test

data:  nyspending and parisspending
t = -5.6938, df = 97.984, p-value = 1.299e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -108.60155  -52.46459
sample estimates:
mean of x mean of y 
 233.4548  313.9878 
```

By default, R assumes that the variances are unequal, thus defaulting to Welch’s test.
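The Welch statistic and its fractional degrees of freedom (the Welch–Satterthwaite approximation) can likewise be computed by hand; a sketch, again with a seed added for reproducibility:

```r
set.seed(1)  # seed added here for reproducibility (not in the original code)
x1 <- rnorm(50, mean = 250, sd = 75)
x2 <- rnorm(50, mean = 300, sd = 80)

v1 <- var(x1) / length(x1)  # estimated variance of mean(x1)
v2 <- var(x2) / length(x2)  # estimated variance of mean(x2)

# Welch t statistic: no pooling, each variance enters separately
t_welch <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (generally not an integer)
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(x1) - 1) + v2^2 / (length(x2) - 1))

fit <- t.test(x1, x2)  # var.equal = FALSE is the default
all.equal(t_welch, unname(fit$statistic))
all.equal(df_welch, unname(fit$parameter))
```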

**Test of Hypothesis of the Difference Between Two Means with Dependent Samples:** Suppose you want to check the gain in the weights of pigs fed on two diets. In such situations, since the samples are the same (the same pigs are given the two diets), this is a case of the “**paired t-test**”.

**Situation:** It is possible that the samples are drawn from two populations that are dependent on each other. The samples are said to be dependent when each observation in the first sample is associated with a particular observation in the second sample. Because of this property, the t-test used here is called the paired t-test.

This test is applied in situations where before-and-after experiments are to be compared. The null hypothesis is that the means of the two sets of paired measurements are equal, and the following statistic is used:

t = d͞ / (S_d / √n)

This statistic follows a t distribution with n − 1 degrees of freedom, where

d͞ = mean of the paired differences, calculated as d͞ = Σdᵢ / n

S_d = standard deviation of the differences, calculated by applying the following formula:

S_d = √( Σ(dᵢ − d͞)² / (n − 1) )

n = number of paired observations.

**R Code:**

Suppose we are testing blood pressure before and after a treatment.

```r
# Paired t-test: pre- vs post-treatment measurements on the same subjects
pretest <- rnorm(1000, mean = 145, sd = 9)
posttest <- rnorm(1000, mean = 138, sd = 8)
t.test(pretest, posttest, paired = TRUE)
```

**Output:**

```
	Paired t-test

data:  pretest and posttest
t = 17.807, df = 999, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 6.038627 7.534425
sample estimates:
mean of the differences 
               6.786526 
```

Thus, the t-test is widely useful and practically applicable when conditions such as a small sample size and an unknown population standard deviation are met. It is a useful test for comparing means with the appropriate degrees of freedom.