A two-sample t-test is a statistical test used to compare the means of two different samples to determine if there is a significant difference between them. It is based on the assumption that the samples are drawn from populations with normal distributions. Unlike the two-sample z-test, which requires that the population standard deviations be known or that the sample sizes be large (30 or more), the two-sample t-test does not have this requirement and can be used with smaller sample sizes.
Dependent vs Independent Samples
There are two types of two-sample t-tests: dependent and independent.
In an independent two-sample t-test (also known as an unpaired t-test), the samples in the two groups being compared are unrelated. The samples are drawn from two different populations or groups of subjects, and the difference between the means of the two groups is calculated using the means and variances of the two separate samples. This post covers the independent two-sample t-test.
In a dependent two-sample t-test (also known as a paired t-test), the samples in the two groups being compared are related in some way. For example, the samples may be pairs of measurements taken on the same subjects or on subjects who are closely matched in some other way. In this case, the difference between the means of the two groups is calculated by taking the differences between the pairs of measurements and treating these differences as a single sample. A separate post covers the paired t-test or dependent two-sample t-test.
The choice between a dependent or independent t-test depends on the nature of your data and the research question you are trying to answer.
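To make the distinction concrete, here is a minimal sketch of how the two designs treat the same numbers. The measurements and variable names are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical before/after measurements on the same five subjects (paired data).
before = [72, 75, 68, 80, 77]
after = [70, 74, 69, 76, 75]

# Dependent (paired) design: reduce each pair to a single difference,
# then treat the differences as one sample.
differences = [b - a for b, a in zip(before, after)]
mean_difference = statistics.mean(differences)

# Independent design: each group is summarized on its own.
mean_before = statistics.mean(before)
mean_after = statistics.mean(after)

print(differences)                # per-pair differences
print(mean_difference)            # mean of the differences
print(mean_before - mean_after)   # difference of the two group means
```

The mean of the differences equals the difference of the means. What changes between the two tests is how the standard error is computed, which is why the pairing matters.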
Steps in Two Sample T Test
To conduct a two-sample t-test, the following steps are typically followed:
- Specify the null and alternative hypotheses. The null hypothesis is usually that the means of the two populations are equal, while the alternative hypothesis is that the means are unequal (or, for a one-tailed test, that one mean is less than or greater than the other).
- Collect and summarize the data for both samples. Calculate the sample means and standard deviations for each sample.
- Calculate the test statistic, which is the difference between the two sample means divided by the standard error of that difference.
- Determine the critical value of the test statistic based on the significance level (alpha) of the test and the degrees of freedom. The degrees of freedom are calculated as the sum of the sample sizes minus 2 or (n1 + n2 - 2) when the variances of two populations are assumed to be the same. If the variances of two populations are unequal, there is a complex formula for calculating the degrees of freedom, which is shown below in this post.
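- Compare the calculated test statistic to the critical value to determine whether to reject or fail to reject the null hypothesis. For a two-tailed test, reject the null hypothesis when the absolute value of the test statistic exceeds the critical value. For a left-tailed test, reject it when the test statistic is less than the (negative) critical value. For a right-tailed test, reject it when the test statistic is greater than the critical value. Rejecting the null hypothesis means the data support the alternative hypothesis.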
Conditions for Two Sample T Test
To conduct a valid two-sample t-test, the following conditions must be met:
- Both samples must be drawn randomly from the populations.
- Each observation in a sample must be independent of the others, and the two samples must be independent of each other. Independence within a sample holds when the sampling is done with replacement or, if sampling without replacement, when the sample size is less than 10% of the population.
- The population distribution must approximate a normal distribution.
- The population standard deviations are unknown. This is the usual situation with small samples (fewer than 30), where the two-sample z-test does not apply.
Typical Null and Alternate Hypotheses in Two-Sample T-Test
a) Two-Tail Test:
In a two-sample t-test, the null hypothesis is that there is no difference between the means of the two populations from which the samples are drawn. This can be expressed as:
H0: μ1 = μ2
where μ1 is the mean of the first population and μ2 is the mean of the second population.
The alternate hypothesis is the opposite of the null hypothesis: there is a difference between the two population means. This can be expressed as:
Ha: μ1 ≠ μ2
b) Left-Tail Test:
In a left-tailed test, the alternate hypothesis is that the mean of the first population is less than the mean of the second population. This can be expressed as:
H0: μ1 ≥ μ2
Ha: μ1 < μ2
c) Right-Tail Test:
In a right-tailed test, the alternate hypothesis is that the mean of the first population is greater than the mean of the second population. This can be expressed as:
H0: μ1 ≤ μ2
Ha: μ1 > μ2
Calculating Test Statistic
The t-score represents the number of standard errors by which the difference between the two sample means lies from zero. It is compared with a critical value to decide whether that difference is statistically significant.
There are two methods for calculating the t-value in a two-sample t-test: one for equal variances and one for unequal variances.
a) Considering Equal Variance
The method for equal variances assumes that the variances of the two groups being compared are the same. In this case, the t-value is calculated using the formula:
$$\LARGE{t = \frac{\overline{x}_1 - \overline{x}_2}{\sqrt{\frac{{s_p}^2}{n_1} + \frac{{s_p}^2}{n_2}}}} $$
Where:
- \(\overline{x}_1\) and \(\overline{x}_2\) are the sample means of the two groups being compared
- \(n_1\) and \(n_2\) are the sample sizes of the two groups being compared
- \({s_p}^2\) is the pooled sample variance, calculated as:
$$\LARGE{{s_p}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$
Where \(s_1^2\) and \(s_2^2\) are the sample variances for each group.
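As an illustration, the pooled-variance calculation above can be sketched in Python using only the standard library. The function name `pooled_t_statistic` and the two data lists are hypothetical. The sketch also returns the degrees of freedom, \(n_1 + n_2 - 2\), since the same quantity appears in the pooled variance denominator.

```python
import math
import statistics

def pooled_t_statistic(sample1, sample2):
    """Return (t, pooled variance, degrees of freedom), assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.variance uses the n - 1 denominator (sample variance).
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(pooled_var / n1 + pooled_var / n2)
    t = (mean1 - mean2) / std_error
    return t, pooled_var, df

# Hypothetical data: 4 observations in the first group, 5 in the second.
group_a = [10, 12, 14, 16]
group_b = [11, 13, 15, 17, 19]

t, pooled_var, df = pooled_t_statistic(group_a, group_b)
print(round(t, 4), round(pooled_var, 4), df)  # -1.0184 8.5714 7
```

Here the sample means are 13 and 15, the sample variances are 20/3 and 10, and the pooled variance is 60/7.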
Degrees of Freedom (Equal Variance)
The degrees of freedom for the two-sample t-test with equal variances are calculated using the formula:
$$\LARGE{ df = (n_1 + n_2 -2)} $$
Where:
- \(n_1\) and \(n_2\) are the sample sizes of the two groups being compared
b) Considering Unequal Variance
The method for unequal variances assumes that the variances of the two groups being compared are different. In this case, the t-value is calculated using the formula:
$$\LARGE{t = \frac{\overline{x}_1 - \overline{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}} $$
This formula is similar to the one for equal variances but uses the individual sample variances \(s_1^2\) and \(s_2^2\) rather than the pooled sample variance.
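A minimal Python sketch of the unequal-variance statistic follows. The function name `unequal_variance_t_statistic` and the data are hypothetical, used only to show the arithmetic.

```python
import math
import statistics

def unequal_variance_t_statistic(sample1, sample2):
    """Return the t-score using each sample's own variance (no pooling)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # Sample variances with the n - 1 denominator.
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    std_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / std_error

# Hypothetical data: 4 observations in the first group, 5 in the second.
group_a = [10, 12, 14, 16]
group_b = [11, 13, 15, 17, 19]

t_unequal = unequal_variance_t_statistic(group_a, group_b)
print(round(t_unequal, 4))  # -1.0445
```

For these numbers the standard error is the square root of 11/3, slightly smaller than the pooled version would give, so the t-score is slightly further from zero.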
Which method you use depends on the assumptions of your data. If you believe that the variances of the two groups are equal, you can use the method for equal variances. If you believe that the variances are unequal, you should use the method for unequal variances. It's generally a good idea to check the assumptions of your data before running the test to ensure that you're using the appropriate method.
Degrees of Freedom (Unequal Variance)
The degrees of freedom for the two-sample t-test with unequal variances are calculated using the formula:
$$\LARGE{ df = \frac{(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2})^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}} $$
Where:
- \(n_1\) and \(n_2\) are the sample sizes of the two groups being compared
- \(s_1^2\) and \(s_2^2\) are the sample variances for each group
If the resulting degrees of freedom are fractional (e.g., 4.8), round DOWN to the nearest integer (e.g., 4).
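The degrees-of-freedom formula for unequal variances is easier to follow when broken into parts. The sketch below uses hypothetical names (`unequal_variance_df`) and hypothetical data, and applies the round-down rule described above.

```python
import math
import statistics

def unequal_variance_df(sample1, sample2):
    """Return (fractional df, df rounded down) for the unequal-variance t-test."""
    n1, n2 = len(sample1), len(sample2)
    # Each term is a sample variance divided by its sample size.
    term1 = statistics.variance(sample1) / n1
    term2 = statistics.variance(sample2) / n2
    numerator = (term1 + term2) ** 2
    denominator = term1 ** 2 / (n1 - 1) + term2 ** 2 / (n2 - 1)
    df = numerator / denominator
    return df, math.floor(df)

# Hypothetical data: 4 observations in the first group, 5 in the second.
group_a = [10, 12, 14, 16]
group_b = [11, 13, 15, 17, 19]

df_exact, df_rounded = unequal_variance_df(group_a, group_b)
print(round(df_exact, 4), df_rounded)  # 6.9808 6
```

With equal variances the same data would give 7 degrees of freedom, so the unequal-variance calculation is slightly more conservative here.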
Calculating Critical Values
The critical values for the t-score in a two-sample t-test depend on the degrees of freedom and the significance level of the test.
Depending on whether the variances are considered equal or unequal, the degrees of freedom are calculated using the corresponding formula given above.
You can use Microsoft Excel (or other statistical software) to find the critical value. In Excel, use =T.INV(probability, deg_freedom) for the left-tail value or =T.INV.2T(probability, deg_freedom) for the two-tail value. Because the t-distribution is symmetric, the right-tail critical value is the left-tail value with the sign reversed, which =T.INV(1 - probability, deg_freedom) returns.
For example, suppose you are performing a left-tail t-test at a 95% confidence level (an alpha value of 0.05), with 4 observations in the first group and 5 observations in the second group (7 degrees of freedom, assuming equal variances). The formula =T.INV(0.05, 7) gives the critical value of -1.8946.
You can then compare the calculated t-score to the critical value to determine whether to reject or fail to reject the null hypothesis. In a left-tail test, the null hypothesis is rejected if the t-score is less than the critical value. In a right-tail test, it is rejected if the t-score is greater than the critical value. In a two-tail test, it is rejected if the absolute value of the t-score exceeds the critical value. Rejecting the null hypothesis means the data support the alternative hypothesis.
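The comparison depends on the direction of the test, which a short sketch can make explicit. The helper `reject_null` and the t-scores passed to it are hypothetical. The critical value -1.8946 is the left-tail value from the Excel example above.

```python
def reject_null(t_score, critical_value, tail):
    """Return True if the null hypothesis is rejected for the given tail type."""
    if tail == "left":
        # Reject when the t-score falls below the (negative) critical value.
        return t_score < critical_value
    if tail == "right":
        # Reject when the t-score rises above the (positive) critical value.
        return t_score > critical_value
    if tail == "two":
        # Reject when the t-score is far from zero in either direction.
        return abs(t_score) > abs(critical_value)
    raise ValueError("tail must be 'left', 'right', or 'two'")

left_critical = -1.8946  # =T.INV(0.05, 7)

# Hypothetical t-scores for a left-tail test with 7 degrees of freedom.
weak_evidence = reject_null(-1.0184, left_critical, "left")
strong_evidence = reject_null(-2.5000, left_critical, "left")
print(weak_evidence, strong_evidence)  # False True
```

A t-score of -1.0184 is not far enough into the left tail, so the null hypothesis is not rejected. A t-score of -2.5 lies beyond the critical value, so it is rejected.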