7.2 Inference for Two Independent Sample Means

Suppose we have two independent samples of quantitative data. If there is no apparent relationship between the samples, our parameter of interest is the difference in means, μ1 – μ2, with a point estimate of {\overline{x}}_{1}-{\overline{x}}_{2}.

The comparison of two population means is very common. A difference between the two samples depends on both the means and their respective standard deviations. Very different means can occur by chance if there is great variation among the individual samples. In order to account for the variation, we take the difference of the sample means and divide by the standard error, standardizing the difference. We know that when conducting an inference for means, the sampling distribution we use (Z or t) depends on our knowledge of the population standard deviation.

Both Population Standard Deviations Known (Z)

Even though this situation is unlikely, since population standard deviations are rarely known, we will begin demonstrating these ideas under ideal circumstances. If both populations are normal, then the sampling distribution of each sample mean is normal, and so the sampling distribution for the difference between the means is also normal. We can combine the standard errors of each sampling distribution to get a standard error of:

\sqrt{\frac{{\left({\sigma }_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({\sigma }_{2}\right)}^{2}}{{n}_{2}}}

So the sampling distribution of {\overline{X}}_{1}-{\overline{X}}_{2}, assuming we know both standard deviations, is approximately:

N \left({\mu }_{1}-{\mu }_{2},\sqrt{\frac{{\left({\sigma }_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({\sigma }_{2}\right)}^{2}}{{n}_{2}}}\right)

Therefore, the z-test statistic would be:

z = \frac{\left({\overline{x}}_{1}-{\overline{x}}_{2}\right)-\left({\mu }_{1}-{\mu }_{2}\right)}{\sqrt{\frac{{\left({\sigma }_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({\sigma }_{2}\right)}^{2}}{{n}_{2}}}}
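
As a quick sketch of this calculation, here is the z statistic and its two-sided p-value computed with Python's standard library. All of the sample means, (known) population standard deviations, and sample sizes are hypothetical numbers chosen for illustration:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics for two independent samples
# with KNOWN population standard deviations
xbar1, sigma1, n1 = 25.0, 4.0, 40
xbar2, sigma2, n2 = 22.5, 5.0, 50

# Standard error of the difference in sample means
se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# z-test statistic under H0: mu1 - mu2 = 0
z = ((xbar1 - xbar2) - 0) / se

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, p_value)
```

Here z is about 2.64, so a difference of 2.5 is roughly 2.6 standard errors above zero, which would be unusual if the population means were truly equal.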

Our confidence interval would be in the form (PE – MoE, PE + MoE), where our point estimate is {\overline{x}}_{1}-{\overline{x}}_{2}, and the margin of error is made up of:

MoE = \left({z}_{\frac{\alpha }{2}}\right)\left(SE\right)

  • {z}_{\frac{\alpha }{2}} is the z critical value with area to the right equal to \frac{\alpha }{2}
  • SE is \sqrt{\frac{{\left({\sigma }_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({\sigma }_{2}\right)}^{2}}{{n}_{2}}}
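
Continuing the sketch with the same hypothetical summary statistics, a 95% confidence interval for μ1 – μ2 can be built from the z critical value and this standard error:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics (known population standard deviations)
xbar1, sigma1, n1 = 25.0, 4.0, 40
xbar2, sigma2, n2 = 22.5, 5.0, 50

pe = xbar1 - xbar2                              # point estimate
se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)      # standard error

# For 95% confidence, alpha = 0.05, so z_{alpha/2} has area 0.025 to its right
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)     # about 1.96
moe = z_crit * se
ci = (pe - moe, pe + moe)
print(ci)
```

Because the resulting interval does not contain 0, these (made-up) data would suggest a difference between the two population means at the 95% confidence level.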

Since we rarely know one population’s standard deviation, much less two, the only situation where we might consider using this in practice is for two very large samples.

Both Population Standard Deviations Unknown (t)

Most likely, we will not know the population standard deviations, but we can estimate them using the two sample standard deviations from our independent samples. In this case, we will use a t sampling distribution with the following standard error:

\sqrt{\frac{{\left({s}_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({s}_{2}\right)}^{2}}{{n}_{2}}}

Assumptions for the Difference in Two Independent Sample Means

Recall that we need to be able to assume an underlying normal distribution and no outliers or skewness in order to use the t-distribution. We can relax these assumptions as our sample sizes get bigger and can typically just use the Z distribution for very large sample sizes.

The remaining question concerns what we do for degrees of freedom when comparing two groups. One method requires a somewhat complicated calculation, but if you have access to a computer or calculator, this isn’t an issue. We can find a precise df for two independent samples as follows:

df = \frac{{\left(\frac{{\left({s}_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({s}_{2}\right)}^{2}}{{n}_{2}}\right)}^{2}}{\left(\frac{1}{{n}_{1}-1}\right){\left(\frac{{\left({s}_{1}\right)}^{2}}{{n}_{1}}\right)}^{2}+\left(\frac{1}{{n}_{2}-1}\right){\left(\frac{{\left({s}_{2}\right)}^{2}}{{n}_{2}}\right)}^{2}}

NOTE: The df are not always a whole number; if you must round, round down. It is not necessary to compute this by hand; use a calculator or statistical software instead.

If you are working on your own without access to technology, the above formula could be daunting. Another method is to use a conservative estimate of the df: min(n1-1, n2-1).
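
To illustrate the contrast between the two methods, here is the df formula above computed directly, alongside the conservative estimate, using hypothetical sample standard deviations and sample sizes:

```python
# df formula for two independent samples vs. the conservative
# estimate min(n1 - 1, n2 - 1), with hypothetical sample statistics
s1, n1 = 4.2, 15
s2, n2 = 6.1, 12

v1 = s1**2 / n1   # (s1)^2 / n1
v2 = s2**2 / n2   # (s2)^2 / n2

df_precise = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
df_conservative = min(n1 - 1, n2 - 1)
print(df_precise, df_conservative)
```

For these numbers the formula gives about 18.8 degrees of freedom, while the conservative shortcut gives 11. The conservative estimate is never larger than the precise one, so it yields a wider interval and a harder-to-reject test, which is what makes it "conservative."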

Hypothesis Tests for the Difference in Two Independent Sample Means

Recall that the steps to a hypothesis test never change. When our parameter of interest is μ1 – μ2, we are often interested in an effect between the two groups. In order to show an effect, we will have to first assume there is no difference by stating it in the null hypothesis as:

  • Ho: μ1 – μ2 = 0 OR Ho: μ1 = μ2
  • Ha: μ1 – μ2 (<, >, ≠) 0 OR Ha: μ1 (<, >, ≠) μ2

The t-test statistic is calculated as follows:

t = \frac{\left({\overline{x}}_{1}-{\overline{x}}_{2}\right)-\left({\mu }_{1}-{\mu }_{2}\right)}{\sqrt{\frac{{\left({s}_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({s}_{2}\right)}^{2}}{{n}_{2}}}}

where:

  • s1 and s2, the sample standard deviations, are estimates of σ1 and σ2, respectively.
  • \overline{x}_{1} and \overline{x}_{2} are the sample means. μ1 and μ2 are the population means. (NOTE: in the null, we are typically assuming μ1 – μ2 = 0.)
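
As a sketch of the test statistic calculation with hypothetical summary statistics (the means, sample standard deviations, and sample sizes below are all made up for illustration):

```python
from math import sqrt

# Hypothetical summary statistics for two independent samples
xbar1, s1, n1 = 78.2, 4.2, 15
xbar2, s2, n2 = 74.5, 6.1, 12

# Standard error estimated from the sample standard deviations
se = sqrt(s1**2 / n1 + s2**2 / n2)

# t-test statistic under H0: mu1 - mu2 = 0
t = ((xbar1 - xbar2) - 0) / se
print(t)
```

The p-value would then come from a t distribution with the degrees of freedom discussed above. Statistical software typically handles both steps at once from the raw data rather than from summary statistics.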

Confidence Intervals for the Difference in Two Independent Sample Means

Once we have identified a difference in a hypothesis test, we may want to estimate it. Our confidence interval would be of the form (PE – MoE, PE + MoE), where our point estimate is {\overline{x}}_{1}-{\overline{x}}_{2}, and the MoE is made up of:

MoE = \left({t}_{\frac{\alpha }{2}}\right)\left(SE\right)

  • {t}_{\frac{\alpha }{2}} is the t critical value with area to the right equal to \frac{\alpha }{2}
  • SE is \sqrt{\frac{{\left({s}_{1}\right)}^{2}}{{n}_{1}}+\frac{{\left({s}_{2}\right)}^{2}}{{n}_{2}}}
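
A sketch of this interval with hypothetical summary statistics, using the conservative degrees of freedom min(n1 – 1, n2 – 1) = 11 and a t critical value looked up from a table:

```python
from math import sqrt

# Hypothetical summary statistics for two independent samples
xbar1, s1, n1 = 78.2, 4.2, 15
xbar2, s2, n2 = 74.5, 6.1, 12

pe = xbar1 - xbar2                      # point estimate
se = sqrt(s1**2 / n1 + s2**2 / n2)      # standard error

# Conservative df = min(n1 - 1, n2 - 1) = 11; for 95% confidence,
# a t table gives t_{0.025, 11} ~ 2.201
t_crit = 2.201
moe = t_crit * se
ci = (pe - moe, pe + moe)
print(ci)
```

Here the interval contains 0, so with the conservative df these (made-up) data would not show a difference between the population means at the 95% confidence level.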

License


Significant Statistics: An Introduction to Statistics Copyright © 2024 by John Morgan Russell is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
