8.3 Inference for Two Sample Proportions

Adapted by John Morgan Russell; from Barbara Illowsky and Susan Dean, David Diez, Mine Cetinkaya-Rundel and Christopher D. Barr; Julie Vu and David Harrington


Comparing two proportions, like comparing two means, is very common, especially when we work with categorical data. If our parameter of inference is $p_1 - p_2$, we can estimate it with $\hat{p}_1 - \hat{p}_2$.

When conducting inference on two independent population proportions, the following characteristics should be present:

The two samples are independent simple random samples.
The number of successes is at least five, and the number of failures is at least five, for each of the samples.
Each population should be at least 10 or 20 times the size of its sample (sources differ on the exact multiple). This keeps each population from being over-sampled, which would make the results unreliable.
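The count-based conditions above can be checked mechanically. The sketch below is a minimal helper, assuming `x` is the number of successes and `n` the sample size; the function name `check_conditions`, the optional population sizes `pop1`/`pop2`, and the `factor` parameter (10 or 20) are hypothetical choices for illustration. Whether the samples are independent simple random samples cannot be checked from counts and still has to be judged from the study design.

```python
def check_conditions(x1, n1, x2, n2, pop1=None, pop2=None, factor=10):
    """Return a list of condition problems for a two-proportion procedure.

    x1, x2: numbers of successes; n1, n2: sample sizes.
    pop1, pop2: optional (hypothetical) population sizes.
    factor: how many times larger each population must be than its sample.
    """
    problems = []
    # At least five successes and five failures in each sample.
    for label, x, n in (("sample 1", x1, n1), ("sample 2", x2, n2)):
        if x < 5:
            problems.append(f"{label}: only {x} successes (need at least 5)")
        if n - x < 5:
            problems.append(f"{label}: only {n - x} failures (need at least 5)")
    # Population at least `factor` times the sample size, when it is known.
    for label, pop, n in (("population 1", pop1, n1), ("population 2", pop2, n2)):
        if pop is not None and pop < factor * n:
            problems.append(f"{label}: size {pop} is less than {factor} x {n}")
    return problems


# Hives example counts from this section; population sizes are not given.
print(check_conditions(20, 200, 12, 200))  # []
```

An empty list means none of the checkable conditions failed; it does not by itself establish that the procedure is appropriate.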

Sampling Distribution of the Difference in Two Proportions

We can build a sampling distribution for $\hat{p}_1 - \hat{p}_2$ much as we did for the difference in two independent sample means. When the conditions above hold, the difference of two sample proportions follows an approximately normal distribution. We will wait to show its standard error, because we calculate it slightly differently for hypothesis tests and for confidence intervals.
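One way to see the approximate normality is to simulate it. The sketch below assumes hypothetical true proportions $p_1 = 0.10$ and $p_2 = 0.06$ with $n_1 = n_2 = 200$, draws 5,000 pairs of independent samples (fixed seed for repeatability), and records $\hat{p}_1 - \hat{p}_2$ each time. The helper name `simulate_differences` is illustrative only.

```python
import random
import statistics


def simulate_differences(p1, n1, p2, n2, reps=5000, seed=1):
    """Draw `reps` pairs of independent samples and return p-hat1 - p-hat2 for each."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count successes in each sample with one Bernoulli trial per subject.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


diffs = simulate_differences(0.10, 200, 0.06, 200)
mean = statistics.fmean(diffs)
sd = statistics.stdev(diffs)
# Share of simulated differences within 1.96 SD of their mean; near 0.95 if roughly normal.
within = sum(abs(d - mean) <= 1.96 * sd for d in diffs) / len(diffs)
print(round(mean, 3), round(sd, 3), round(within, 3))
```

The simulated differences should center near $0.10 - 0.06 = 0.04$, and roughly 95% of them should fall within 1.96 standard deviations of that center, as a normal distribution would predict.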

Hypothesis Test for the Difference in Two Proportions

If two estimated proportions are different, it may be due to a difference in the populations or it may be due to chance. A hypothesis test can help determine whether a difference in the estimated proportions reflects a difference in the population proportions. A confidence interval can then estimate the size of that difference.

Generally, the null hypothesis states that the two population proportions are equal, that is, $H_0: p_1 = p_2$. Because the null hypothesis assumes there is no difference, we can combine both samples to estimate a single pooled proportion, $p_p$, where $x_1$ and $x_2$ are the numbers of successes and $n_1$ and $n_2$ are the sample sizes:

${p}_{p}=\frac{{x}_{1}+{x}_{2}}{{n}_{1}+{n}_{2}}$

We use this pooled proportion in the standard error of the z test statistic (under $H_0$, the term $p_1 - p_2$ in the numerator equals 0):

$z=\frac{\left({\hat{p}}_{1}-{\hat{p}}_{2}\right)-\left({p}_{1}-{p}_{2}\right)}{\sqrt{{p}_{p}\left(1-{p}_{p}\right)\left(\frac{1}{{n}_{1}}+\frac{1}{{n}_{2}}\right)}}$
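The pooled proportion and the z statistic translate directly into code. This is a minimal sketch of the formula above, assuming $H_0: p_1 = p_2$ so that $p_1 - p_2 = 0$ in the numerator; the function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic under H0: p1 = p2.

    x1, x2: numbers of successes; n1, n2: sample sizes.
    """
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion p_p
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    # The hypothesized difference p1 - p2 is 0 under H0, so it drops out.
    return (p_hat1 - p_hat2) / se


# Hives example counts: 20 of 200 (medication A), 12 of 200 (medication B).
z = two_proportion_z(20, 200, 12, 200)
print(round(z, 4))  # 1.4744
```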

Example

Two types of medication for hives are being tested to determine if there is a difference in the proportions of adult patient reactions. Twenty out of a random sample of 200 adults given medication A still had hives 30 minutes after taking the medication. Twelve out of another random sample of 200 adults given medication B still had hives 30 minutes after taking the medication. Test at a 1% level of significance.

Graph:

Normal distribution curve of the difference in the percentages of adult patients who don't react to medication A and B after 30 minutes. The mean is equal to zero, and the values -0.04, 0, and 0.04 are labeled on the horizontal axis. Two vertical lines extend from -0.04 and 0.04 to the curve. The region to the left of -0.04 and the region to the right of 0.04 are each shaded to represent 1/2(p-value) = 0.0702. — Figure 8.8: Medication A and B
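The graph shows only the shaded tails, so here is a sketch of the arithmetic behind it. It assumes a two-sided alternative $H_a: p_A \neq p_B$ (the question asks whether there is *a* difference, and both tails are shaded) and uses Python's standard-library `statistics.NormalDist` for the normal CDF.

```python
import math
from statistics import NormalDist

# Hives example: x = number of adults who still had hives after 30 minutes.
x_a, n_a = 20, 200  # medication A
x_b, n_b = 12, 200  # medication B
alpha = 0.01

p_a, p_b = x_a / n_a, x_b / n_b
p_pool = (x_a + x_b) / (n_a + n_b)  # 32 / 400 = 0.08
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se  # p1 - p2 = 0 under H0

# Two-sided test: double the upper-tail area beyond |z|.
half_p = 1 - NormalDist().cdf(abs(z))
p_value = 2 * half_p

print(f"z = {z:.4f}, 1/2(p-value) = {half_p:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

This reproduces the $\frac{1}{2}(\text{p-value}) = 0.0702$ shown in the figure, so the p-value is about 0.1404. Since 0.1404 > 0.01, we do not reject $H_0$: at the 1% level, the data do not give sufficient evidence of a difference between the two medications.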

Your turn!

Two types of valves are being tested to determine if there is a difference in pressure tolerances. Fifteen out of a random sample of 100 of Valve A cracked under 4,500 psi. Six out of a random sample of 100 of Valve B cracked under 4,500 psi. Test at a 5% level of significance.

Confidence Intervals for the Difference in Two Proportions

Once a two-sample test has identified a difference, we may want to estimate how large it is. The confidence interval has the form:

$\left(PE - MoE,\ PE + MoE\right)$

where the point estimate (PE) is $\hat{p}_1 - \hat{p}_2$

and the margin of error (MoE) is

$MoE = z_{\frac{\alpha}{2}}\left(SE\right)$,
where $z_{\frac{\alpha}{2}}$ is the z critical value with area $\frac{\alpha}{2}$ to its right, and
$SE = \sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}$.
Because we usually do not know $p_1$ and $p_2$, we estimate them in the SE with $\hat{p}_1$ and $\hat{p}_2$. Unlike the hypothesis test, the confidence interval does not pool the samples, since it does not assume $p_1 = p_2$.
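To see how "slightly differently" the two standard errors behave, the sketch below computes both for the hives counts: the pooled SE from the test statistic's denominator and the unpooled SE used for the confidence interval. The function names are hypothetical.

```python
import math


def pooled_se(x1, n1, x2, n2):
    """SE used in the hypothesis test: assumes p1 = p2 and pools the samples."""
    p = (x1 + x2) / (n1 + n2)
    return math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


def unpooled_se(x1, n1, x2, n2):
    """SE used in the confidence interval: plugs in each sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hives example counts: 20 of 200 and 12 of 200.
print(round(pooled_se(20, 200, 12, 200), 5))    # 0.02713
print(round(unpooled_se(20, 200, 12, 200), 5))  # 0.02706
```

With equal sample sizes and fairly close proportions, as here, the two values are nearly identical; they drift further apart as the sample proportions diverge.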

Putting it all together, the confidence interval for the difference in two proportions is:

$\left(\hat{p}_1 - \hat{p}_2\right) \pm z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
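The interval formula can be sketched as a small function. It uses `statistics.NormalDist().inv_cdf` to find $z_{\frac{\alpha}{2}}$ (the value with area $\frac{\alpha}{2}$ to its right) and the unpooled SE. The function name `two_proportion_ci` is hypothetical, and the 99% level in the usage line is chosen only to match the 1% significance level of the hives example.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # area alpha/2 to the right
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    moe = z_crit * se
    pe = p1 - p2
    return pe - moe, pe + moe


# Hives example counts at 99% confidence.
low, high = two_proportion_ci(20, 200, 12, 200, confidence=0.99)
print(f"({low:.4f}, {high:.4f})")
```

For the hives data this gives roughly $(-0.030,\ 0.110)$. The interval contains 0, which agrees with not rejecting $H_0$ at the 1% level in the example above.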

Image References

Figure 8.8: Kindred Grey via Virginia Tech (2020). “Figure 8.8” CC BY-SA 4.0. Retrieved from https://commons.wikimedia.org/wiki/File:Figure_8.8.png . Adaptation of Figure 5.39 from OpenStax Introductory Statistics (2013) (CC BY 4.0). Retrieved from https://openstax.org/books/statistics/pages/5-practice

License


Significant Statistics - beta (extended) version Copyright © 2020 by John Morgan Russell, OpenStaxCollege, OpenIntro is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
