# 7.3 Inference for Two-Sample Proportions

Comparing two proportions, like comparing two means, is very common when we are working with categorical data. If our parameter of interest is $p_1 - p_2$, then we can estimate it with $\hat{p}_1 - \hat{p}_2$, the difference in the two sample proportions.

When conducting inference on two independent population proportions, the following characteristics should be present:

- The two samples are simple random samples that are independent of each other.
- For each of the samples, the number of successes is at least five, and the number of failures is at least five.
- A growing body of literature states that each population must be at least 10 or 20 times the size of its sample. This keeps each population from being over-sampled, which could cause incorrect results.
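The conditions above can be checked mechanically before running any inference. The following is a minimal sketch, not a prescribed procedure: the function names and the sample counts are hypothetical, and the default factor of 10 is one of the two thresholds mentioned in the list.

```python
def meets_success_failure_condition(successes, n, minimum=5):
    """Return True if a sample has at least `minimum` successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


def population_large_enough(population_size, n, factor=10):
    """Return True if the population is at least `factor` times the sample size."""
    return population_size >= factor * n


# Hypothetical samples: 20 successes out of 200, and 3 successes out of 50.
print(meets_success_failure_condition(20, 200))  # True
print(meets_success_failure_condition(3, 50))    # False: only 3 successes
print(population_large_enough(5000, 200))        # True: 5000 >= 10 * 200
```

Both samples need to pass the success/failure check before the normal approximation is used.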

# Sampling Distribution of the Difference in Two Proportions

We can build a sampling distribution for $\hat{p}_1 - \hat{p}_2$ similar to what we did for the difference in two independent sample means. The difference of two sample proportions follows an approximate normal distribution. We will wait to show the standard error and sampling distribution because we calculate them slightly differently for hypothesis tests and confidence intervals.
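One way to see the approximate normality is to simulate many pairs of samples and look at how the differences in sample proportions are spread out. The sketch below is only an illustration: the population proportions (0.10 and 0.06), the sample sizes, the number of repetitions, and the function name are all assumptions chosen for the demonstration.

```python
import random
import statistics


def simulate_differences(p1, n1, p2, n2, reps, seed=0):
    """Simulate `reps` differences in sample proportions from two populations."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical population proportions and sample sizes.
diffs = simulate_differences(p1=0.10, n1=200, p2=0.06, n2=200, reps=2000)

center = statistics.mean(diffs)
spread = statistics.stdev(diffs)
within_one = sum(abs(d - center) <= spread for d in diffs) / len(diffs)
within_two = sum(abs(d - center) <= 2 * spread for d in diffs) / len(diffs)

print("mean of simulated differences:", round(center, 4))
print("share within 1 SD of the mean:", round(within_one, 3))
print("share within 2 SDs of the mean:", round(within_two, 3))
```

If the normal approximation is reasonable, the simulated differences should center near 0.10 - 0.06 = 0.04, with roughly 68% of them within one standard deviation of the mean and roughly 95% within two.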

# Hypothesis Test for the Difference in Two Proportions

If two estimated proportions are different, it may be due to a difference in the populations, or it may be due to chance. A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions.

Generally, the null hypothesis states that the two proportions are the same (i.e., $H_0: p_1 = p_2$). Since we are assuming there is no difference in the null, we can use both samples to estimate the pooled proportion, $p_p$, calculated as follows:

$$p_p = \frac{x_1 + x_2}{n_1 + n_2}$$

where $x_1$ and $x_2$ are the numbers of successes in the two samples, and $n_1$ and $n_2$ are the sample sizes.

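We can use this pooled proportion in the calculation of our *z*-test statistic:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_p(1 - p_p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Under the null hypothesis, $p_1 - p_2 = 0$.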
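The pooled proportion and the *z*-test statistic can be sketched in a few lines of code. This is an illustration under the usual assumptions (pooled standard error, null hypothesis of no difference); the function name and the counts used are hypothetical and are not taken from the text.

```python
import math


def pooled_z_statistic(x1, n1, x2, n2):
    """Return (pooled proportion, pooled standard error, z) for H0: p1 = p2."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    # Pool both samples, since H0 assumes one common proportion.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    # Under H0 the hypothesized difference p1 - p2 is 0.
    z = (p_hat1 - p_hat2 - 0) / se
    return p_pooled, se, z


# Hypothetical counts: 45 successes out of 150, and 30 successes out of 150.
p_pooled, se, z = pooled_z_statistic(45, 150, 30, 150)
print(p_pooled, se, z)
```

For these hypothetical counts the pooled proportion is 0.25, the standard error is about 0.05, and *z* is about 2.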

Example

Two types of medication for hives are being tested to determine if there is a difference in the proportions of adult patient reactions. Twenty out of a random sample of 200 adults given Medication A still had hives 30 minutes after taking the medication. Twelve out of another random sample of 200 adults given Medication B still had hives 30 minutes after taking the medication. Test at a 1% level of significance.

**Solution**

The problem asks for a difference in proportions, making it a test of two proportions.

Let *A* and *B* be the subscripts for medication A and medication B, respectively. Then $p_A$ and $p_B$ are the desired population proportions.

Random variable: $\hat{p}_A - \hat{p}_B$ = difference in the proportions of adult patients who did not react after 30 minutes to medication A and to medication B.

$H_0$: $p_A = p_B$ or $p_A - p_B = 0$

$H_A$: $p_A \neq p_B$ or $p_A - p_B \neq 0$

The words “is a difference” tell you the test is two-tailed.

**Distribution for the test:**

Since this is a test of two binomial population proportions, the distribution is approximately normal.

Find the pooled proportion, $p_p$:

$p_p = \frac{x_A + x_B}{n_A + n_B} = \frac{20 + 12}{200 + 200} = 0.08$

$1 - p_p = 0.92$

$\hat{p}_A - \hat{p}_B$ follows an approximate normal distribution.

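**Calculate the p-value using the normal distribution:**

Estimated proportion for group A: $\hat{p}_A = \frac{x_A}{n_A} = \frac{20}{200} = 0.1$

Estimated proportion for group B: $\hat{p}_B = \frac{x_B}{n_B} = \frac{12}{200} = 0.06$

$\hat{p}_A - \hat{p}_B = 0.1 - 0.06 = 0.04$

Test statistic: $z = \frac{0.04 - 0}{\sqrt{(0.08)(0.92)\left(\frac{1}{200} + \frac{1}{200}\right)}} \approx 1.47$

*p*-value = 0.1404

Half of the *p*-value is the area below –0.04, and half is the area above 0.04.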

**Compare α and the p-value:**

*α* = 0.01 and the *p*-value = 0.1404. *α* < *p*-value.

**Make a decision:**

Since *α* < *p*-value, do not reject $H_0$.

**Conclusion:** At a 1% level of significance, from the sample data, there is not sufficient evidence to conclude that there is a difference in the proportions of adult patients who did not react after 30 minutes to medication *A* and medication *B*.
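The numbers in this solution can be reproduced with a short script. The sketch below uses only the counts given in the example (20 of 200 for Medication A, 12 of 200 for Medication B) and the 1% significance level; the variable names are illustrative, and the two-tailed *p*-value is computed from the standard normal distribution using the complementary error function.

```python
import math

# Counts from the example: patients who still had hives after 30 minutes.
x_a, n_a = 20, 200  # Medication A
x_b, n_b = 12, 200  # Medication B
alpha = 0.01

p_hat_a = x_a / n_a
p_hat_b = x_b / n_b
p_pooled = (x_a + x_b) / (n_a + n_b)

# Pooled standard error and z statistic under H0: p_A = p_B.
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
z = (p_hat_a - p_hat_b) / se

# Two-tailed p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print("pooled proportion:", round(p_pooled, 2))
print("z statistic:", round(z, 2))
print("p-value:", round(p_value, 4))
print("reject H0" if p_value < alpha else "do not reject H0")
```

The script gives a pooled proportion of 0.08 and a *p*-value of about 0.1404, which is larger than *α* = 0.01, in agreement with the decision not to reject the null hypothesis.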

Your Turn!

Two types of valves are being tested to determine if there is a difference in pressure tolerances. For Valve A, 15 out of a random sample of 100 cracked under 4,500 psi. For Valve B, six out of a random sample of 100 cracked under 4,500 psi. Test at a 5% level of significance.

# Confidence Intervals for the Difference in Two Proportions

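Once we have identified the presence of a difference in a two-sample test, we may want to estimate it. Our confidence interval would be of the form *(PE – MoE, PE + MoE)*, where our point estimate is $\hat{p}_1 - \hat{p}_2$, and the MoE is made up of:

$$MoE = z_{\alpha/2} \cdot SE, \qquad SE = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

- $z_{\alpha/2}$ is the *z* critical value with area to the right equal to $\alpha/2$.
- In the *SE*, we will estimate $p_1$ with $\hat{p}_1$ and $p_2$ with $\hat{p}_2$ if we do not know them.

Putting that all together, our formula for a CI to estimate the difference in two proportions will be:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$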
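A confidence interval of this form can be sketched in code as follows. This is an illustration rather than a definitive implementation: it assumes the unpooled standard error, with each population proportion estimated by its sample proportion, and the function name and counts are hypothetical. The critical value comes from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) bounds of a CI for p1 - p2."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    point_estimate = p_hat1 - p_hat2

    # Critical value with area alpha / 2 to its right.
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    # Unpooled standard error: each proportion keeps its own estimate.
    se = math.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    moe = z_crit * se
    return point_estimate - moe, point_estimate + moe


# Hypothetical counts: 45 successes out of 150, and 30 successes out of 150.
lower, upper = two_proportion_ci(45, 150, 30, 150, confidence=0.95)
print(round(lower, 4), round(upper, 4))
```

The interval is centered on the point estimate, here 0.30 - 0.20 = 0.10, and raising the confidence level widens it because the critical value grows.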

**Figure References**

Figure 7.8: Kindred Grey (2020). *Medications A and B.* CC BY-SA 4.0.

**Population proportion:** The number of individuals that have a characteristic of interest divided by the total number in the population

**Sampling distribution:** The probability distribution of a statistic at a given sample size

**Pooled proportion:** Estimate of the common value of $p_1$ and $p_2$

**Confidence interval:** An interval built around a point estimate for an unknown population parameter