7.4 Inference for a Proportion

If we are working with categorical data, our parameter of interest is often the population proportion, p. The point estimate for p is \hat{p} = \frac{x}{n}, where x is the number of successes and n is the sample size. It is also sometimes denoted {p}^{\prime}. We saw previously that if the conditions np ≥ 10 and n(1 − p) ≥ 10 are met, we can apply the central limit theorem and assume (writing q = 1 − p):

\hat{p} \sim N\left(p,\sqrt{\frac{p\cdot q}{n}}\right)
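As a small sketch of this statement, the Python function below checks the conditions np ≥ 10 and n(1 − p) ≥ 10 and, if both hold, returns the mean and standard deviation of the normal approximation to \hat{p}. The function name `phat_normal_approx` and the example values p = 0.3, n = 100 are hypothetical, chosen only for illustration.

```python
import math


def phat_normal_approx(n: int, p: float) -> tuple[float, float]:
    """Return (mean, sd) of the normal approximation to the sampling
    distribution of p-hat. Hypothetical helper for illustration only."""
    q = 1 - p
    # Conditions from the text: np >= 10 and n(1 - p) >= 10
    if n * p < 10 or n * q < 10:
        raise ValueError("np >= 10 and n(1 - p) >= 10 are not both satisfied")
    # p-hat is approximately N(p, sqrt(pq / n))
    return p, math.sqrt(p * q / n)


# Illustrative values: true proportion 0.3, sample size 100
mean, sd = phat_normal_approx(100, 0.3)
print(f"mean = {mean}, sd = {sd:.4f}")
```

With n = 20 instead, np = 6 < 10, so the function raises an error rather than returning an approximation that the conditions do not justify.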

How do you know you are dealing with a proportion problem? The underlying distribution is binomial: the data are categorical (each observation is a success or a failure), and there is no mention of a mean or average. If X is a binomial random variable, then X ~ B(n, p), where n is the number of trials and p is the probability of a success.

Hypothesis Tests for p

When you perform a hypothesis test of a single population proportion p, the steps are the same as those we have seen before; only the test statistic is calculated differently. When conducting a test for p against a hypothesized (null) value p_0, our hypotheses look as follows:

  • H_0: p = p_0
  • H_a: p (<, >, ≠) p_0

Recall, the general form of a test statistic is:

z=\frac{\text{point estimate} - \text{null value}}{\text{SE}}

For the normal distribution of proportions, if \hat{p} \sim N\left(p,\sqrt{\frac{p\cdot q}{n}}\right), then the z-score formula is:

z=\frac{\hat{p}-p}{\sqrt{\frac{pq}{n}}}

Intuitively, you might think we use this as our test statistic but remember two things:

  1. We do not actually know p
  2. In a hypothesis test we begin by assuming the null is true

Because of these facts, we substitute the null value p_0 for p in the standard error, which gives us:

\sigma_{\hat{p}} = \sqrt{\frac{p_0(1-p_0)}{n}}

The test statistic is therefore:

z=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}

We then find a p-value and make our decision as usual.
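The procedure above can be sketched in Python using only the standard library, with `statistics.NormalDist` supplying standard normal probabilities. The function name `one_proportion_z_test` and its `"less"`/`"greater"`/`"two-sided"` labels are assumptions made for this illustration; the condition check applies np ≥ 10 and n(1 − p) ≥ 10 with the null value p_0 in place of p, since the test assumes H_0 is true.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x: int, n: int, p0: float,
                          alternative: str = "two-sided") -> tuple[float, float]:
    """Return (z, p_value) for H0: p = p0 (hypothetical helper).

    alternative is "less", "greater", or "two-sided" (labels chosen for
    this sketch). The standard error uses p0 because H0 is assumed true.
    """
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("normal approximation not justified under H0")
    p_hat = x / n                           # point estimate
    se = math.sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z = (p_hat - p0) / se                   # test statistic
    std_normal = NormalDist()               # mean 0, standard deviation 1
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, testing Ha: p > 0.5
z, p_value = one_proportion_z_test(60, 100, 0.5, alternative="greater")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The p-value is then compared with the significance level α exactly as in earlier tests.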

Example

Joon believes that 50% of first-time brides in the United States are younger than their grooms. She performs a hypothesis test to determine if the percentage is the same as or different from 50%. Joon samples 100 first-time brides and 53 reply that they are younger than their grooms. For the hypothesis test, she uses a 1% level of significance.
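Applying the steps above to Joon's data gives the following Python sketch (standard library only). The hypotheses are H_0: p = 0.50 and H_a: p ≠ 0.50, so the p-value is two-sided; the variable names are chosen for illustration.

```python
import math
from statistics import NormalDist

# Joon's example: H0: p = 0.50 versus Ha: p != 0.50, alpha = 0.01
n, x, p0, alpha = 100, 53, 0.50, 0.01

# Conditions under H0: n*p0 = 50 and n*(1 - p0) = 50, both >= 10
assert n * p0 >= 10 and n * (1 - p0) >= 10

p_hat = x / n                                  # sample proportion
se = math.sqrt(p0 * (1 - p0) / n)              # standard error under H0
z = (p_hat - p0) / se                          # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The test statistic is z = 0.6 and the two-sided p-value is about 0.55, well above α = 0.01, so Joon does not reject H_0: at the 1% level there is not sufficient evidence that the percentage of first-time brides who are younger than their grooms differs from 50%.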

You Try It

Marketers believe that 92% of adults in the United States own a cell phone. A cell phone manufacturer believes that number is actually lower. Of 200 American adults surveyed, 174 report having cell phones. Use a 5% level of significance. State the null and alternative hypotheses, find the p-value, state your conclusion, and identify the Type I and Type II errors.
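Once you have worked this exercise by hand, a sketch like the one below can check the arithmetic. Because the manufacturer suspects the proportion is lower, the alternative is H_a: p < 0.92 and the p-value is the left-tail area P(Z ≤ z). Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Cell phone exercise: H0: p = 0.92 versus Ha: p < 0.92, alpha = 0.05
n, x, p0, alpha = 200, 174, 0.92, 0.05

# Conditions under H0: n*p0 = 184 and n*(1 - p0) = 16, both >= 10
p_hat = x / n                          # sample proportion
se = math.sqrt(p0 * (1 - p0) / n)      # standard error under H0
z = (p_hat - p0) / se                  # test statistic
p_value = NormalDist().cdf(z)          # left-tailed p-value, P(Z <= z)

print(f"p-hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The code covers only the computational steps; the conclusion in context and the Type I and Type II errors still need to be stated in words.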

Confidence Intervals for p

During an election year, we see articles in the newspaper that state confidence intervals in terms of proportions or percentages. For example, a poll for a particular candidate running for president might show that the candidate has 40% of the vote within three percentage points (if the sample is large enough). Election polls are often calculated with 95% confidence, so the pollsters would be 95% confident that the true proportion of voters who favored the candidate is between 0.37 and 0.43: (0.40 − 0.03, 0.40 + 0.03).

Investors in the stock market are interested in the true proportion of stocks that go up and down each week. Businesses that sell personal computers are interested in the proportion of households in the United States that own personal computers. Confidence intervals can be calculated for the true proportion of stocks that go up or down each week and for the true proportion of households in the United States that own personal computers.

Constructing Confidence Intervals for p

The structure of, and procedure for finding, the confidence interval for a proportion are similar to those for the population mean, but the formulas are different.

The general format of a confidence interval, where PE is the point estimate and MoE is the margin of error, is:
(PE-MoE, PE+MoE)

The population parameter is p. The point estimate for p is \hat{p}, the sample proportion.

The margin of error (error bound) for a proportion is:

MoE=\left({z}_{\frac{\alpha }{2}}\right)\left(\sqrt{\frac{\hat{p}\hat{q}}{n}}\right), where \hat{q} = 1 - \hat{p}

This formula is similar to the error bound formula for a mean, except that the appropriate standard error is different. For a mean, when the population standard deviation is known, the appropriate standard error is \frac{\sigma }{\sqrt{n}}. For a proportion, the appropriate standard error is \sqrt{\frac{pq}{n}}.

However, in the error bound formula, we use \sqrt{\frac{\hat{p}\hat{q}}{n}} as the standard error instead of \sqrt{\frac{pq}{n}}.

In the error bound formula, the sample proportions \hat{p} and \hat{q} are estimates of the unknown population proportions p and q; they are used because p and q are not known. Both are calculated from the data: \hat{p} is the estimated proportion of successes, and \hat{q} is the estimated proportion of failures.
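The interval (\hat{p} − MoE, \hat{p} + MoE) can be sketched in Python as follows, with `NormalDist().inv_cdf` supplying the critical value z_{α/2}. The function name `proportion_confidence_interval` is hypothetical, and the sketch assumes the sample-size conditions have already been checked (with \hat{p} standing in for the unknown p).

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x: int, n: int,
                                   confidence: float = 0.95) -> tuple[float, float]:
    """Return (p-hat - MoE, p-hat + MoE). Hypothetical helper for illustration."""
    p_hat = x / n              # estimated proportion of successes
    q_hat = 1 - p_hat          # estimated proportion of failures
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for 95%
    moe = z_crit * math.sqrt(p_hat * q_hat / n)    # margin of error
    return p_hat - moe, p_hat + moe


# Hypothetical data: 40 successes in 100 trials
lower, upper = proportion_confidence_interval(40, 100)
print(f"({lower:.3f}, {upper:.3f})")
```

Computing z_{α/2} from the confidence level, rather than hard-coding 1.96, lets the same function produce 90% or 99% intervals.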

Example

Suppose that a market research firm is hired to estimate the percent of adults living in a large city who have cell phones. Five hundred randomly selected adult residents in this city are surveyed to determine whether they have cell phones. Of the 500 people surveyed, 421 responded yes – they own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of adult residents of this city who have cell phones.
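The computation for this example can be sketched in Python (standard library only). Here \hat{p} = 421/500 = 0.842, and there are 421 successes and 79 failures, both at least 10. Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Market research example: 421 of 500 adults own a cell phone
x, n, confidence = 421, 500, 0.95

p_hat = x / n                                            # sample proportion
q_hat = 1 - p_hat                                        # 1 - p-hat
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
moe = z_crit * math.sqrt(p_hat * q_hat / n)              # margin of error
lower, upper = p_hat - moe, p_hat + moe

print(f"p-hat = {p_hat:.3f}, MoE = {moe:.4f}, CI = ({lower:.3f}, {upper:.3f})")
```

The interval is approximately (0.810, 0.874): we estimate with 95% confidence that between about 81.0% and 87.4% of adult residents of this city have cell phones.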

You Try It

Suppose 250 randomly selected people are surveyed to determine if they own a tablet. Of the 250 surveyed, 98 reported owning a tablet. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of people who own tablets.
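After computing the interval by hand, you can check your arithmetic with a sketch like this one, which follows the same steps as the cell phone example. Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Tablet exercise: 98 of 250 surveyed people own a tablet
x, n, confidence = 98, 250, 0.95

p_hat = x / n                                            # sample proportion
q_hat = 1 - p_hat                                        # 1 - p-hat
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
moe = z_crit * math.sqrt(p_hat * q_hat / n)              # margin of error
lower, upper = p_hat - moe, p_hat + moe

print(f"p-hat = {p_hat:.3f}, MoE = {moe:.4f}, CI = ({lower:.3f}, {upper:.3f})")
```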


License


Significant Statistics Copyright © 2020 by John Morgan Russell, OpenStaxCollege, OpenIntro is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
