6.4 Inference for a Proportion
When we work with categorical data, our parameter of interest is often the population proportion, $p$. The point estimate for $p$ is $\hat{p} = \frac{x}{n}$, where $x$ is the number of successes and $n$ is the sample size. It is also sometimes denoted $p'$. We saw previously that, if the conditions $np \geq 10$ and $n(1 - p) \geq 10$ are met, we can apply the central limit theorem and assume:
$$\hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$$
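The conditions and the normal model above can be checked numerically. The following is a minimal Python sketch; the function name `sampling_distribution_of_p_hat` and the values n = 100 and p = 0.5 are illustrative assumptions, not part of the text.

```python
from math import sqrt

def sampling_distribution_of_p_hat(n, p):
    """Return (mean, standard error) of the approximate normal model for p-hat.

    Hypothetical helper: raises ValueError when the conditions
    np >= 10 and n(1 - p) >= 10 are not both met.
    """
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("Conditions np >= 10 and n(1 - p) >= 10 are not met.")
    return p, sqrt(p * (1 - p) / n)

# Illustrative values only: 100 trials with success probability 0.5
mean, se = sampling_distribution_of_p_hat(n=100, p=0.5)
print(f"mean = {mean}, standard error = {se:.4f}")
```

With a small sample or an extreme proportion (for example n = 20 and p = 0.1), the function raises an error instead of returning a normal model that would not be appropriate.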
How do you know you are dealing with a proportion problem? The underlying distribution is binomial: the data are categorical, with no mention of a mean or average. If X is a binomial random variable, then X ~ B(n, p), where n is the number of trials and p is the probability of a success.
Hypothesis Tests for p
When you perform a hypothesis test of a single population proportion p, the steps are exactly the same as what we have seen before; however, we will calculate our test statistic differently. When conducting a test for p, our hypotheses will look as follows:
- H0: p = p0
- Ha: p (<,>,≠) p0
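Recall that the general form of a test statistic is:
$$Z = \frac{\text{point estimate} - \text{null value}}{\text{standard error}}$$
For the normal distribution of proportions, if
$$\hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$$
then the z-score formula is:
$$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$$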
Intuitively, you might think we use this as our test statistic, but remember two things:
- We do not actually know p.
- In a hypothesis test, we begin by assuming the null is true.
In keeping with these facts, we substitute $p_0$ for $p$, including in the standard error, which gives us:
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
We can then find a p-value and make our decision as normal.
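The test statistic and p-value described above can be sketched in Python using only the standard library. The function name `one_proportion_z_test`, its `alternative` argument, and the sample values (60 successes in 100 trials, p0 = 0.5) are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(x, n, p0, alternative="two-sided"):
    """Return (z, p_value) for a test of H0: p = p0.

    The standard error uses p0 because the null is assumed true.
    """
    p_hat = x / n
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value

# Hypothetical data: 60 successes out of 100, testing H0: p = 0.5
z, p_value = one_proportion_z_test(x=60, n=100, p0=0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The three branches correspond to the three possible forms of Ha (<, >, ≠).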
Example
Joon believes that 50% of first-time brides in the United States are younger than their grooms. She performs a hypothesis test to determine whether the percentage is 50%. Joon samples 100 first-time brides, and 53 reply that they are younger than their grooms. For the hypothesis test, she uses a 1% level of significance.
Solution
Set up the hypothesis test:
The 1% level of significance means that α = 0.01. This is a test of a single population proportion.
H0: p = 0.50
Ha: p ≠ 0.50
Joon is testing whether the percentage differs from 50% in either direction, so this is a two-tailed test.
Distribution for the test:
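The problem contains no mention of a mean. The information is given in terms of percentages. Use the distribution for $\hat{p}$, the estimated proportion.
$$\hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$$
Therefore, $\hat{p} \sim N\left(0.50, \sqrt{\frac{(0.50)(0.50)}{100}}\right)$, where p = 0.50, q = 1 − p = 0.50, n = 100, and SE = 0.05.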
Calculate the p-value using the normal distribution for proportions:
x = 53
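$\hat{p} = \frac{x}{n} = \frac{53}{100} = 0.53$
$Z = \frac{\hat{p} - p_0}{SE} = \frac{0.53 - 0.50}{0.05} = 0.6$
p-value $= 2 \cdot P(\hat{p} > 0.53) = 2 \cdot P(Z > 0.6) = 0.5485$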
Compare α and the p-value:
Since α = 0.01 and p-value = 0.5485, α < p-value.
Make a decision:
Since α < p-value, you cannot reject H0.
Conclusion:
At the 1% level of significance, the sample data do not show sufficient evidence that the percentage of first-time brides who are younger than their grooms is different from 50%.
The p-value can easily be calculated using technology.
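As one illustration of using technology, the numbers in Joon's example can be reproduced with Python's standard library. This is a sketch rather than a prescribed tool; the variable names are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example: 53 of 100 brides, H0: p = 0.50, alpha = 0.01
n, x, p0, alpha = 100, 53, 0.50, 0.01

p_hat = x / n                      # sample proportion
se0 = sqrt(p0 * (1 - p0) / n)      # standard error under the null
z = (p_hat - p0) / se0             # test statistic

# Two-tailed p-value: twice the upper-tail area beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < alpha

print(f"z = {z:.1f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

The result agrees with the hand calculation: the p-value is far larger than α, so H0 is not rejected.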
Confidence Intervals for p
During election years, we see newspaper articles that state confidence intervals in terms of proportions or percentages. For example, a poll for a particular candidate running for president might show that the candidate has 40% of the vote within three percentage points (if the sample is large enough). Often, election polls are calculated with 95% confidence, so the pollsters would be 95% confident that the true proportion of voters who favored the candidate would be between 0.37 and 0.43: (0.40 – 0.03, 0.40 + 0.03).
Investors in the stock market are interested in the true proportion of stocks that rise and fall each week. Businesses that sell personal computers are interested in the proportion of households in the United States that own personal computers. Confidence intervals can be calculated for the true proportion of stocks that rise and fall each week and for the true proportion of households in the United States that own personal computers.
Constructing Confidence Intervals for p
The structure of a confidence interval for a proportion, and the procedure for finding it, are similar to those for a population mean, but the formulas are different.
The general format of a confidence interval is:
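(PE – MoE, PE + MoE)
The population parameter is $p$. The point estimate for $p$ is $\hat{p}$, the sample proportion.
The margin of error for a proportion is:
$$\text{MoE} = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}, \quad \text{where } \hat{q} = 1 - \hat{p}$$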
This formula is similar to the margin of error formula for a mean, except that the “appropriate standard error” is different. For a mean, when the population standard deviation is known, the appropriate standard deviation that we use is $\frac{\sigma}{\sqrt{n}}$. For a proportion, the appropriate standard deviation is $\sqrt{\frac{pq}{n}}$.
However, in the margin of error formula, we use $\sqrt{\frac{\hat{p}\hat{q}}{n}}$ as the standard deviation instead of $\sqrt{\frac{pq}{n}}$.
In the margin of error formula, the sample proportions $\hat{p}$ and $\hat{q}$ are estimates of the unknown population proportions p and q. The estimated proportions $\hat{p}$ and $\hat{q}$ are used because p and q are not known. The sample proportions $\hat{p}$ and $\hat{q}$ are calculated from the data; $\hat{p}$ is the estimated proportion of successes, and $\hat{q}$ is the estimated proportion of failures.
Example
Suppose that a market research firm is hired to estimate the percent of adults living in a large city who have cell phones. Five hundred randomly selected adult residents in this city are surveyed to determine whether they have cell phones. Of the 500 people surveyed, 421 respond that they own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of adult residents of this city who have cell phones.
Solution
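Let X = the number of people in the sample who have cell phones. X is binomial: $X \sim B(500, p)$, where $p$ is the unknown population proportion.
To calculate the confidence interval, you must find $\hat{p}$, $\hat{q}$, and the margin of error (MoE).
n = 500
x = the number of successes = 421
$\hat{p} = \frac{x}{n} = \frac{421}{500} = 0.842$
$\hat{p} = 0.842$ is the sample proportion; this is the point estimate of the population proportion.
$\hat{q} = 1 - \hat{p} = 1 - 0.842 = 0.158$
Since CL = 0.95, $\alpha = 1 - \text{CL} = 1 - 0.95 = 0.05$, so $\frac{\alpha}{2} = 0.025$.
Then $z_{\alpha/2} = z_{0.025} = 1.96$.
$\text{MoE} = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = (1.96)\sqrt{\frac{(0.842)(0.158)}{500}} = 0.032$
$\hat{p} - \text{MoE} = 0.842 - 0.032 = 0.810$
$\hat{p} + \text{MoE} = 0.842 + 0.032 = 0.874$
The confidence interval for the true binomial population proportion is $(\hat{p} - \text{MoE}, \hat{p} + \text{MoE})$ = (0.810, 0.874).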
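The calculation above can be reproduced in a few lines of Python. This is a minimal sketch; the function name `proportion_confidence_interval` is a hypothetical choice, and the critical value comes from the standard library's inverse normal CDF rather than a table, so it is 1.95996... instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence=0.95):
    """Return (lower, upper) bounds of the interval p-hat +/- MoE."""
    p_hat = x / n
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    # Critical value z_{alpha/2}: the point with area alpha/2 in the upper tail
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    moe = z_crit * sqrt(p_hat * q_hat / n)
    return p_hat - moe, p_hat + moe

# Values from the example: 421 of 500 adults own cell phones
lower, upper = proportion_confidence_interval(x=421, n=500, confidence=0.95)
print(f"({lower:.3f}, {upper:.3f})")
```

Rounded to three decimal places, the bounds match the interval found by hand.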
Interpretation:
We estimate with 95% confidence that between 81% and 87.4% of all adult residents of this city have cell phones.
Explanation of 95% confidence level:
Ninety-five percent of the confidence intervals constructed in this way would contain the true value for the population proportion of all adult residents of this city who have cell phones.
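A small simulation can make this long-run interpretation concrete. The sketch below assumes a hypothetical true proportion of 0.84 (in practice p is unknown), draws many samples of size 500, builds an interval from each, and reports the fraction of intervals that contain the assumed true value. The function name, seed, and number of trials are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage_rate(true_p, n, confidence=0.95, trials=2000, seed=1):
    """Estimate how often the interval p-hat +/- MoE contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Simulate one sample: count successes in n independent trials
        x = sum(rng.random() < true_p for _ in range(n))
        p_hat = x / n
        moe = z_crit * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / trials

# true_p = 0.84 is an assumed value chosen for illustration only
rate = coverage_rate(true_p=0.84, n=500)
print(f"Fraction of intervals containing the true proportion: {rate:.3f}")
```

The reported fraction should be close to 0.95, though it will vary somewhat with the seed and the number of trials.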
Your Turn!
Suppose 250 randomly selected people are surveyed to determine if they own a tablet. Of the 250 surveyed, 98 report owning a tablet. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of people who own tablets.
Additional Resources
If you are using an offline version of this text, access the resources for this section via the QR code, or by visiting https://doi.org/10.7294/26207456.
Qualitative data: Data that describes qualities or puts individuals into categories; also known as categorical data
Population proportion: The number of individuals that have a characteristic of interest divided by the total number in the population
Binomial random variable: A random variable that counts the number of successes in a fixed number (n) of independent Bernoulli trials each with probability of a success (p)
Test statistic: A measure of the difference between observations and the hypothesized (or claimed) value