# 6.6 Hypothesis Tests In-Depth

Establishing the parameter of interest, the type of distribution to use, the test statistic, and the p-value can help you figure out how to carry out a hypothesis test. However, there are several other factors you should consider when interpreting the results.

## Rare Events

Suppose you make an assumption about a property of the population (this assumption is the null hypothesis). Then you gather sample data randomly. If the sample has properties that would be very unlikely to occur if the assumption is true, then you would conclude that your assumption about the population is probably incorrect. (Remember that your assumption is just an assumption—it is not a fact and it may or may not be true. But your sample data are real and the data are showing you a fact that seems to contradict your assumption.)

For example, Didi and Ali are at a birthday party of a very wealthy friend. They hurry to be first in line to grab a prize from a tall basket that they cannot see inside because they will be blindfolded. There are 200 plastic bubbles in the basket, and Didi and Ali have been told that only one of them contains a \$100 bill. Didi is the first person to reach into the basket and pull out a bubble. Her bubble contains a \$100 bill. The probability of this happening is 1/200 = 0.005. Because this is so unlikely, Ali is hoping that what the two of them were told is wrong and that there are more \$100 bills in the basket. A "rare event" has occurred (Didi getting the \$100 bill), so Ali doubts the assumption that only one \$100 bill is in the basket.
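The probability in the birthday-party story can be checked directly. The minimal Python sketch below computes it exactly under the stated assumption (200 bubbles, exactly one \$100 bill) and also estimates it by simulating many first draws. The constant names, the seed, and the number of trials are illustrative choices, not part of the original example.

```python
import random
from fractions import Fraction

# Assumption from the story (the "null hypothesis"): 200 bubbles, exactly one holds $100.
TOTAL_BUBBLES = 200
WINNING_BUBBLES = 1

# Exact probability that the first bubble drawn is the winning one, if the assumption is true.
p_first_draw = Fraction(WINNING_BUBBLES, TOTAL_BUBBLES)
print(p_first_draw, float(p_first_draw))  # 1/200 0.005


def simulate_first_draw(trials: int, seed: int = 0) -> float:
    """Estimate the same probability by drawing one bubble at random many times."""
    rng = random.Random(seed)
    # Label bubbles 0..199; bubbles numbered below WINNING_BUBBLES hold the $100 bill.
    hits = sum(rng.randrange(TOTAL_BUBBLES) < WINNING_BUBBLES for _ in range(trials))
    return hits / trials


print(simulate_first_draw(200_000))  # an estimate near 0.005
```

Under the assumption, Didi's draw happens only about 5 times in 1,000 attempts, which is why Ali treats it as evidence against the assumption.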

## Errors in Hypothesis Tests

When you perform a hypothesis test, there are four possible outcomes depending on the actual truth (or falseness) of the null hypothesis H0 and the decision to reject or not. The outcomes are summarized in the following table:

Figure 6.14: Type I and Type II Errors

| Action | H0 is actually true | H0 is actually false |
|---|---|---|
| Do not reject H0 | Correct outcome | Type II error |
| Reject H0 | Type I error | Correct outcome |

The four possible outcomes in the table are:

1. The decision is not to reject H0 when H0 is true (correct decision).
2. The decision is to reject H0 when H0 is true (incorrect decision known as a Type I error).
3. The decision is not to reject H0 when, in fact, H0 is false (incorrect decision known as a Type II error).
4. The decision is to reject H0 when H0 is false (correct decision whose probability is called the power of the test).

Each of the errors occurs with a particular probability. The Greek letters α and β represent the probabilities.

α = probability of a Type I error = P(Type I error) = probability of rejecting the null hypothesis when the null hypothesis is true.

β = probability of a Type II error = P(Type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false.

The power of the test is 1 – β.
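The definitions of α, β, and power can be made concrete with a simulation. The Python sketch below assumes a hypothetical setup not taken from the text: a one-sided z-test of H0: μ = 0 versus Ha: μ > 0 on a normal population with known σ = 1, a significance level of 0.05, samples of size 20, and a true mean of 0.5 whenever H0 is false. It repeats the test many times and counts how often H0 is rejected in each situation.

```python
import random
from statistics import NormalDist, fmean

# Hypothetical setup (assumption, not from the text):
# one-sided z-test of H0: mu = 0 vs Ha: mu > 0, normal population, known sigma.
ALPHA = 0.05
SIGMA = 1.0
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA)  # reject H0 when z > Z_CRIT (about 1.645)


def rejects_h0(sample: list[float], mu0: float = 0.0) -> bool:
    """Return True when the z-test rejects H0 for this sample."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (SIGMA / n ** 0.5)
    return z > Z_CRIT


def rejection_rate(true_mu: float, n: int, trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated samples (drawn with mean true_mu) in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, SIGMA) for _ in range(n)]
        rejections += rejects_h0(sample)  # True counts as 1, False as 0
    return rejections / trials


n = 20
alpha_hat = rejection_rate(true_mu=0.0, n=n)  # H0 true: each rejection is a Type I error
power_hat = rejection_rate(true_mu=0.5, n=n)  # H0 false: each rejection is a correct decision
beta_hat = 1 - power_hat                      # H0 false: each non-rejection is a Type II error
print(f"alpha ~ {alpha_hat:.3f}, beta ~ {beta_hat:.3f}, power ~ {power_hat:.3f}")
```

The estimated α lands close to the chosen 0.05, while β and the power depend on how far the true mean is from the null value and on the sample size.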

Ideally, α and β should be as small as possible because they are probabilities of errors, but they are rarely zero. We also want the power to be as close to one as possible. Increasing the sample size helps: for a fixed α, a larger sample reduces β and therefore increases the power of the test, and a larger sample also lets us choose a smaller α without increasing β.
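The effect of sample size on β and power can be computed directly for a simple case. The sketch below assumes a one-sided z-test with known σ and a fixed α of 0.05; the effect size of 0.5 and the list of sample sizes are hypothetical values chosen for illustration. The function name `z_test_power` is also an illustrative choice.

```python
from statistics import NormalDist


def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided z-test of H0: mu = mu0 vs Ha: mu > mu0.

    effect: the true difference mu - mu0 (assumed known here for illustration).
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha)      # cutoff is fixed once alpha is chosen
    shift = effect / (sigma / n ** 0.5)  # true difference measured in standard errors
    return 1 - std.cdf(z_crit - shift)   # P(z > z_crit) when Ha is true


# Hold alpha fixed at 0.05 and let the sample size grow.
for n in (10, 20, 40, 80):
    power = z_test_power(effect=0.5, sigma=1.0, n=n)
    print(f"n = {n:3d}: beta = {1 - power:.3f}, power = {power:.3f}")
```

Because the standard error shrinks like σ/√n, the same true difference sits more standard errors away from the null value as n grows, so β falls and the power rises while α stays at 0.05.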

Example

Suppose the null hypothesis, H0, is: Frank’s rock climbing equipment is safe.

Type I error: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.

Type II error: Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.

α = probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe.

β = probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)

Suppose the null hypothesis, H0, is: the blood cultures contain no traces of pathogen X. State the Type I and Type II errors.

## Statistical Significance Versus Practical Significance

When the sample size becomes larger, point estimates become more precise, and any real difference between the mean and the null value becomes easier to detect and recognize. Even a very small difference would likely be detected if we took a large enough sample. Sometimes researchers take such large samples that even the slightest difference is detected, including differences that have no practical value. In such cases, we still say the difference is statistically significant, but it is not practically significant.

For example, an online experiment might identify that placing additional ads on a movie review website statistically significantly increases viewership of a TV show by 0.001%, but this increase might not have any practical value.
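The sketch below illustrates how sample size alone can turn a tiny difference into a statistically significant one. It assumes a hypothetical one-sample z-test with known σ = 1 in which the observed sample mean always differs from the null value by 0.01 standard deviations; only n changes. The two-sided p-value 2(1 − Φ(|z|)) is computed as `math.erfc(|z| / √2)`, which is mathematically equal and stays accurate when the p-value is extremely small.

```python
import math


def two_sided_p_value(observed_diff: float, sigma: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test whose sample mean differs from mu0 by observed_diff."""
    z = observed_diff / (sigma / math.sqrt(n))
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)); erfc avoids round-off for large |z|.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical: the same tiny difference (0.01 standard deviations) at increasing sample sizes.
for n in (100, 10_000, 1_000_000):
    p = two_sided_p_value(observed_diff=0.01, sigma=1.0, n=n)
    print(f"n = {n:>9,}: p-value = {p:.4g}")
```

The size of the difference never changes, yet the p-value drops from far above 0.05 to essentially zero. Whether a difference of 0.01 standard deviations matters is a practical question that the p-value cannot answer.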

One role of a data scientist is often to plan the size of a study. The data scientist might first consult experts or the scientific literature to learn the smallest meaningful difference from the null value. She would also obtain other information, such as a very rough estimate of the true proportion p, so that she could roughly estimate the standard error. From there, she can suggest a sample size large enough that, if a real and meaningful difference exists, the study is likely to detect it. Larger sample sizes may still be used, but these calculations are especially helpful when weighing costs or potential risks, such as possible health impacts on volunteers in a medical study.
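The planning steps described above can be sketched with one common normal-approximation formula for a two-sided test about a single proportion, n ≈ (z₁₋α/₂ + z₁₋β)² · p(1 − p) / δ², where p is the rough estimate of the true proportion and δ is the smallest meaningful difference. This is an assumption for illustration: the text does not specify a formula, and more refined versions use separate standard errors under the null and alternative hypotheses. The function name and the default α = 0.05 and power = 0.80 are hypothetical choices.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_guess: float, min_diff: float,
                               alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for a two-sided test about a single proportion.

    p_guess: rough estimate of the true proportion p (used only to estimate the standard error).
    min_diff: smallest difference from the null value considered practically meaningful.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = NormalDist().inv_cdf(power)            # standard normal quantile for the target power
    n = (z_alpha + z_beta) ** 2 * p_guess * (1 - p_guess) / min_diff ** 2
    return math.ceil(n)                             # round up: a sample must have a whole number of units


# Hypothetical inputs: p roughly 0.5, and a 5-percentage-point difference is the smallest that matters.
print(sample_size_for_proportion(p_guess=0.5, min_diff=0.05))  # 785
```

Because n scales with 1/δ², halving the smallest meaningful difference roughly quadruples the required sample size, which is one reason the "smallest meaningful difference" deserves careful thought before the study begins.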