# 5.6 Hypothesis Tests in Depth

Establishing the parameter of interest, type of distribution to use, the test statistic, and *p*-value can help you figure out how to go about a hypothesis test. However, there are several other factors you should consider when interpreting the results.

## Rare Events

Suppose you make an assumption about a property of the population (this assumption is the null hypothesis). Then you gather sample data randomly. If the sample has properties that would be very unlikely to occur if the assumption is true, then you would conclude that your assumption about the population is probably incorrect. Remember that your assumption is just an assumption; it is not a fact, and it may or may not be true. But your sample data are real and are showing you a fact that seems to contradict your assumption.

For example, Didi and Ali are at a birthday party of a very wealthy friend. They hurry to be first in line to grab a prize from a tall basket that they cannot see inside. There are 200 plastic bubbles in the basket, and Didi and Ali have been told that there is only one with a $100 bill. Didi is the first person to reach into the basket and pull out a bubble. Her bubble contains a $100 bill. The probability of this happening is 1/200 = 0.005. Because this is so unlikely, Ali is hoping they had been misinformed and there are more $100 bills in the basket. A “rare event” has occurred (Didi getting the $100 bill), so Ali doubts the assumption that only one $100 bill is in the basket.
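The probability above can be checked with a short simulation. This is a sketch in Python that models the first draw as choosing one bubble uniformly at random from the 200 in the basket:

```python
import random

random.seed(0)       # for reproducibility

N_BUBBLES = 200      # bubbles in the basket; exactly one holds the $100 bill
N_TRIALS = 100_000   # number of simulated first draws

# Treat bubble 0 as the one containing the $100 bill.
wins = sum(1 for _ in range(N_TRIALS) if random.randrange(N_BUBBLES) == 0)

print(f"Exact probability:  {1 / N_BUBBLES}")
print(f"Simulated estimate: {wins / N_TRIALS:.4f}")
```

The simulated estimate lands very close to the exact value of 0.005, illustrating how rare Didi's draw really is under the one-bill assumption.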

## Errors in Hypothesis Tests

When you perform a hypothesis test, there are four possible outcomes depending on the actual truth (or falseness) of the null hypothesis *H _{0}* and the decision to reject or not. The outcomes are summarized in the following table:

| Action | *H _{0}* is actually true | *H _{0}* is actually false |
|---|---|---|
| Do not reject *H _{0}* | Correct outcome | Type II error |
| Reject *H _{0}* | Type I error | Correct outcome |

Figure 5.14: Type I and type II errors

The four possible outcomes in the table are:

- The decision is not to reject *H _{0}* when *H _{0}* is true (correct decision).
- The decision is to reject *H _{0}* when *H _{0}* is true (incorrect decision known as a type I error).
- The decision is not to reject *H _{0}* when, in fact, *H _{0}* is false (incorrect decision known as a type II error).
- The decision is to reject *H _{0}* when *H _{0}* is false (correct decision whose probability is called the power of the test).

Each of the errors occurs with a particular probability. The Greek letters *α* and *β* represent the probabilities.

*α* = probability of a type I error = *P*(type I error) = probability of rejecting the null hypothesis when the null hypothesis is true. Such rejections are also known as false positives. The value of *α* is usually set in advance, and *α* = 0.05 is widely accepted. In that case, you are saying, “We are OK making this type of error in 5% of samples.” Note that the *p*-value is not itself the probability of a type I error; it is the probability, computed assuming the null hypothesis is true, of observing a result at least as extreme as the one you observed. You reject the null hypothesis when the *p*-value is less than *α*.
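A quick simulation makes the meaning of *α* concrete. This Python sketch assumes a two-sided one-sample *z*-test on normal data with a known standard deviation (the specific means and sample size are illustrative choices, not from the text); when the null hypothesis is true, the test rejects it in roughly *α* of the samples:

```python
import random
import statistics
from math import sqrt

random.seed(1)

Z_CRIT = 1.96        # two-sided critical value for alpha = 0.05
MU_0 = 50            # null hypothesis: population mean is 50
SIGMA = 10           # known population standard deviation
N = 30               # sample size
N_TESTS = 10_000     # number of simulated hypothesis tests

rejections = 0
for _ in range(N_TESTS):
    # Draw a sample from a population where H0 is actually true.
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    z = (statistics.mean(sample) - MU_0) / (SIGMA / sqrt(N))
    if abs(z) > Z_CRIT:
        rejections += 1   # a type I error: rejecting a true H0

print(f"Observed type I error rate: {rejections / N_TESTS:.3f}")
```

The observed rejection rate comes out close to 0.05, the chosen *α*: about 5% of samples lead us to a false positive even though the null hypothesis is true in every trial.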

*β* = probability of a type II error = *P*(type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false. These are also known as false negatives.

The power of a test is 1 – *β*.

Ideally, *α* and *β* should be as small as possible because they are probabilities of errors, but they are rarely zero. We also want the power to be as close to one as possible. For a fixed *α*, increasing the sample size reduces *β* and therefore increases the power of the test.
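The effect of sample size on power can be computed directly in a simple setting. The sketch below assumes a one-sided *z*-test for a mean with known standard deviation and a fixed true (alternative) mean; the particular numbers are hypothetical planning values, chosen only to show that power grows with *n* while *β* shrinks:

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()                  # standard normal distribution
ALPHA = 0.05
Z_ALPHA = Z.inv_cdf(1 - ALPHA)    # one-sided critical value, about 1.645

MU_0, MU_1 = 50, 53               # null mean and the actual (alternative) mean
SIGMA = 10                        # known population standard deviation

def power(n: int) -> float:
    """Power of a one-sided z-test: P(reject H0 | true mean = MU_1)."""
    effect = (MU_1 - MU_0) / (SIGMA / sqrt(n))   # standardized true difference
    return 1 - Z.cdf(Z_ALPHA - effect)

for n in (20, 50, 100, 200):
    print(f"n = {n:>3}: power = {power(n):.3f}, beta = {1 - power(n):.3f}")
```

Running this shows the power climbing toward one as *n* increases, with *β* = 1 − power falling correspondingly, while *α* stays fixed at 0.05 throughout.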

**Example**

Suppose the null hypothesis, *H _{0}*, is that Frank’s rock climbing equipment is safe.

Type I error: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe. Type II error: Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.

*α* = probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe. *β* = probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the type II error, in which Frank thinks his rock climbing equipment is safe, so he goes ahead and uses it.

**Your Turn!**

Suppose the null hypothesis, *H _{0}*, is that the blood cultures contain no traces of pathogen *X*. State the type I and type II errors.

## Statistical Significance vs. Practical Significance

When the sample size becomes larger, point estimates become more precise, and any real difference between the mean and the null value becomes easier to detect. Even a very small difference will likely be detected if we take a large enough sample. Sometimes researchers take such large samples that even the slightest difference is detected, including differences that have no practical value. In such cases, we still say the difference is statistically significant, but it is not practically significant.

For example, an online experiment might identify that placing additional ads on a movie review website statistically significantly increases viewership of a TV show by 0.001%, but this increase might not have any practical value.

One of a data scientist's roles in conducting a study is often to plan its size. The data scientist might first consult experts or the scientific literature to learn what would be the smallest meaningful difference from the null value. She would also obtain other information, such as a very rough estimate of the true proportion *p*, so that she could roughly estimate the standard error. From here, she could suggest a sample size that is sufficiently large to detect the real difference if it is meaningful. While larger sample sizes may still be used, these calculations are especially helpful when considering costs or potential risks, such as possible health impacts to volunteers in a medical study.
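The planning step described above can be sketched numerically. The function below uses a common normal-approximation formula for a two-sided one-proportion *z*-test; the planning values (a rough guess of *p* ≈ 0.30 and a smallest meaningful difference of 0.05) are hypothetical placeholders, not values from the text:

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()

def sample_size(p_guess: float, min_diff: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for a two-sided one-proportion z-test
    (normal approximation; p_guess is a rough planning value)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)   # about 1.96
    z_beta = Z.inv_cdf(power)            # about 0.84
    variance = p_guess * (1 - p_guess)   # plugs into the standard error
    return ceil((z_alpha + z_beta) ** 2 * variance / min_diff ** 2)

# Hypothetical planning values: true proportion near 0.30,
# smallest meaningful difference of 0.05, 80% power.
print(sample_size(0.30, 0.05))
```

Halving the smallest meaningful difference roughly quadruples the required sample size, which is why pinning down what difference actually matters is such an important first step in planning a study.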
