5.2 The Normal Distribution
The normal, or "Gaussian," distribution is the most important of all the distributions, continuous or otherwise. Its graph is symmetric, bell-shaped, and unimodal. It is widely used and even more widely abused. You see this distribution in almost all disciplines, including psychology, business, economics, the sciences, nursing, and, of course, mathematics. Some of your instructors may use the normal distribution to help determine your grade. Many quantities in the natural world or in human affairs, such as IQ scores or real-estate prices, are often modeled as approximately normal.
The normal distribution has two parameters (two numerical descriptive measures): the mean (μ) and the standard deviation (σ). If X is a quantity to be measured that has a normal distribution with mean (μ) and standard deviation (σ), we designate this by writing X~N(μ, σ).
The probability density function of this curve is as follows:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where:
- -∞ < x < ∞
- -∞ < μ < ∞
- σ > 0
As you can see, the normal pdf is a rather complicated function. This could be a problem, since the normal distribution is so widely used. However, we will see some ways to work around this.
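To make the formula concrete, the minimal sketch below evaluates the normal pdf, f(x) = e^(−(x − μ)²/(2σ²)) / (σ√(2π)), directly and compares it with `statistics.NormalDist.pdf` from Python's standard library (Python 3.8 or later is assumed). The helper name `normal_pdf` is our own, chosen only for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the N(mu, sigma) density curve at x (hypothetical helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Compare the hand-written formula with the standard library at a few points
dist = NormalDist(mu=50, sigma=6)
for x in (38, 44, 50, 56, 62):
    print(x, round(normal_pdf(x, 50, 6), 5), round(dist.pdf(x), 5))
```

The printed numbers are heights of the density curve, not probabilities; probabilities come from areas under the curve.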
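The cumulative distribution function (CDF) is P(X ≤ x). It can be calculated by calculus, by technology, or from a table; because the normal pdf has no antiderivative in terms of elementary functions, the calculus route requires numerical integration. Technology has made the tables almost obsolete.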
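Since the CDF has no elementary closed form, software usually writes it with the error function: P(X ≤ x) = ½[1 + erf((x − μ)/(σ√2))]. A minimal Python sketch, assuming the standard library's `math.erf` and `statistics.NormalDist`; the helper `normal_cdf` is hypothetical and named only for this example.

```python
import math
from statistics import NormalDist


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma), written with the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Area to the left of one standard deviation above the mean of N(50, 6)
print(round(normal_cdf(56, 50, 6), 4))
# The library's cdf computes the same area
print(round(NormalDist(50, 6).cdf(56), 4))
```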
The curve is symmetric about a vertical line drawn through the mean, μ. In theory, the mean is the same as the median, because the graph is symmetric about μ. As the notation indicates, the normal distribution depends only on the mean and the standard deviation. Since the area under the curve must equal one, a change in the standard deviation, σ, causes a change in the shape of the curve; the curve becomes fatter or skinnier depending on σ. A change in μ causes the graph to shift to the left or right. This means there are an infinite number of normal probability distributions. One of special interest is called the standard normal distribution.
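A short Python sketch (standard library `statistics.NormalDist`, Python 3.8+ assumed) illustrates both effects: the height of the curve at its peak, 1/(σ√(2π)), halves when σ doubles, while changing μ slides the curve without changing its shape.

```python
from statistics import NormalDist

# Peak height (density at the mean) for several values of sigma.
# Total area must stay 1, so a larger sigma gives a lower, wider curve.
peaks = {sigma: NormalDist(mu=0, sigma=sigma).pdf(0) for sigma in (0.5, 1.0, 2.0)}
for sigma, height in peaks.items():
    print(f"sigma={sigma}: peak height {height:.4f}")

# Changing mu only slides the curve left or right: the density one unit
# to the right of the mean is identical for every mu.
for mu in (-3, 0, 5):
    print(f"mu={mu}: density at mu + 1 is {NormalDist(mu, 1).pdf(mu + 1):.4f}")
```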
The Empirical Rule
- Approximately 68% of the values of x are within one standard deviation of the mean (±σ, or z-scores of ±1).
- Approximately 95% of the values of x are within two standard deviations of the mean (±2σ, or z-scores of ±2).
- Approximately 99.7% of the values of x are within three standard deviations of the mean (±3σ, or z-scores of ±3).
The Empirical Rule is also known as the 68-95-99.7 rule.
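The percentages in the rule are rounded. A minimal Python sketch using the standard library's `statistics.NormalDist` (our choice of tool, not one the text prescribes) computes the exact areas within one, two, and three standard deviations of the mean, which come out to about 68.27%, 95.45%, and 99.73%.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(-k < Z < k): the area within k standard deviations of the mean
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```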
Example
Suppose x has a normal distribution with mean 50 and standard deviation 6.
- About 68% of the x values lie within one standard deviation of the mean. Therefore, about 68% of the x values lie between –1σ = (–1)(6) = –6 and 1σ = (1)(6) = 6 of the mean 50. The values 50 – 6 = 44 and 50 + 6 = 56 are within one standard deviation from the mean 50. The z-scores are –1 and +1 for 44 and 56, respectively.
- About 95% of the x values lie within two standard deviations of the mean. Therefore, about 95% of the x values lie between –2σ = (–2)(6) = –12 and 2σ = (2)(6) = 12 of the mean 50. The values 50 – 12 = 38 and 50 + 12 = 62 are within two standard deviations from the mean 50. The z-scores are –2 and +2 for 38 and 62, respectively.
- About 99.7% of the x values lie within three standard deviations of the mean. Therefore, about 99.7% of the x values lie between –3σ = (–3)(6) = –18 and 3σ = (3)(6) = 18 of the mean 50. The values 50 – 18 = 32 and 50 + 18 = 68 are within three standard deviations of the mean 50. The z-scores are –3 and +3 for 32 and 68, respectively.
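The arithmetic in this example can be reproduced in a few lines of Python. This sketch assumes Python 3.8+ for `statistics.NormalDist`, which is used only to confirm the share of values falling in each interval.

```python
from statistics import NormalDist

mu, sigma = 50, 6
dist = NormalDist(mu, sigma)

# Interval endpoints mu - k*sigma and mu + k*sigma for k = 1, 2, 3
intervals = [(mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)]
for k, (low, high) in enumerate(intervals, start=1):
    share = dist.cdf(high) - dist.cdf(low)
    print(f"{k} SD: {low} to {high}, about {share:.1%} of values")
```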
Your turn!
From 1984 to 1985, the mean height of 15- to 18-year-old males from Chile was 172.36 cm, and the standard deviation was 6.34 cm. Let Y = the height of a 15- to 18-year-old Chilean male in 1984 to 1985. Then Y ~ N(172.36, 6.34).
- About 68% of the y values lie between what two values? These values are ____ and ____. The z-scores are ____ and ____, respectively.
- About 95% of the y values lie between what two values? These values are ____ and ____. The z-scores are ____ and ____, respectively.
- About 99.7% of the y values lie between what two values? These values are ____ and ____. The z-scores are ____ and ____, respectively.
Finding Normal Probabilities
The shaded area in the following graph indicates the area to the left of x. This area is represented by the probability P(X < x).
The area to the right is then P(X > x) = 1 – P(X < x). Remember, P(X < x) is the area to the left of the vertical line through x, and P(X > x) = 1 – P(X < x) is the area to the right of the vertical line through x. For continuous distributions, P(X < x) is the same as P(X ≤ x), and P(X > x) is the same as P(X ≥ x).
There are three main ways to find probabilities associated with the normal distribution:
- Math (via Calculus Integration)
- The Standardizing Process
- Technology
We would like to avoid complicated math if possible.
To avoid the math, a process called "standardizing" can be used. It involves z-scores, the standard normal distribution, and tables. Although this tried-and-true process is now somewhat antiquated, it is a great place to start.
There are many technologies, such as calculators and statistical software, that let us skip the standardizing process entirely and return a probability instantly. Although we typically have these tools at our disposal in practice, it is good to understand the process going on behind the scenes so that we apply our technology correctly.
The Standard Normal Distribution
The standard normal distribution (SND) is the simplest form of the normal distribution you can think of. Its mean is zero, and its standard deviation is one. The transformation $z = \frac{x - \mu}{\sigma}$ produces the distribution Z ~ N(0, 1). The value x in this equation comes from a normal distribution with mean μ and standard deviation σ.
Recall our previous discussion of z-scores, which measure distance from the mean in units of standard deviations. If X is a normally distributed random variable and X ~ N(μ, σ), then the z-score of a value x is:
$$z = \frac{x - \mu}{\sigma}$$
Recall a z-score tells you how many standard deviations the value x is above (to the right of) or below (to the left of) the mean, μ. Values of x that are larger than the mean have positive z-scores, and values of x that are smaller than the mean have negative z-scores. If x equals the mean, then x has a z-score of zero.
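Computing a z-score takes a single subtraction and division, sketched here in Python with a hypothetical helper `z_score`, using values from the N(50, 6) example earlier in this section.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


# Values from the N(50, 6) example: below, at, and above the mean
for x in (32, 44, 50, 56, 68):
    print(x, z_score(x, 50, 6))
```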
We have the Z table at our disposal, with probabilities already calculated and organized. Note that most Z tables give the left-tailed (cumulative, or "less than") probability. For example, the area to the left of a z-score of –3.37 is P(Z ≤ –3.37) = 0.0004.
The SND CDF value, P(Z ≤ z), is also denoted Φ(z). We can then use these CDF values and some probability rules to find "greater than" probabilities, P(Z ≥ z) = 1 – P(Z ≤ z), and "in between" probabilities, P(a ≤ Z ≤ b) = P(Z ≤ b) – P(Z ≤ a).
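The complement and subtraction rules translate directly into code. In this sketch, the standard normal CDF from Python's `statistics.NormalDist` plays the role of the Z table; the helper names `p_less`, `p_greater`, and `p_between` are illustrative, not from any library.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # Phi(z) = P(Z <= z) for the standard normal


def p_less(z: float) -> float:
    return phi(z)


def p_greater(z: float) -> float:
    # Complement rule: P(Z >= z) = 1 - P(Z <= z)
    return 1.0 - phi(z)


def p_between(a: float, b: float) -> float:
    # Subtraction rule: P(a <= Z <= b) = P(Z <= b) - P(Z <= a), assuming a <= b
    return phi(b) - phi(a)


print(round(p_less(-3.37), 4))  # the left-tail table example for z = -3.37
```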
Example
Use the Z table to find the following probabilities:
a. P(Z ≤ 1):
b. P(Z ≥ 1):
c. P(-1 ≤ Z ≤ 1):
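For reference, the Z table gives P(Z ≤ 1) = 0.8413, so P(Z ≥ 1) = 1 – 0.8413 = 0.1587 and P(–1 ≤ Z ≤ 1) = 0.8413 – 0.1587 = 0.6826. The sketch below checks these with Python's `statistics.NormalDist`; working from unrounded values, part c comes out to 0.6827, a difference caused only by the table's rounding.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF, standing in for the Z table

a = phi(1)            # P(Z <= 1)
b = 1 - phi(1)        # P(Z >= 1), by the complement rule
c = phi(1) - phi(-1)  # P(-1 <= Z <= 1), by subtraction

print(f"a. {a:.4f}  b. {b:.4f}  c. {c:.4f}")
```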
Your Turn
Use the Z table to find the following probabilities:
a. P(Z ≤ -0.54):
b. P(Z ≥ 1.2):
c. P(-1.5 ≤ Z ≤ 0.84):
The standardizing process
So far we have seen that we can convert any normal distribution, whatever its mean and standard deviation, to the standard normal distribution in units of z-scores. We also have the associated probabilities in our Z table. Essentially, the work has been done for us if we know how to standardize and look up the associated probability in the table. The general process is:
X~N(μ,σ) -> Z~N(0,1) -> Probability from Z table
This process, while perhaps outdated in our technology age, is good for beginners to understand and useful when we do not have access to technology.
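The sketch below, assuming Python's `statistics.NormalDist`, contrasts two routes to P(X ≤ x): standardizing first and then consulting the standard normal CDF (the table method), versus letting the library work on the original scale. Both function names are hypothetical. The two routes agree, which is why software can skip the standardizing step.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # Z ~ N(0, 1)


def prob_less_via_z(x: float, mu: float, sigma: float) -> float:
    """Standardize x, then look up P(Z <= z), mirroring the table method."""
    z = (x - mu) / sigma
    return STANDARD.cdf(z)


def prob_less_direct(x: float, mu: float, sigma: float) -> float:
    """Let the library work on the original scale, with no explicit z-score."""
    return NormalDist(mu, sigma).cdf(x)


# Both routes agree up to floating-point rounding
print(prob_less_via_z(56, 50, 6), prob_less_direct(56, 50, 6))
```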
Example
Height and weight are two measurements used to track a child’s development. The World Health Organization measures child development by comparing the weights of children who are the same height and the same gender. In 2009, weights for all 80 cm girls in the reference population had a mean µ = 10.2 kg and standard deviation σ = 0.8 kg. Weights are normally distributed. X ~ N(10.2, 0.8). Calculate the z-scores that correspond to the following weights, then find the associated probabilities.
a. The probability that a child weighs less than 11 kg
b. The probability that a child weighs more than 7.9 kg
c. The probability that a child weighs between 11.2 and 12.2 kg
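Here is one way to check the three parts in Python, with `statistics.NormalDist()` standing in for the Z table (an assumption of this sketch, not the text's method). The z-scores are 1 for 11 kg, –2.875 for 7.9 kg, and 1.25 and 2.5 for 11.2 and 12.2 kg. A printed table lists z only to two decimals, so for part b, z = –2.875 is rounded to –2.88 before lookup; the table answers of about 0.8413, 0.9980, and 0.0994 agree with the computed ones to four decimals.

```python
from statistics import NormalDist

mu, sigma = 10.2, 0.8    # weights (kg) of 80 cm girls: X ~ N(10.2, 0.8)
phi = NormalDist().cdf   # standard normal CDF, playing the role of the Z table


def z(x: float) -> float:
    """Standardize a weight x."""
    return (x - mu) / sigma


# a. P(X < 11), where z = 1
p_a = phi(z(11))
# b. P(X > 7.9), where z = -2.875; use the complement rule
p_b = 1 - phi(z(7.9))
# c. P(11.2 < X < 12.2), where z = 1.25 and z = 2.5
p_c = phi(z(12.2)) - phi(z(11.2))

print(f"a. {p_a:.4f}  b. {p_b:.4f}  c. {p_c:.4f}")
```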
Your Turn
The golf scores for a school team were normally distributed with a mean of 68 and a standard deviation of three.
a. Find the probability that a randomly selected golfer scored less than 65.
b. Find the probability that a randomly selected golfer scored between 66 and 70.
The “un-standardizing” process
Sometimes we may be given a percentile or z-score and want to work backwards through the standardizing process to find a value on the original distribution. You could call this “un-standardizing” or finding a normal quantile. The process looks like this:
Probability in Z table -> Z~N(0,1) -> X~N(μ,σ)
For example, suppose the mean of a normal distribution is five and the standard deviation is two. What value is three standard deviations above (to the right of) the mean, that is, at a z-score of three? Rearranging the z-score formula, the calculation is as follows:
x = μ + (z)(σ) = 5 + (3)(2) = 11
Often we are given a percentile to find on the original distribution. For example, what value on the previous distribution corresponds to the 90th percentile? We look in the body of the Z table for the area closest to 0.9000 and find a corresponding z-score of approximately 1.28.
x = μ + (z)(σ) = 5 + (1.28)(2) = 7.56
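In Python, the inverse CDF (`NormalDist.inv_cdf`, Python 3.8+) replaces the reverse table lookup. This sketch, with a hypothetical helper `unstandardize`, reproduces both calculations for the N(5, 2) distribution; because it does not round z to 1.28, the 90th percentile comes out to about 7.563 rather than 7.56.

```python
from statistics import NormalDist


def unstandardize(z: float, mu: float, sigma: float) -> float:
    """Convert a z-score back to the original scale: x = mu + z * sigma."""
    return mu + z * sigma


mu, sigma = 5, 2
print(unstandardize(3, mu, sigma))  # three standard deviations above the mean

# 90th percentile: inverse CDF of the standard normal, then un-standardize
z90 = NormalDist().inv_cdf(0.90)    # about 1.2816
print(round(unstandardize(z90, mu, sigma), 4))
# The same quantile in one step, working on the original scale
print(round(NormalDist(mu, sigma).inv_cdf(0.90), 4))
```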
Example
A citrus farmer who grows mandarin oranges finds that the diameters of mandarin oranges harvested on his farm follow a normal distribution with a mean diameter of 5.85 cm and a standard deviation of 0.24 cm.
a. Find the 90th percentile for the diameters of mandarin oranges:
b. The middle 20% of mandarin oranges from this farm have diameters between ____ and ____.
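A sketch of both parts using Python's `statistics.NormalDist` (our tooling choice). For part b, the middle 20% leaves 80% of the area outside it, split evenly as 40% below and 40% above, so the endpoints are the 40th and 60th percentiles. The results are about 6.16 cm for part a and about 5.79 cm to 5.91 cm for part b.

```python
from statistics import NormalDist

diameters = NormalDist(mu=5.85, sigma=0.24)  # mandarin diameters in cm

# a. 90th percentile: the diameter with 90% of the area to its left
p90 = diameters.inv_cdf(0.90)

# b. Middle 20%: 40% of the area lies below the interval and 40% above it,
#    so the endpoints are the 40th and 60th percentiles.
low, high = diameters.inv_cdf(0.40), diameters.inv_cdf(0.60)

print(f"a. {p90:.2f} cm")
print(f"b. {low:.2f} cm to {high:.2f} cm")
```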
Your Turn
Two thousand students took an exam. The scores on the exam have an approximate normal distribution with a mean μ = 81 points and standard deviation σ = 15 points.
- Calculate the first and third quartile scores for this exam.
- The middle 50% of the exam scores are between what two values?
Image References
Figure 5.10: Kindred Grey via Virginia Tech (2020). “Figure 5.10” CC BY-SA 4.0. Retrieved from https://commons.wikimedia.org/wiki/File:Figure_5.10.png . Adaptation of Figure 5.39 from OpenStax Introductory Statistics (2013) (CC BY 4.0). Retrieved from https://openstax.org/books/statistics/pages/5-practice
Figure 5.11: Kindred Grey via Virginia Tech (2020). “Figure 5.11” CC BY-SA 4.0. Retrieved from https://commons.wikimedia.org/wiki/File:Figure_5.11.png . Adaptation of Figure 5.39 from OpenStax Introductory Statistics (2013) (CC BY 4.0). Retrieved from https://openstax.org/books/statistics/pages/5-practice
Figure 5.12: Kindred Grey via Virginia Tech (2020). “Figure 5.12” CC BY-SA 4.0. Retrieved from https://commons.wikimedia.org/wiki/File:Figure_5.12.png . Adaptation of Figure 5.39 from OpenStax Introductory Statistics (2013) (CC BY 4.0). Retrieved from https://openstax.org/books/statistics/pages/5-practice
Figure 5.13: Kindred Grey via Virginia Tech (2020). “Figure 5.13” CC BY-SA 4.0. Retrieved from https://commons.wikimedia.org/wiki/File:Figure_5.13.png . Adaptation of ‘Normal Probability Table’ from OpenIntro Statistics (2019) (CC BY-SA 3.0). Retrieved from https://openintro.org/go/?id=stat_prob_tables_normal_t_chisq&referrer=/book/isrs/index.php
Glossary
Normal distribution: A commonly used symmetric, unimodal, bell-shaped, continuous probability distribution
Probability density function: A function that describes the distribution of a continuous random variable; areas under its curve give the probabilities of outcomes
Empirical Rule: Roughly 68% of values are within 1 standard deviation of the mean, roughly 95% of values are within 2 standard deviations of the mean, and roughly 99.7% of values are within 3 standard deviations of the mean
Standard normal distribution: A normal random variable with a mean of 0 and a standard deviation of 1, which z-scores follow; denoted N(0, 1)
Z-score: A measure of location that tells us how many standard deviations a value is above or below the mean
Percentiles: Points in a distribution that relate to the rank order of values in that distribution