4.2 Discrete Random Variables
A student takes a ten-question, true-false quiz. Because of a busy schedule, the student could not study and guesses randomly at each answer. What is the probability that the student passes the quiz with a score of at least 70%?
Small companies might be interested in the number of long-distance phone calls their employees make during the peak time of the day. Suppose the average is 20 calls. What is the probability that the employees make more than 20 long-distance phone calls during peak hours?
These two examples illustrate two different types of probability problems involving discrete random variables. Recall that discrete data are data that you can count. A random variable describes the outcomes of a statistical experiment in words. The values of a random variable can vary with each repetition of an experiment.
Discrete Random Variables
We have previously seen the word discrete associated with types of data. Discrete means the number of possible outcomes is countable, so a discrete random variable (DRV) is a random variable (RV) that models a process or experiment that produces discrete data.
For instance, let X stand for the number of heads you get when you toss three fair coins. The sample space for the toss of three fair coins is TTT, THH, HTH, HHT, HTT, THT, TTH, HHH. Then, x = 0, 1, 2, 3. X is given in words, while x is given in numbers. Notice that for this example, the x values are countable outcomes. Because the possible values of X (0, 1, 2, and 3) can be counted and the outcome of each toss is random, X is a discrete random variable.
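The counts behind this distribution can be checked by brute force. The sketch below uses Python as an illustrative choice, since the text does not rely on any software. It lists the eight outcomes with `itertools.product`, counts the heads in each, and divides by 8. As the example states, it assumes the coins are fair, so every outcome is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 outcomes of tossing three fair coins, e.g. "HHT"
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# X = number of heads in an outcome; tally how many outcomes give each value x
counts = Counter(outcome.count("H") for outcome in sample_space)

# Fair coins: each outcome has probability 1/8, so P(X = x) = (outcomes with x heads) / 8
pmf = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, p)
```

Running it prints one line per value of x: `0 1/8`, `1 3/8`, `2 3/8`, `3 1/8`.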
Example
A child psychologist is interested in the number of times a newborn baby’s crying wakes its mother after midnight. For a random sample of 50 mothers, the following information was obtained. Let X represent the number of times per week a newborn baby’s crying wakes its mother after midnight. For this example, x = 0, 1, 2, 3, 4, 5.
P(x) = probability that X takes on a value x.
x | P(x) |
---|---|
0 | 2/50 |
1 | 11/50 |
2 | 23/50 |
3 | 9/50 |
4 | 4/50 |
5 | 1/50 |
Figure 4.2: Newborn baby crying
Is this a valid discrete probability distribution?
Solution
X takes on the values 0, 1, 2, 3, 4, and 5. This is a valid discrete PDF because:
- Each P(x) is between zero and one, inclusive.
- The sum of the probabilities is one, that is,
$$\frac{2}{50} + \frac{11}{50} + \frac{23}{50} + \frac{9}{50} + \frac{4}{50} + \frac{1}{50} = 1$$
Your Turn!
A hospital researcher is interested in the number of times the average post-op patient will ring the nurse during a 12-hour shift. For a random sample of 50 patients, the following information was obtained. Let X represent the number of times a patient rings the nurse during a 12-hour shift. For this exercise, x = 0, 1, 2, 3, 4, 5. P(x) = the probability that X takes on value x. Is this a discrete probability distribution function? Give two reasons why or why not.
x | P(x) |
---|---|
0 | |
1 | |
2 | |
3 | |
4 | |
5 | |
Figure 4.3: Post-op patients
Characteristics and Notation
The distribution of a discrete random variable is often pictured in a table, but it may also be represented by a graph or formula. It must have two characteristics:
- Each probability is between zero and one, inclusive.
- The sum of the probabilities is one.
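The two conditions can be checked mechanically. The following Python sketch is for illustration only, and `is_valid_pmf` is a hypothetical helper written for it. It tests the three-coin distribution from earlier, whose probabilities 1/8, 3/8, 3/8, and 1/8 follow from the eight equally likely outcomes. It also tests a made-up table that fails the second condition. Because the arithmetic uses exact `Fraction` values, the sum can be compared with 1 directly. With decimal floats such as 0.1, you would compare using a tolerance (for example, `math.isclose`) instead.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Return True if pmf (a dict mapping x -> P(x)) meets both conditions."""
    each_in_range = all(0 <= p <= 1 for p in pmf.values())  # 0 <= P(x) <= 1 for every x
    sums_to_one = sum(pmf.values()) == 1                     # the probabilities add to 1
    return each_in_range and sums_to_one

# Three fair coins, X = number of heads
coins = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
# Made-up table: each entry is in [0, 1], but they add to 11/10, so it is NOT valid
bad = {0: Fraction(1, 2), 1: Fraction(3, 5)}

print(is_valid_pmf(coins))  # True
print(is_valid_pmf(bad))    # False
```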
The probability mass function (PMF) of a DRV gives the probability that the random variable takes on a particular value x. Notation-wise, this is P(X = x), often written f(x). The PMF is also sometimes (erroneously) called the probability distribution function (PDF).
The cumulative distribution function (CDF) of a DRV gives the probability that the random variable is less than or equal to a particular value x. Notation-wise, this is P(X ≤ x), often written F(x).
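The following sketch shows how the PMF and CDF relate, again assuming the three-coin distribution (1/8, 3/8, 3/8, 1/8). The helper `cdf` is hypothetical. It adds the PMF over every value at or below x.

```python
from fractions import Fraction

# PMF of X = number of heads in three fair coin tosses: f(x) = P(X = x)
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(pmf, x):
    """F(x) = P(X <= x): add up the PMF over every value k with k <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

print(pmf[1])       # 3/8  (P(X = 1))
print(cdf(pmf, 1))  # 1/2  (P(X <= 1) = 1/8 + 3/8)
```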
A probability distribution function is a pattern. You try to fit a probability problem into a pattern or distribution in order to perform the necessary calculations. These distributions are tools that make it easier to solve probability problems. Each distribution has its own special characteristics. Learning the characteristics enables you to distinguish among the different distributions.
Example
Suppose Nancy has classes three days a week. She attends all three days 80% of the time, two days 15% of the time, one day 4% of the time, and no days 1% of the time. Suppose one week is randomly selected.
Complete the definition: let X represent the number of days Nancy ________.
Solution
attends class per week
X takes on what values?
Solution
0, 1, 2, and 3
Construct a probability distribution table (called a PDF table) like the one below for the week chosen at random. The table should have two columns labeled x and P(x). What does the P(x) column sum to?
x | P(x) |
---|---|
0 | |
1 | |
2 | |
3 | |
Figure 4.4: Blank PDF
Solution
x | f(x) |
---|---|
0 | 0.01 |
1 | 0.04 |
2 | 0.15 |
3 | 0.80 |
Figure 4.5: Completed PDF table
The P(x) column, labeled f(x) here, sums to 0.01 + 0.04 + 0.15 + 0.80 = 1.
Construct the cumulative probability distribution function.
Solution
x | f(x) | F(x) |
---|---|---|
0 | 0.01 | 0.01 |
1 | 0.04 | 0.05 |
2 | 0.15 | 0.20 |
3 | 0.80 | 1 |
Figure 4.6: Cumulative PDF
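Each F(x) entry is a running total of the f(x) column. This minimal Python sketch builds both columns for Nancy's distribution and uses the standard `itertools.accumulate` for the running sum. The rounding only hides binary floating-point noise in the printed totals.

```python
from itertools import accumulate

# Nancy's attendance: x = days attended per week, f(x) = P(X = x)
xs = [0, 1, 2, 3]
f = [0.01, 0.04, 0.15, 0.80]

# F(x) = P(X <= x) is the running total of f(x)
F = list(accumulate(f))

print(round(sum(f), 10))  # 1.0
for x, fx, Fx in zip(xs, f, F):
    # round() hides tiny binary floating-point errors in the running sums
    print(x, fx, round(Fx, 2))
```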
Your Turn!
Jeremiah has basketball practice two days a week. Ninety percent of the time, he attends both practices. Eight percent of the time, he attends one practice. Two percent of the time, he does not attend either practice. What is X, and what values does it take on?
Measures of Discrete Random Variables
Once we know how to work with discrete random variables, we may be interested in some other measures of the data, such as the mean, variance, and standard deviation. These ideas resurface here, but in the slightly different context of random variables.
The Expected Value (Mean) of a Discrete Random Variable
Recall the law of large numbers, which states that as the number of trials in a probability experiment increases, the average of the results gets closer to what we expect. When evaluating the long-term results of statistical experiments, we often want to know the “average” outcome. This long-term average is known as the mean or expected value of the random variable. In the context of random variables, it is denoted by μ (the Greek letter mu) or E[X]. In other words, it is the average value you would expect after conducting many trials of an experiment.
To find the expected value or long-term average, multiply each value of the random variable by its probability and add the products. The expected value is essentially a probability-weighted average of the values of the random variable.
Mean or expected value:
$$\mu = E[X] = \sum x \cdot P(x)$$
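A short simulation can connect this formula to the law of large numbers. This sketch assumes Nancy's attendance distribution from the earlier example (P(0) = 0.01, P(1) = 0.04, P(2) = 0.15, P(3) = 0.80). Its expected value is 0(0.01) + 1(0.04) + 2(0.15) + 3(0.80) = 2.74. The sketch draws simulated weeks with Python's `random.choices`. The seed and sample sizes are arbitrary, so the exact simulated averages depend on them, but the averages for larger n should land near 2.74.

```python
import random

# Nancy's distribution: days attended per week and their probabilities
values = [0, 1, 2, 3]
probs = [0.01, 0.04, 0.15, 0.80]

# Expected value: multiply each value by its probability and add the products
mu = sum(x * p for x, p in zip(values, probs))
print(round(mu, 4))  # 2.74

# Law of large numbers: simulate n weeks; the average should approach mu as n grows
random.seed(1)  # arbitrary fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    weeks = random.choices(values, weights=probs, k=n)
    print(n, sum(weeks) / n)
```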
Example
A men’s soccer team plays soccer zero, one, or two days a week. The probability that they play zero days is 0.2, the probability that they play one day is 0.5, and the probability that they play two days is 0.3. Find the long-term average or expected value, μ, of the number of days per week the men’s soccer team plays soccer.
To do the problem, first let the random variable X represent the number of days the men’s soccer team plays soccer per week. X takes on the values 0, 1, 2. Construct a PDF table, adding a column x*P(x). In this column, you will multiply each x value by its probability. This table is called an expected value table. The table helps you calculate the expected value, or long-term average.
x | P(x) | x*P(x) |
---|---|---|
0 | 0.2 | (0)(0.2) = 0 |
1 | 0.5 | (1)(0.5) = 0.5 |
2 | 0.3 | (2)(0.3) = 0.6 |
Figure 4.7: Expected value table
What is the expected value?
Solution
Add the last column, x*P(x), to find the long-term average or expected value.
(0)(0.2) + (1)(0.5) + (2)(0.3) = 0 + 0.5 + 0.6 = 1.1
The expected value is 1.1. The men’s soccer team would, on average, expect to play soccer 1.1 days per week. The number 1.1 is the long-term average or expected value if the men’s soccer team plays soccer week after week after week. We say μ = 1.1.
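The expected value table can be reproduced in a few lines. This Python sketch is for illustration only. It prints each row's x*P(x) product and adds the products to get μ.

```python
# Men's soccer team: x = days played per week, P(x) from the example
table = [(0, 0.2), (1, 0.5), (2, 0.3)]

total = 0.0
for x, p in table:
    x_times_p = x * p  # the x*P(x) column
    print(x, p, x_times_p)
    total += x_times_p

print(round(total, 10))  # 1.1, the expected value mu
```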
Your Turn!
A hospital researcher is interested in the number of times the average post-op patient will ring the nurse during a 12-hour shift. For a random sample of 50 patients, the following information was obtained. What is the expected value?
x | P(x) |
---|---|
0 | |
1 | |
2 | |
3 | |
4 | |
5 | |
Figure 4.8: Post-op patients
The Variance and Standard Deviation of a Discrete Random Variable
Like data, probability distributions have standard deviations. To calculate the standard deviation (σ) of a probability distribution, find each deviation from its expected value, square it, multiply it by its probability, add the products, and take the square root.
Finding the variance (σ² or V[X]) and standard deviation (σ or SD[X]) of a random variable starts out like finding these measures for a data sample, which we have seen before. The process differs at the fourth step, where each squared deviation is weighted by its probability. The result is a probability-weighted average of the squared deviations, computed the same way as an expected value:
- Find the mean.
- Subtract the mean from each value of x to get your deviations.
- Square each deviation.
- Multiply each squared deviation by its probability, P(x).
- Sum each of the products.
The resulting sum is the variance, and taking its square root gives the standard deviation. The formulas look like this:
$$\sigma^2 = V[X] = \sum (x - \mu)^2 \cdot P(x)$$
$$\sigma = SD[X] = \sqrt{\sum (x - \mu)^2 \cdot P(x)}$$
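The five steps translate directly into code. The function `variance_and_sd` is a hypothetical helper written for this sketch. It takes (x, P(x)) pairs. As a check, the sketch runs it on the soccer distribution from the expected value example (P(0) = 0.2, P(1) = 0.5, P(2) = 0.3). By hand, (0 - 1.1)²(0.2) + (1 - 1.1)²(0.5) + (2 - 1.1)²(0.3) = 0.242 + 0.005 + 0.243 = 0.49, so σ = √0.49 = 0.7.

```python
import math

def variance_and_sd(table):
    """table: list of (x, P(x)) pairs. Returns (mean, variance, standard deviation)."""
    mu = sum(x * p for x, p in table)  # step 1: the mean
    var = 0.0
    for x, p in table:
        deviation = x - mu             # step 2: deviation from the mean
        squared = deviation ** 2       # step 3: square it
        var += squared * p             # steps 4-5: weight by P(x) and add up
    return mu, var, math.sqrt(var)     # sigma is the square root of the variance

# Soccer example: P(0) = 0.2, P(1) = 0.5, P(2) = 0.3
mu, var, sd = variance_and_sd([(0, 0.2), (1, 0.5), (2, 0.3)])
print(round(mu, 4), round(var, 4), round(sd, 4))  # 1.1 0.49 0.7
```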
Example
Find the expected value of the number of times a newborn baby’s crying wakes its mother after midnight. Calculate the standard deviation of the variable as well.
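x | P(x) | x*P(x) | (x - μ)^2 ⋅ P(x) |
---|---|---|---|
0 | 2/50 | (0)(2/50) = 0 | (0 - 2.1)^2 ⋅ 0.04 = 0.1764 |
1 | 11/50 | (1)(11/50) = 11/50 | (1 - 2.1)^2 ⋅ 0.22 = 0.2662 |
2 | 23/50 | (2)(23/50) = 46/50 | (2 - 2.1)^2 ⋅ 0.46 = 0.0046 |
3 | 9/50 | (3)(9/50) = 27/50 | (3 - 2.1)^2 ⋅ 0.18 = 0.1458 |
4 | 4/50 | (4)(4/50) = 16/50 | (4 - 2.1)^2 ⋅ 0.08 = 0.2888 |
5 | 1/50 | (5)(1/50) = 5/50 | (5 - 2.1)^2 ⋅ 0.02 = 0.1682 |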
Figure 4.9: Newborn baby crying
Add the values in the third column of the table to find the expected value of X.
Solution
$$\mu = \text{expected value} = \frac{105}{50} = 2.1$$
You expect a newborn to wake its mother after midnight 2.1 times per week, on average.
Use μ to complete the table. The fourth column of this table provides the values you need to calculate the standard deviation. For each value x, multiply the square of its deviation by its probability. Each deviation has the form x - μ.
Add the values in the fourth column of the table.
Solution
0.1764 + 0.2662 + 0.0046 + 0.1458 + 0.2888 + 0.1682 = 1.05
The standard deviation of X is the square root of this sum.
Solution
$$\sigma = \sqrt{1.05} \approx 1.0247$$
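A computer can carry out these sums without intermediate rounding. This sketch recomputes the example with Python's exact `Fraction` type and converts to a float only for the final square root. It assumes the probabilities 2/50, 11/50, 23/50, 9/50, 4/50, and 1/50 for x = 0 through 5. These values are consistent with μ = 2.1 and the fourth-column products 0.1764 through 0.1682 (for example, 0.1764 / (0 - 2.1)² = 0.04 = 2/50).

```python
import math
from fractions import Fraction

# Newborn-crying distribution: x -> P(x), out of 50 sampled mothers
pmf = {0: Fraction(2, 50), 1: Fraction(11, 50), 2: Fraction(23, 50),
       3: Fraction(9, 50), 4: Fraction(4, 50), 5: Fraction(1, 50)}

mu = sum(x * p for x, p in pmf.items())               # sum of the x*P(x) column
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # sum of the (x - mu)^2 * P(x) column
sigma = math.sqrt(var)                                # convert to float only at the end

print(float(mu), float(var), round(sigma, 4))  # 2.1 1.05 1.0247
```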
The mean, μ, of a discrete probability distribution is its expected value:
$$\mu = \sum x \cdot P(x)$$
The standard deviation, σ, of the PDF is the square root of the variance:
$$\sigma = \sqrt{\sum (x - \mu)^2 \cdot P(x)}$$
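When all N outcomes in the probability distribution are equally likely (each has probability 1/N), these formulas reduce to the population mean and population standard deviation of the set of possible outcomes, that is, the versions that divide by N rather than n - 1.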
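Here is a quick check of this claim, using a fair six-sided die (an example chosen for this sketch, not taken from the text). Weighting each face by 1/6 gives the same mean and standard deviation as Python's `statistics.mean` and `statistics.pstdev` (the population standard deviation) applied to the list of faces.

```python
import math
import statistics
from fractions import Fraction

# A fair six-sided die: every outcome 1..6 has probability 1/6
outcomes = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

mu = sum(x * p for x in outcomes)                            # sum of x*P(x)
sigma = math.sqrt(sum((x - mu) ** 2 * p for x in outcomes))  # sqrt of sum of (x - mu)^2 * P(x)

# Compare with the population mean and population standard deviation of the outcome list
print(mu == statistics.mean(outcomes))                    # True
print(math.isclose(sigma, statistics.pstdev(outcomes)))  # True
```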
Your Turn!
On May 11, 2013, at 9:30 PM, the probability that moderate seismic activity (one moderate earthquake) would occur in the next 48 hours in Japan was about 1.08%. You bet that a moderate earthquake will occur in Japan during this period. If you win the bet, you win $100. If you lose the bet, you pay $10. If X is the amount of profit from a bet, find the mean and standard deviation of X.
Note on Calculations
For probability distributions, we generally use a calculator or a computer to calculate μ and σ to reduce roundoff error. For many special cases of probability distributions, there are shortcut formulas for calculating μ, σ, and associated probabilities. We will see some of these in the future.
Key Terms
Discrete random variable: A random variable that takes on a countable number of values
Probability mass function (PMF): A function that gives the probability that a discrete random variable is exactly equal to some value x
Cumulative distribution function (CDF): A function that gives the probability that a random variable takes a value less than or equal to x
Expected value: The mean of a random variable