Chapter 2 Wrap Up
Concept Check
Section Reviews
2.1 Introduction to Descriptive Statistics and Frequency Tables
Descriptive statistics are ways of organizing, summarizing, and presenting data. There are two main types: visual and numerical. Usually we want to examine a dataset visually first, then describe it numerically. Appropriate methods often depend on the type of data you are working with; however, frequency tables are a quick, easy way to organize any type of data.
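As a minimal sketch of how a frequency table can be built, the Python snippet below tallies each distinct value and reports its frequency, relative frequency, and cumulative relative frequency. The function name `frequency_table` and the sleep-hours data are hypothetical and chosen only for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, relative freq, cumulative relative freq)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running_count = 0
    for value in sorted(counts):
        freq = counts[value]
        running_count += freq
        rows.append((value, freq, freq / n, running_count / n))
    return rows

# Hypothetical data: hours of sleep reported by 10 students
hours = [6, 7, 7, 8, 6, 7, 5, 8, 7, 6]
table = frequency_table(hours)

print("value  freq  rel   cum")
for value, freq, rel, cum in table:
    print(f"{value:>5} {freq:>5} {rel:>5.2f} {cum:>5.2f}")
```

The same function works for categorical data as long as the categories can be sorted, although a cumulative column is only meaningful when the values have a natural order.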
2.2 Displaying and Describing Categorical Data
Two basic visual methods for displaying categorical data are:
- Pie charts
- Bar charts
When describing a categorical distribution we want to note:
- Mode
- Level of variability (diversity)
2.3 Displaying Quantitative Data
The following are common methods of displaying quantitative data:
- Stem-and-leaf plots
- Dot plots
- Line graphs
- Histograms
- Frequency polygons
- Time series plots
Some methods show certain aspects of a distribution better than others, and some are better suited to particular sample sizes.
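To illustrate one of these displays, here is a rough sketch of a stem-and-leaf plot in Python. It assumes two-digit, non-negative whole numbers, with the tens digit as the stem and the ones digit as the leaf. The function name `stem_and_leaf` and the test scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem); leaves are the ones digits."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical data: 10 test scores
scores = [62, 75, 71, 88, 93, 79, 84, 67, 75, 90]
plot = stem_and_leaf(scores)

# Print every stem between the smallest and largest, even if it has no leaves
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Because every observation stays visible, this kind of display tends to suit small datasets; for larger samples a histogram is usually easier to read.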
2.4 Describing Quantitative Distributions
When describing a quantitative distribution we want to note at least four things: the shape of the distribution, the presence of outliers, the center, and the spread. A helpful acronym to remember this is SOCS:
- Shape – Can be identified visually; note symmetry or lack thereof (skewness) and modality
- Outliers – Extreme outliers can be seen visually
- Center – Central tendency can be estimated visually
- Spread – Dispersion can be estimated visually and roughly quantified with the range
2.5 Measures of Location and Outliers
The values that divide a rank-ordered set of data into 100 equal parts are called percentiles. Percentiles are used to compare and interpret data. For example, an observation at the 50th percentile would be greater than 50 percent of the other observations in the set.
Expression for locating the kth percentile:
$$i = \frac{k}{100}(n + 1)$$
Where:
- i = the ranking or position of a data value,
- k = the kth percentile,
- n = total number of data.
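The position calculation can be sketched in Python as follows, using i = (k/100)(n + 1) with i, k, and n as defined above. When i is not a whole number, this sketch averages the two neighboring ordered values; that convention is an assumption here, and other textbooks and software packages interpolate differently. The function names and data are hypothetical.

```python
import math

def percentile_location(k, n):
    """Position i of the kth percentile in a sorted list of n values."""
    return k * (n + 1) / 100

def kth_percentile(data, k):
    """Value at the kth percentile, averaging neighbors when i is fractional."""
    ordered = sorted(data)
    n = len(ordered)
    i = percentile_location(k, n)
    # Clamp positions so they stay inside the list (positions are 1-based)
    lower = min(max(math.floor(i), 1), n)
    upper = min(max(math.ceil(i), 1), n)
    return (ordered[lower - 1] + ordered[upper - 1]) / 2

# Hypothetical data: 9 observations
data = [12, 15, 17, 20, 22, 25, 28, 30, 35]
p25 = kth_percentile(data, 25)  # i = 2.5, so average the 2nd and 3rd values
p50 = kth_percentile(data, 50)  # i = 5.0, so take the 5th value
print(p25, p50)
```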
Expression for finding the percentile of a data value:
$$\frac{x + 0.5y}{n}(100)$$
Where:
- x = the number of values counting from the bottom of the data list up to but not including the data value for which you want to find the percentile,
- y = the number of data values equal to the data value for which you want to find the percentile,
- n = total number of data
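As a small worked sketch, the function below computes the percentile of a data value as 100(x + 0.5y)/n, with x, y, and n counted as defined above. The function name `percentile_of_value` and the data are hypothetical.

```python
def percentile_of_value(data, value):
    """Percentile of `value` within `data`, using 100 * (x + 0.5y) / n."""
    x = sum(1 for d in data if d < value)   # values strictly below `value`
    y = sum(1 for d in data if d == value)  # values equal to `value`
    n = len(data)
    return 100 * (x + 0.5 * y) / n

# Hypothetical data: 10 quiz scores with some ties
quiz = [4, 5, 5, 6, 7, 7, 7, 8, 9, 10]
pct = percentile_of_value(quiz, 7)  # x = 4, y = 3, n = 10
print(pct)
```

Counting only half of the tied values places a repeated score in the middle of its tie rather than at the top or bottom of it.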
Quartiles divide data into quarters. The first quartile (Q1) is the 25th percentile, the second quartile (Q2 or median) is the 50th percentile, and the third quartile (Q3) is the 75th percentile.
The interquartile range, or IQR, is the range of the middle 50 percent of the data values. The IQR is found by subtracting Q1 from Q3, and it can help identify outliers through the following fence rules. Values below the lower fence or above the upper fence are flagged as potential outliers.
- Upper fence = Q3 + 1.5(IQR)
- Lower fence = Q1 – 1.5(IQR)
Box plots are a type of graph that can help visually organize data. To graph a box plot the following data points must be calculated: the minimum value, the first quartile, the median, the third quartile, and the maximum value. Once the box plot is graphed, you can display and compare distributions of data.
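The five values needed for a box plot, together with the fences (lower fence = Q1 − 1.5 × IQR, upper fence = Q3 + 1.5 × IQR), can be sketched in Python as below. Quartiles are computed here as the medians of the lower and upper halves of the sorted data, leaving out the overall median when the number of values is odd. That is one common classroom convention and is an assumption of this sketch; statistical software often uses other quartile rules and may give slightly different results. All function names and data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Skip the middle value when n is odd
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)

def five_number_summary(data):
    q1, q2, q3 = quartiles(data)
    return min(data), q1, q2, q3, max(data)

def fences(data):
    """Return (lower fence, upper fence) from the 1.5 * IQR rule."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with one unusually small and one unusually large value
values = [1, 11, 12, 13, 14, 15, 16, 17, 18, 40]
summary = five_number_summary(values)
low, high = fences(values)
outliers = [v for v in values if v < low or v > high]
print(summary, (low, high), outliers)
```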
2.6 Measures of Center
The mean and the median can be calculated to help you find the “center” of a data set. The mean may often be the best representation of the center of a dataset, but the median is often more appropriate when a data set contains several outliers or extreme values. The mode will tell you the most frequently occurring datum (or data) in your data set.
The mean of a dataset can be approximated from a frequency table by:
$$\bar{x} = \frac{\sum fm}{\sum f}$$
Where:
- f = interval frequencies
- m = interval midpoints.
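A short sketch of this approximation in Python: multiply each interval midpoint m by its frequency f, add up the products, and divide by the total frequency. The function name `grouped_mean` and the table values are hypothetical.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate the mean from a frequency table: sum(f * m) / sum(f)."""
    total = sum(f * m for f, m in zip(frequencies, midpoints))
    return total / sum(frequencies)

# Hypothetical table: intervals 0-10, 10-20, 20-30
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
approx_mean = grouped_mean(midpoints, frequencies)  # (10 + 75 + 75) / 10
print(approx_mean)
```

The result is only an approximation because every observation in an interval is treated as if it sat exactly at the midpoint.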
2.7 Measures of Spread
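The variance and standard deviation are numerical measures of the spread or dispersion of a dataset. Different equations apply depending on whether you are calculating the standard deviation of a sample or of a population. The sample and population standard deviations are, respectively:
- $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
- $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Here $\bar{x}$ and $n$ are the sample mean and sample size, and $\mu$ and $N$ are the population mean and population size.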
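To find the standard deviation from a frequency table:
$$s = \sqrt{\frac{\sum f(m - \bar{x})^2}{n - 1}}$$
where $\bar{x} = \frac{\sum fm}{\sum f}$ and $n = \sum f$.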
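As a hedged sketch, the snippet below approximates a sample standard deviation from a frequency table. It assumes the grouped form of the sample formula, s = sqrt(Σ f(m − x̄)² / (n − 1)), with f the interval frequencies, m the interval midpoints, x̄ = Σfm / Σf, and n = Σf. Some texts use an algebraically different shortcut formula, so treat this as one reasonable version rather than the only one. The function name and table values are hypothetical.

```python
import math

def grouped_sample_sd(midpoints, frequencies):
    """Approximate sample SD from a frequency table (n - 1 in the denominator)."""
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    squared_dev = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))
    return math.sqrt(squared_dev / (n - 1))

# Hypothetical table: intervals 0-10, 10-20, 20-30
mids = [5, 15, 25]
freqs = [2, 5, 3]
sd_grouped = grouped_sample_sd(mids, freqs)  # mean is 16, sum of f*(m-16)^2 is 490
print(round(sd_grouped, 3))
```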
Z-scores are a measure of location that puts an observation in units of standard deviations relative to the mean. We can use these to compare things from different distributions.
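The standard deviation and z-score calculations can be sketched in Python as follows. The sample version divides by n − 1 and the population version divides by N, and a z-score is computed as (value − mean) / standard deviation. The function names, the data, and the two exam distributions are hypothetical and used only for illustration.

```python
import math

def population_sd(data):
    """Population standard deviation: divide by N."""
    mu = sum(data) / len(data)
    return math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))

def sample_sd(data):
    """Sample standard deviation: divide by n - 1."""
    x_bar = sum(data) / len(data)
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))

def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32
sigma = population_sd(data)
s = sample_sd(data)

# Comparing scores from two hypothetical exams with different scales
z_exam_a = z_score(650, mean=500, sd=100)
z_exam_b = z_score(27, mean=21, sd=5)
print(sigma, round(s, 3), z_exam_a, z_exam_b)
```

In this made-up comparison the first score sits further above its own mean in standard deviation units, so it is the relatively stronger result even though the raw numbers are on different scales.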
Key Terms
Try to define the terms below on your own.
2.1 Introduction
- Descriptive statistics
- Graphical descriptive methods
- Numerical descriptive methods
- Distribution
- Frequency
- Relative frequency
- Cumulative relative frequency
- Lower class limit
- Upper class limit
- Class width
- Class midpoint
2.2 Displaying and Describing Categorical Data
2.3 Displaying Quantitative Data
2.4 Describing Quantitative Distributions
2.5 Measures of Location and Outliers
2.6 Measures of Center
2.7 Measures of Spread
- Variation (variability, spread)
- Standard deviation
- Sample
- Population
- Variance
- Z-score
Extra Practice
Methods of organizing, summarizing, and presenting data
Organizing, summarizing, or presenting data visually in graphs, figures, or charts
Numbers that summarize some aspect of a dataset, often calculated
The possible values a variable can take on, and how often it does so
The number of times a value of the data occurs
The percentage, proportion, or ratio of the frequency of a value of the data to the total number of outcomes
The sum of the relative frequencies for all values that are less than or equal to the given value
The lower end of a bin or class in a frequency table or histogram
The upper end of a bin or class in a frequency table or histogram
The difference in consecutive lower class limits
Found by adding the lower limit and upper limit, then dividing by 2
Data that describes qualities, or puts individuals into categories
The most frequently occurring value
The level of variability or dispersion of a dataset; also commonly known as variation/variability
Numerical data with a mathematical context
A random variable that produces discrete data
Categorical data where the categories have a natural or intuitive order
What a dataset looks like visually
An observation that stands out from the rest of the data significantly
The central tendency or most typical value of a dataset
How many peaks or clusters there appear to be in a quantitative distribution
A number that measures the central tendency of the data
The middle number in a sorted list
The arithmetic mean, or average of a dataset
The arithmetic mean, or average of a population
Not affected by violations of assumptions such as outliers
The average distance (deviation) of each observation from the mean
A subset of the population studied
The whole group of individuals who can be studied to answer a research question
The square of the standard deviation; a computational step along the way to calculating the standard deviation
A measure of location that tells us how many standard deviations a value is above or below the mean