Chapter 2 Wrap Up

Concept Check

Section Reviews

2.1 Introduction to Descriptive Statistics and Frequency Tables

Descriptive statistics are ways of organizing, summarizing, and presenting data. There are two main types: visual and numerical. Usually we want to examine a dataset visually first and then describe it numerically. Appropriate methods often depend on the type of data you are working with; however, frequency tables are a quick, easy way to organize any type of data.
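As a rough illustration of a frequency table, the sketch below counts how often each value occurs and reports relative frequencies. The `responses` data and the `frequency_table` helper are hypothetical, invented only for this example.

```python
from collections import Counter

# Hypothetical sample: favorite colors reported by 10 students.
responses = ["blue", "red", "blue", "green", "red",
             "blue", "green", "blue", "red", "blue"]

def frequency_table(data):
    """Return (value, frequency, relative frequency) rows, most frequent first."""
    counts = Counter(data)
    total = len(data)
    return [(value, count, count / total)
            for value, count in counts.most_common()]

table = frequency_table(responses)
for value, count, rel in table:
    print(f"{value:<6} {count:>3} {rel:>6.2f}")
```

The same helper works for categorical or discrete quantitative data; continuous data would first need to be grouped into intervals.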

2.2 Displaying and Describing Categorical Data

Two basic visual methods for displaying categorical data are:

  • Pie charts
  • Bar charts

When describing a categorical distribution we want to note:

  • Mode
  • Level of variability (diversity)

2.3 Displaying Quantitative Data

The following are common methods of displaying quantitative data:

  • Stem-and-leaf plots
  • Dot plots
  • Line graphs
  • Histograms
  • Frequency polygons
  • Time series plots

Some methods show certain aspects of a distribution better than others, and some are better suited to particular sample sizes.

2.4 Describing Quantitative Distributions

When describing a quantitative distribution, we want to note at least four things: the shape of the distribution, the presence of outliers, the center, and the spread. A helpful acronym for remembering these is SOCS:

  • Shape – Can be identified visually; note symmetry or lack thereof (skewness) and modality
  • Outliers – Extreme outliers can be seen visually
  • Center – Central tendency can be estimated visually
  • Spread – Dispersion can be estimated visually and roughly quantified with the range

2.5 Measures of Location and Outliers

The values that divide a rank-ordered set of data into 100 equal parts are called percentiles. Percentiles are used to compare and interpret data. For example, an observation at the 50th percentile would be greater than 50 percent of the other observations in the set.

Expression for finding the position of the kth percentile:

i = \left(\frac{k}{100}\right)(n+1)

Where:

  • i = the ranking or position of a data value,
  • k = the kth percentile,
  • n = total number of data values.
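The position expression above can be sketched in code as follows. The helper names and the sample data are hypothetical. When the position i is not a whole number, this sketch interpolates linearly between the two neighboring values; that is one common convention and an assumption here, since other texts round or average instead.

```python
def percentile_position(k, n):
    """Position i of the kth percentile among n ordered values: i = (k/100)(n+1)."""
    return k * (n + 1) / 100

def kth_percentile(data, k):
    """Value at the kth percentile, interpolating when i is not a whole number."""
    ordered = sorted(data)
    n = len(ordered)
    # Clamp the position so it always falls inside the data (1-based positions).
    i = min(max(percentile_position(k, n), 1), n)
    lower = int(i)       # whole-number part of the position
    frac = i - lower     # fractional part of the position
    if frac == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + frac * (above - below)

# Hypothetical ordered data set with n = 9 values.
data = [1, 3, 5, 7, 9, 11, 13, 15, 17]
print(percentile_position(50, len(data)))  # position of the median
print(kth_percentile(data, 50))
print(kth_percentile(data, 25))
```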

Expression for finding the percentile of a data value:

\left(\frac{x + 0.5y}{n}\right)(100)

Where:

  • x = the number of values counting from the bottom of the data list up to but not including the data value for which you want to find the percentile,
  • y = the number of data values equal to the data value for which you want to find the percentile,
  • n = total number of data values.
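A minimal sketch of the expression above, using the same symbols x, y, and n. The function name and the sample data are hypothetical.

```python
def percentile_of_value(data, value):
    """Percentile of `value` within `data`: (x + 0.5y) / n * 100."""
    x = sum(1 for d in data if d < value)    # values strictly below `value`
    y = sum(1 for d in data if d == value)   # values equal to `value`
    n = len(data)                            # total number of data values
    return 100 * (x + 0.5 * y) / n

# Hypothetical data set with n = 10 values.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]
print(percentile_of_value(data, 7))   # x = 4, y = 1
print(percentile_of_value(data, 4))   # x = 1, y = 2
```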

Quartiles divide data into quarters. The first quartile (Q1) is the 25th percentile, the second quartile (Q2 or median) is the 50th percentile, and the third quartile (Q3) is the 75th percentile.

The interquartile range, or IQR, is the range of the middle 50 percent of the data values. The IQR is found by subtracting Q1 from Q3, and it can help identify outliers through the following fence rules. Values below the lower fence or above the upper fence are flagged as potential outliers.

  • Upper fence = Q3 + 1.5(IQR)
  • Lower fence = Q1 − 1.5(IQR)
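The fence rules can be sketched with Python's standard library. The data set is hypothetical. Note that quartile conventions differ between textbooks and software; this sketch assumes `statistics.quantiles` with `method="exclusive"`, which places the kth percentile at position (k/100)(n+1) and interpolates between neighbors.

```python
import statistics

# Hypothetical sample with one unusually large value.
data = [1, 3, 5, 7, 9, 11, 13, 15, 40]

# Quartiles under the (n + 1) position rule (an assumed convention).
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value outside the fences is flagged as a potential outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Potential outliers:", outliers)
```

Together with the minimum and maximum, the three quartiles computed here make up the five values needed for a box plot.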

Box plots are a type of graph that can help visually organize data. To graph a box plot, the following five values must be calculated: the minimum value, the first quartile, the median, the third quartile, and the maximum value. Once the box plot is graphed, you can use it to display and compare distributions of data.

2.6 Measures of Center

The mean and the median can be calculated to help you find the “center” of a data set. The mean may often be the best representation of the center of a dataset, but the median is often more appropriate when a data set contains several outliers or extreme values. The mode will tell you the most frequently occurring datum (or data) in your data set.
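To see why the median can be more appropriate when extreme values are present, consider the short sketch below. The salary figures are hypothetical, chosen so that a single extreme value pulls the mean well above the typical observation.

```python
import statistics

# Hypothetical salaries in thousands of dollars; 250 is an extreme value.
salaries = [30, 32, 35, 35, 38, 40, 250]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
modes = statistics.multimode(salaries)  # list of the most frequent value(s)

print("mean:", round(mean, 2))
print("median:", median)
print("mode(s):", modes)
```

Here the mean lands above every salary except the extreme one, while the median stays with the bulk of the data.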

The mean of a dataset can be approximated from a frequency table by:

\(\mu =\frac{\sum fm}{\sum f}\)

Where:

  • f = interval frequencies
  • m = interval midpoints.
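A minimal sketch of this approximation, using hypothetical interval midpoints and frequencies:

```python
# Hypothetical frequency table: intervals 0-10, 10-20, 20-30, 30-40.
midpoints = [5, 15, 25, 35]     # m: interval midpoints
frequencies = [2, 5, 8, 5]      # f: interval frequencies

def grouped_mean(m_values, f_values):
    """Approximate mean from a frequency table: sum(f * m) / sum(f)."""
    total_fm = sum(f * m for f, m in zip(f_values, m_values))
    return total_fm / sum(f_values)

approx_mean = grouped_mean(midpoints, frequencies)
print(approx_mean)
```

The result is only an approximation because every observation in an interval is treated as if it sat at the interval's midpoint.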

2.7 Measures of Spread

The variance and standard deviation are numerical measures of the spread or dispersion of a dataset. Different equations are used depending on whether you are calculating the standard deviation of a sample or of a population. The sample and population standard deviations are, respectively:

  • s = \sqrt{\frac{\sum (x - \overline{x})^{2}}{n-1}}
  • σ = \sqrt{\frac{\sum (x - \mu)^{2}}{N}}
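The two formulas differ only in the divisor, which the sketch below makes explicit. The data values and helper names are hypothetical; the results are compared against `statistics.stdev` and `statistics.pstdev` from the Python standard library.

```python
import math
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    mean = sum(values) / len(values)
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / (len(values) - 1))

def population_sd(values):
    """Population standard deviation: divide the squared deviations by N."""
    mean = sum(values) / len(values)
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / len(values))

print("sample s:", sample_sd(data), statistics.stdev(data))
print("population sigma:", population_sd(data), statistics.pstdev(data))
```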

To find the standard deviation of a frequency table:

s_{x} = \sqrt{\frac{\sum f m^{2}}{n} - \overline{x}^{2}}

Where:

  • s_{x} = sample standard deviation
  • \overline{x} = sample mean
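The frequency-table formula can be sketched as below, with hypothetical midpoints and frequencies. The sketch assumes that n is the total frequency (the sum of f) and that the sample mean is itself approximated from the table. It follows the formula exactly as written, which divides by n rather than n − 1.

```python
import math

# Hypothetical frequency table: intervals 0-10, 10-20, 20-30, 30-40.
midpoints = [5, 15, 25, 35]     # m: interval midpoints
frequencies = [2, 5, 8, 5]      # f: interval frequencies

def grouped_sd(m_values, f_values):
    """Standard deviation from a frequency table: sqrt(sum(f*m^2)/n - mean^2)."""
    n = sum(f_values)  # assumed: n is the total frequency
    mean = sum(f * m for f, m in zip(f_values, m_values)) / n
    mean_of_squares = sum(f * m ** 2 for f, m in zip(f_values, m_values)) / n
    return math.sqrt(mean_of_squares - mean ** 2)

approx_sd = grouped_sd(midpoints, frequencies)
print(round(approx_sd, 4))
```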

Z-scores are a measure of location that expresses an observation in units of standard deviations relative to the mean. We can use them to compare observations from different distributions.
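A z-score is computed by subtracting the mean from the observation and dividing by the standard deviation. The sketch below uses this to compare two exam scores from different distributions; the exam means, standard deviations, and scores are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exams with different means and standard deviations.
z_a = z_score(85, mean=75, sd=5)   # score of 85 on exam A
z_b = z_score(90, mean=82, sd=8)   # score of 90 on exam B

print("exam A z-score:", z_a)
print("exam B z-score:", z_b)
```

Although the raw score on exam B is higher, the score on exam A sits further above its own mean in standard deviation units.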

License

Significant Statistics - beta (extended) version Copyright © 2020 by John Morgan Russell, OpenStaxCollege, OpenIntro is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.