9.1 Introduction to Bivariate Data and Scatterplots

Learning Objectives

By the end of this chapter, the student should be able to:

  • Display and describe relationships in bivariate data
  • Describe bivariate data numerically 
  • Understand basic ideas of linear regression
  • Predict future values using a regression line
  • Understand the impact of influential points and outliers in the context of linear regression
  • Apply ideas of inference to linear regression
Man inspecting an engine in an auto shop.
Figure 9.1: Auto Mechanic Salaries. Linear regression and correlation can help you determine whether an auto mechanic’s salary is related to their work experience.

Professionals often want to know how two (or more) numeric variables are related. For example, is there a relationship between the grade on the second math exam a student takes and the grade on the final exam? If there is a relationship, what is the relationship and how strong is it?

In another example, your income may be determined by your education, your profession, your years of experience, and your ability. The amount you pay a repair person for labor is often determined by an initial amount plus an hourly fee.
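The repair-bill example describes a relationship that is exactly linear. The short Python sketch below makes this concrete. The flat fee of $50 and the hourly rate of $30 are hypothetical values chosen only for illustration, not figures from the text.

```python
# A minimal sketch of the repair-bill example: the labor charge is an
# initial (fixed) amount plus an hourly fee times the hours worked.
# Both dollar amounts are hypothetical, chosen only for illustration.

INITIAL_FEE = 50.0  # hypothetical flat charge, in dollars
HOURLY_RATE = 30.0  # hypothetical fee per hour, in dollars


def labor_cost(hours: float) -> float:
    """Return the labor charge for a job lasting `hours` hours."""
    return INITIAL_FEE + HOURLY_RATE * hours


for hours in [0, 1, 2, 3, 4]:
    print(f"{hours} h -> ${labor_cost(hours):.2f}")
```

Each additional hour adds the same $30, so plotting hours against cost would put every point on one straight line with intercept 50 and slope 30. Real bivariate data, such as income versus education, rarely fits a line this perfectly.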

The type of data described in these examples is bivariate data (“bi” for two variables). In this chapter, you will study simple linear regression. Note that “simple” does not mean these ideas are easy; it means we are working with one independent variable (x) and a linear relationship. This involves data that fits a line in two dimensions.

Bivariate Data

When we are looking at bivariate data, we first need to decide, if possible, whether changing one variable seems to lead to a change in the other. A response variable (also called y, the dependent variable, or the predicted variable) measures or records an outcome of a study. An explanatory variable (also called x, the independent variable, or the predictor variable) explains changes in the response variable.

When considering the relationship between two quantitative variables:

  1. Start with a graph (scatterplot)
  2. Look for an overall pattern and deviations from the pattern
  3. Use numerical descriptions of the data and overall pattern (correlation, coefficient of determination)
  4. Consider a mathematical model (regression)

Scatterplots

Before we take up the discussion of linear regression and correlation, we need to examine a way to display the relationship between two variables x and y. The most common and easiest way is a scatterplot, which shows a lot about the relationship between the variables. When you look at a scatterplot, notice the overall pattern and any deviations from that pattern. You can judge the strength of the relationship by how closely the points cluster around the pattern. When looking at a scatterplot, you always want to note its:

  1. Shape
  2. Trend
  3. Strength
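To see how a scatterplot is built, here is a minimal text-only sketch in Python: each (x, y) pair is scaled onto a character grid and marked with an asterisk. It avoids plotting libraries so it runs anywhere. The function name `text_scatter` and the sample points are hypothetical; in practice a graphing calculator or plotting library would draw the plot.

```python
def text_scatter(xs, ys, width=40, height=15, mark="*"):
    """Return a rough text scatterplot of paired data as a list of rows.

    Hypothetical helper for illustration: x runs left to right and
    y runs bottom to top, each scaled to fit the character grid.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid dividing by zero when all x (or all y) values are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the plot, so large y values get small row indices.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = mark
    rows = ["|" + "".join(r) for r in grid]  # left edge acts as the y axis
    rows.append("+" + "-" * width)           # bottom edge acts as the x axis
    return rows


# Hypothetical sample data with a rising pattern.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 6, 7]
print("\n".join(text_scatter(xs, ys)))
```

Reading the printed grid for shape, trend, and strength works the same way as reading a drawn scatterplot, although the coarse grid can merge points that are close together.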

The following scatterplot examples illustrate these concepts.

Six scatterplots showing different patterns. First: positive linear pattern (strong) - shows dots in an almost perfect line from bottom left of graph to top right. Second: linear pattern with one deviation - shows the same pattern as first scatterplot with one outlier in the top left corner. Third: negative linear pattern (strong) - shows dots in an almost perfect line from top left to bottom right of graph. Fourth: negative linear pattern (weak) - shows dots from top left to bottom right of graph nowhere near a perfect line, but not completely random. Fifth: exponential growth pattern - shows a few dots on the x axis from left to right in a horizontal line and then gradually the dots move upwards towards the top right corner creating an upwards curve. Sixth: no pattern - random dots all over the graph.
Figure 9.2: Scatterplot Configurations

Shape

Although we may see other shapes in a scatterplot, at this point we are only interested in applying these ideas when we see a linear pattern. Linear patterns are quite common. The linear relationship is strong if the points are close to a straight line, except in the case of a horizontal line, where there is no relationship. If we think that the points show a linear relationship, we would like to draw a line on the scatterplot. This line will later be calculated through a process called linear regression. However, we only calculate a regression line if one of the variables helps to explain or predict the other variable.
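As a preview, the line that linear regression usually produces is the least-squares line: the line y = a + bx that makes the sum of squared vertical distances from the points to the line as small as possible. The Python sketch below computes it from the standard formulas b = Sxy / Sxx and a = ȳ - b·x̄. Treat it as an illustration ahead of the chapter’s own treatment; the function name `least_squares_line` and the sample data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data with a roughly linear, rising pattern.
a, b = least_squares_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"y = {a:.2f} + {b:.2f}x")
```

A slope close to 0 corresponds to the horizontal-line case described above. The code will produce a line for any data set, so it is still up to us to check the scatterplot for a linear shape and to decide whether one variable plausibly explains the other.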

Trend

If we do see a linear pattern, what sort of relationship is there? A positive trend is seen when y tends to increase as x increases. On the other hand, a negative (inverse) trend is seen when y tends to decrease as x increases. In other words:

  • Positive trend: high values of one variable occur with high values of the other variable, and low values occur with low values.
  • Negative trend: high values of one variable occur with low values of the other variable.
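The “high with high, low with low” idea can be turned into a quick numerical check. In the minimal Python sketch below, each point contributes the product of its distance from the mean of x and its distance from the mean of y. These products are positive for high-high and low-low points and negative for high-low points, so the sign of their sum indicates the direction of the trend. The function name `trend_direction` is hypothetical, and the check reports direction only: it says nothing about whether the pattern is linear or how strong it is.

```python
def trend_direction(xs, ys):
    """Classify the trend as 'positive', 'negative', or 'none'.

    Each point contributes (x - x_bar) * (y - y_bar). The product is
    positive when x and y are both above (or both below) their means,
    and negative when one is above its mean while the other is below.
    """
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Hypothetical data: y tends to fall as x rises.
print(trend_direction([1, 2, 3, 4, 5], [10, 8, 9, 5, 4]))
```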

Strength

At this point, we can think about the strength of a relationship as how tightly the points on a scatterplot fit the linear pattern. A stronger relationship has points clustered closely around the pattern, while in a weaker one the points are more spread out. The strength of a relationship is not always apparent in a scatterplot, but we will see numerical measures of it later.
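One widely used numerical measure of linear strength is the correlation coefficient r, the “correlation” listed among the numerical descriptions earlier in this section. The Python sketch below assumes the usual Pearson definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations, which always gives a value between -1 and 1. It is a preview rather than the chapter’s own definition, and the sample data are hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data xs, ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # r is undefined when either variable has no spread at all.
        raise ValueError("x or y values are all equal; r is undefined")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a tight upward pattern and a loose downward pattern.
tight = correlation([1, 2, 3, 4, 5], [2, 4, 5, 8, 10])
loose = correlation([1, 2, 3, 4, 5], [9, 4, 7, 3, 5])
print(f"tight pattern: r = {tight:.3f}")
print(f"loose pattern: r = {loose:.3f}")
```

Values of r near 1 or -1 indicate points tightly clustered around a line, values near 0 indicate a weak or absent linear relationship, and the sign matches the trend. Because r only measures linear association, a curved pattern such as the exponential one in Figure 9.2 can still produce a fairly large r, which is another reason to look at the scatterplot first.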

Example

1. Does the scatter plot appear linear? Strong or weak? Positive or negative?

Scatterplot with several points plotted in the first quadrant. The points form a clear pattern, moving upward to the right. The points do not line up, but the overall pattern can be modeled with a line.
Figure 9.3: Scatterplot 1


2. Does the scatter plot appear linear? Strong or weak? Positive or negative?

Scatterplot with several points plotted in the first quadrant. The points move downward to the right. The overall pattern can be modeled with a line, but the points are widely scattered.
Figure 9.4: Scatterplot 2


3. Does the scatter plot appear linear? Strong or weak? Positive or negative?

Scatter plot with several points plotted all over the first quadrant. There is no pattern.
Figure 9.5: Scatterplot 3

Your turn!

Amelia plays basketball for her high school. She wants to improve enough to play at the college level. She notices that the number of points she scores in a game goes up as the number of hours she practices her jump shot each week increases. She records the following data:

Figure 9.6: Amelia’s Points

X (hours practicing jump shot)    Y (points scored in a game)
5                                 15
7                                 22
9                                 28
10                                31
11                                33
12                                36

Construct a scatterplot and state whether what Amelia thinks appears to be true.
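After drawing your scatterplot by hand, one rough way to double-check the pattern is to compute how many extra points Amelia scored per extra hour of practice between neighboring data points. The Python sketch below does this for the data in Figure 9.6. The function name `successive_slopes` is hypothetical, and this check supplements the scatterplot rather than replacing it.

```python
# Amelia's data from Figure 9.6: (hours practicing jump shot, points scored).
amelia = [(5, 15), (7, 22), (9, 28), (10, 31), (11, 33), (12, 36)]


def successive_slopes(pairs):
    """Return the change in y per unit change in x between neighboring points.

    Points are sorted by x first. Pairs with equal x values are rejected
    because the slope between them is undefined.
    """
    pts = sorted(pairs)
    slopes = []
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        if x2 == x1:
            raise ValueError(f"repeated x value {x1}; slope undefined")
        slopes.append((y2 - y1) / (x2 - x1))
    return slopes


for s in successive_slopes(amelia):
    print(f"{s:.2f} extra points per extra hour")
```

If the values are all positive and of similar size, the points rise at a roughly steady rate, which fits a positive linear pattern. With noisier data, neighboring slopes can jump around a lot even when the overall pattern is linear, so the scatterplot remains the main tool.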

Image References

Figure 9.1: Aaron Huber (2018). “Artist at Work.” Public domain. Retrieved from https://unsplash.com/photos/KxeFuXta4SE

Figure 9.2: Kindred Grey via Virginia Tech (2020). “Figure 9.2” CC BY-SA 4.0. Retrieved from https://commons.wikimedia.org/wiki/File:Figure_9.2.png . Adaptation of Figures 12.6, 12.7, and 12.8 from OpenStax Introductory Statistics (2013) (CC BY 4.0). Retrieved from https://openstax.org/books/introductory-statistics/pages/12-2-scatter-plots

Figure 9.3: Kindred Grey via Virginia Tech (2020). “Figure 9.3” CC BY-SA 4.0. Retrieved from https://commons.wikimedia.org/wiki/File:Figure_9.3.png . Adaptation of Figure 12.26 from OpenStax Introductory Statistics (2013) (CC BY 4.0). Retrieved from https://openstax.org/books/introductory-statistics/pages/12-practice

Figure 9.4: Kindred Grey via Virginia Tech (2020). “Figure 9.4” CC BY-SA 4.0. Retrieved from https://commons.wikimedia.org/wiki/File:Figure_9.4.png . Adaptation of Figure 12.27 from OpenStax Introductory Statistics (2013) (CC BY 4.0). Retrieved from https://openstax.org/books/introductory-statistics/pages/12-practice

Figure 9.5: Kindred Grey via Virginia Tech (2020). “Figure 9.5” CC BY-SA 4.0. Retrieved from https://commons.wikimedia.org/wiki/File:Figure_9.5.png . Adaptation of Figure 12.28 from OpenStax Introductory Statistics (2013) (CC BY 4.0). Retrieved from https://openstax.org/books/introductory-statistics/pages/12-practice

License

Significant Statistics Copyright © 2020 by John Morgan Russell, OpenStaxCollege, OpenIntro is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.