1.1 Introduction to Statistics
Learning Objectives
By the end of this chapter, the student should be able to:
- Recognize and differentiate between key terms dealing with statistics
- Identify different types of data
- Identify data collection methods and study designs
- Apply various types of sampling methods to data collection
Introduction
We encounter statistics in our daily lives more often than we probably realize, in contexts ranging from news and weather reports to lab and classroom settings.
“Statistics’ ultimate goal is translating data into knowledge.” – Alan Agresti & Christine Franklin
You are probably asking yourself, “When and where will I use statistics?” If you read any newspaper, watch television, or use the Internet, you will see statistical information. There are statistics about everything from crime and politics to sports, education, and real estate. Typically, when you read a newspaper article or watch a television news program, you are given sample information. With this information, you may make a decision about the correctness of a statement, claim, or “fact.” Statistical methods can help you make the best educated guess.
Since you will undoubtedly be given statistical information at some point in your life, you need to know some techniques for analyzing the information thoughtfully. Think about buying a house or managing a budget. Your chosen profession may very well involve some statistical knowledge. For example, the fields of economics, business, psychology, education, biology, law, computer science, police science, and early childhood development require at least one course in statistics.
Included in this chapter are the basic ideas and terms of probability and statistics. You will soon understand how statistics and probability work together. You will also learn how data are gathered and how “good” data can be distinguished from “bad.”
The Study of Statistics
We see and use data in our everyday lives. The science of statistics deals with the collection, analysis, interpretation, and presentation of data. This is reflected in the data analysis process, which we will expand on in the next section.
You will first learn how to organize and summarize data. Organizing, summarizing, and presenting data is the basis of descriptive statistics. Data can be summarized with graphs or with numbers (for example, finding an average). After you have studied probability and probability distributions, you will use formal methods for drawing useful conclusions from data while filtering out the noise. These formal methods are called inferential statistics.
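As a small illustration of summarizing data with numbers, the sketch below computes three common descriptive statistics using Python's standard `statistics` module. The quiz scores are made-up values chosen only for this example; they do not come from a real study.

```python
import statistics

# Hypothetical quiz scores for ten students (made-up data for illustration).
scores = [72, 85, 90, 66, 85, 78, 94, 85, 70, 81]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted scores
mode_score = statistics.mode(scores)      # most frequently occurring score

print(f"mean={mean_score}, median={median_score}, mode={mode_score}")
```

Each of these numbers condenses the ten scores into a single summary, which is the kind of work descriptive statistics does.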
Effective interpretation of data (inference) is based on good procedures for producing data and thoughtful examination. You will encounter a lot of mathematical formulas that seem to require calculations. Keep in mind, however, that the goal of statistics is not to perform numerous calculations using the formulas, but to interpret data to gain an understanding. The calculations can be done using a calculator or a computer. The understanding must come from you. If you can thoroughly grasp the basics of statistics, you can be more confident in the decisions you make in life. Statistical inference uses probability to determine how confident you can be that your conclusions are correct.
Probability
Probability is a mathematical tool used to study randomness. It deals with the chance (the likelihood) of an event occurring. For example, if you toss a fair coin four times, the outcomes may not be two heads and two tails. However, if you toss the same coin 4,000 times, the outcomes will be close to half heads and half tails. The expected theoretical probability of heads in any one toss is 1/2 or 0.5. Even though the outcomes of a few repetitions are uncertain, a regular pattern emerges when there are many repetitions. After reading about the English statistician Karl Pearson tossing a coin 24,000 times with a result of 12,012 heads, one of the authors tossed a coin 2,000 times, resulting in 996 heads. The fraction 996/2000 is equal to 0.498, which is very close to 0.5, the expected probability.
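The pattern described above can be explored with a short simulation. The following is a minimal sketch, assuming a fair coin can be modeled with Python's pseudo-random number generator; the function name `proportion_heads` and the toss counts are choices made for this illustration.

```python
import random


def proportion_heads(num_tosses, seed=0):
    """Simulate fair coin tosses and return the fraction that land heads."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    # Each draw below 0.5 counts as heads; True is summed as 1.
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses


# With few tosses the proportion can be far from 0.5;
# with many tosses it tends to settle close to 0.5.
for n in (4, 40, 4_000, 400_000):
    print(n, proportion_heads(n))
```

Changing the `seed` argument gives a different sequence of tosses. The small runs will vary noticeably from seed to seed, while the large runs should stay near 0.5.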
The theory of probability began with the study of games of chance such as poker. Predictions take the form of probabilities. To predict the likelihood of an earthquake, of rain, or of you getting an A in this course, we use probabilities. Doctors use probability to determine the chance of a medical test incorrectly diagnosing the presence of a disease. A stockbroker uses probability to determine the rate of return on a client’s investments. You might use probability to decide if you should buy a lottery ticket. In your study of statistics, you will utilize the power of mathematics and probability to analyze and interpret your data.
Key Terms
In statistics, we generally want to study a population. You can think of a population as a collection of people or things under study. To study the population, we select a sample. The idea of sampling is to select a portion (or subset) of the larger population and study that portion (the sample) to gain information about the population. Parameters are numbers that describe a characteristic of the population.
Since it can take a great deal of resources (time, money, manpower, etc.) to examine an entire population, we often study only a subset of that population. Taking a sample is a very practical technique for accomplishing this. If you wished to compute the overall grade point average at your school, it would make sense to select a sample of students who attend the school. The data collected from the sample would be the students’ grade point averages. In presidential elections, opinion polls take samples of 1,000–2,000 people to represent the views of the entire country’s population. Manufacturers of canned carbonated drinks take samples to determine if a 16-ounce can contains 16 ounces of carbonated drink.
From the information we collect in our sample, we can calculate a statistic. A statistic is a number that represents a property of the sample. For example, if we consider one math class to be a sample of the population of all math classes, then the average number of points earned by students in that one math class at the end of the term is an example of a statistic. The statistic is an estimate of a population parameter. A parameter is a numerical characteristic of the whole population that can be estimated by a statistic. Because we considered all math classes to be the population, the average number of points earned per student across all math classes is an example of a parameter.
One of the main concerns in the field of statistics is how accurately a statistic estimates a parameter. The accuracy really depends on how well the sample represents the population. The sample must contain the characteristics of the population in order to be a representative sample. We are interested in both sample statistics and population parameters in inferential statistics. In a later chapter, we will use the sample statistic to test the validity of the established population parameter.
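To make the distinction between a parameter and a statistic concrete, the sketch below builds a hypothetical population of grade point averages, computes the population mean (the parameter), and then estimates it from a random sample of 100 students (the statistic). The population is generated artificially, and the mean of 3.0 and spread of 0.5 used to generate it are assumptions made only for this illustration.

```python
import random
import statistics

rng = random.Random(42)  # seeded so that the example is repeatable

# Hypothetical population: GPAs for 5,000 students, generated for illustration
# and clipped to the 0.0 to 4.0 scale.
population = [
    round(min(4.0, max(0.0, rng.gauss(3.0, 0.5))), 2) for _ in range(5_000)
]

# Parameter: computed from every member of the population.
parameter = statistics.mean(population)

# Statistic: computed from a random sample drawn without replacement.
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.3f}")
print(f"sample mean (statistic):     {statistic:.3f}")
```

In practice the parameter is usually unknown, because measuring every member of the population is too costly. Here it can be computed only because the whole population was simulated.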
Individuals are the units about which we are collecting information. This could be a person, animal, thing, or place. A variable, usually represented by capital letters such as X or Y, is a specific characteristic or measurement that can be determined for each individual. The values of a variable are the possible observations of the variable. If there are multiple variables collected on an individual, the entire set of variables may be called a case or observational unit.
Data refers to the actual values of the variables of interest. Data may be numbers or words. We’ll dive into data in the next section.
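One way to picture these terms is as a small table in which each row is an individual and each column is a variable. The sketch below uses a list of Python dictionaries; the students, majors, and credit counts are hypothetical and serve only to label the pieces.

```python
# Each dictionary is one individual (a hypothetical student).
students = [
    {"student_id": 1, "major": "Biology", "credits": 15},
    {"student_id": 2, "major": "Economics", "credits": 12},
    {"student_id": 3, "major": "Biology", "credits": 16},
]

# Variables: the characteristics recorded for every individual.
variables = list(students[0].keys())

# Data: the actual values collected for one variable. Data may be numbers...
credits_data = [student["credits"] for student in students]
# ...or words.
major_data = [student["major"] for student in students]

print(variables)
print(credits_data)
print(major_data)
```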
Example
Determine how the key terms apply to the following study. We want to know the average (mean) amount of money first-year college students spend at ABC College on school supplies (excluding books). We randomly survey 100 first-year students at the college. Three of those students spent $150, $200, and $225.
Your Turn!
Determine how the key terms apply to the following study. We want to know the average (mean) amount of money spent on school uniforms each year by families with children at Knoll Academy. We randomly survey 100 families with children in the school. Three of the families spent $65, $75, and $95.
Figure References
Figure 1.1: Markus Winkler (2020). Corona death and new cases stats. Unsplash license. https://unsplash.com/photos/tUEnyweZjEU
Figure Descriptions
Figure 1.1: iPhone displaying COVID-19 statistics sits on a gray desk.
Statistics: Process of collecting, organizing, and analyzing data
Descriptive statistics: Methods of organizing, summarizing, and presenting data
Inferential statistics: The facet of statistics dealing with using a sample to generalize (or infer) about the population
Probability: The study of randomness; a number between zero and one, inclusive, that gives the likelihood that a specific event will occur
Population: The whole group of individuals who can be studied to answer a research question
Parameter: A number that is used to represent a population characteristic and can only be calculated as the result of a census
Sample: A subset of the population studied
Statistic: A number calculated from a sample
Individual: The person, animal, item, place, etc. about which we collect information
Variable: A characteristic of interest for each person or object in a population
Values: Possible observations of the variable
Data: Actual values (numbers or words) that are collected from the variables of interest