# Glossary

Alternative hypothesis

A working hypothesis that is contradictory to the null hypothesis

Anecdotal evidence

Evidence that is based on personal testimony and collected informally

Association

A relationship between variables

Bernoulli trial

An experiment with the following characteristics:

- There are only two possible outcomes called “success” and “failure” for each trial
- The probability (p) of a success is the same for any trial (so the probability q = 1 − p of a failure is the same for any trial)

Bimodal distribution

A distribution that has 2 modes

Binomial distribution

A random variable that counts the number of successes in a fixed number (n) of independent Bernoulli trials each with probability of a success (p)
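
As an illustration of the definition above, the probability of exactly k successes can be computed from n and p. This is a minimal sketch; the function name `binomial_pmf` and the example values are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    q = 1 - p  # probability of a failure on any single trial
    return comb(n, k) * p**k * q**(n - k)

# Hypothetical example: exactly 3 successes in 10 trials with p = 0.5
print(binomial_pmf(3, 10, 0.5))
```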

Bivariate data

Data consisting of two variables, often in search of an association

Blinding

Not telling participants which treatment they are receiving

Block design study

Grouping individuals based on a variable into "blocks" and then randomizing cases within each block to the treatment groups

Case-control study

A study that compares a group that has a certain characteristic to a group that does not, often a retrospective study for rare conditions

Center

The central tendency or most typical value of a dataset

Central limit theorem (CLT)

States that if there is a population with mean μ and standard deviation σ and you take sufficiently large random samples from the population, then the distribution of the sample means will be approximately normally distributed
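
The theorem can be explored with a small simulation. The sketch below assumes a Uniform(0, 1) population (which is not normal), a sample size of 30, and 2,000 repeated samples; the function name and all of these values are illustrative choices, not part of the definition.

```python
import random
import statistics

def simulate_sample_means(num_samples, sample_size, seed=0):
    """Draw repeated random samples from Uniform(0, 1) and return their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(num_samples)
    ]

means = simulate_sample_means(num_samples=2000, sample_size=30)

# The sample means should cluster near the population mean (0.5),
# with a spread close to sigma / sqrt(n).
print(statistics.fmean(means), statistics.stdev(means))
```

Plotting a histogram of `means` should show a roughly bell-shaped pattern even though the underlying population is flat.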

Class midpoint

Found by adding the lower limit and upper limit, then dividing by 2

Class width

The difference in consecutive lower class limits

Cluster sampling

A method of sampling used when the population has already sorted itself into groups (clusters): a cluster is randomly selected, and every individual in the chosen cluster is used as the sample

Coefficient of determination

A numerical measure of the percentage or proportion of variation in the dependent variable (y) that can be explained by the independent variable (x)

Cohort study

Longitudinal study where a group of people (typically having a common factor) are studied and data is collected for a purpose

Complement

The complement of an event consists of all outcomes in a sample space that are NOT in the event

Completely randomized study

Dividing participants into treatment groups randomly

Conditional probability

The likelihood that an event will occur given knowledge of another event
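
One common way to compute this is P(A | B) = P(A and B) / P(B). The sketch below assumes a single roll of a fair die with equally likely outcomes; the events and the helper `probability` are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
event_a = {2, 4, 6}                  # A: the roll is even
event_b = {4, 5, 6}                  # B: the roll is greater than 3

def probability(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

# P(A | B) = P(A and B) / P(B)
p_a_given_b = probability(event_a & event_b) / probability(event_b)
print(p_a_given_b)
```

Knowing that B occurred shrinks the sample space to {4, 5, 6}, which is why the result differs from the unconditional P(A).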

Confidence interval

An interval built around a point estimate for an unknown population parameter

Confounding (lurking, conditional) variable

A variable that has an effect on a study even though it is neither an explanatory variable nor a response variable

Contingency (two-way) table

A table in a matrix format that displays the frequency distribution of different variables

Continuity correction

Adding or subtracting 0.5 to a discrete value to improve the approximation when a continuous distribution is used to approximate a discrete one
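
A typical use is approximating a binomial probability with a normal distribution. The sketch below assumes hypothetical values n = 100, p = 0.5, and the probability P(X ≤ 45); it compares the exact binomial answer with the normal approximation, with and without the 0.5 adjustment.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5   # hypothetical binomial parameters
k = 45            # we want P(X <= 45)

# Exact binomial probability, summed term by term
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Normal distribution with the same mean and standard deviation
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
uncorrected = approx.cdf(k)       # stops at 45 on the continuous scale
corrected = approx.cdf(k + 0.5)   # extends to 45.5 to cover the whole bar for X = 45

print(exact, uncorrected, corrected)
```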

Continuous random variable

A random variable (RV) whose outcomes are measured as an uncountably infinite number of values

Control group

A group in a randomized experiment that receives no (or an inactive) treatment but is otherwise managed exactly as the other groups

Controlled (designed) experiment

Type of experiment where variables are manipulated; data is collected in a controlled setting

Convenience sampling

Selecting individuals that are easily accessible and may result in biased data

Correlation coefficient

A numerical measure that provides a measure of strength and direction of the linear association between the independent variable x and the dependent variable y
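
A minimal sketch of one way to compute the (Pearson) correlation coefficient r from paired data follows. The function name `correlation` and the data values are made up for illustration. In simple linear regression, squaring r gives the coefficient of determination.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]   # made-up independent variable values
y = [2, 4, 5, 4, 5]   # made-up dependent variable values
r = correlation(x, y)
print(r, r ** 2)      # r, then r squared
```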

Critical value

Point on a distribution that acts as a cut-off value for rejecting or failing to reject the null hypothesis

Cross-sectional study

Data collection on a population at one point in time (often prospective)

Cumulative distribution function (CDF)

A function that gives the probability that a random variable takes a value less than or equal to x

Cumulative relative frequency

The sum of the relative frequencies for all values that are less than or equal to the given value
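
The running total can be sketched as follows, assuming a small made-up frequency table; the variable names are illustrative.

```python
from itertools import accumulate

# Made-up frequency table: value -> number of times it occurs
frequencies = {1: 2, 2: 3, 3: 5}

total = sum(frequencies.values())
values = sorted(frequencies)

# Running total of the counts, in order of increasing value
cumulative_counts = list(accumulate(frequencies[v] for v in values))

relative = {v: frequencies[v] / total for v in values}
cumulative_relative = {v: c / total for v, c in zip(values, cumulative_counts)}

print(relative)
print(cumulative_relative)
```

The last cumulative relative frequency is always 1, since every value is less than or equal to the largest one.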

Data

Actual values (numbers or words) that are collected from the variables of interest

Data analysis process

Process of collecting, organizing, and analyzing data

Degrees of freedom

The number of objects in a sample that are free to vary

Descriptive statistics

Methods of organizing, summarizing, and presenting data

Designed (controlled) experiment

Data collection where variables are manipulated in a controlled setting

Difference in means

The difference in the means of two independent populations

Discrete random variable

A random variable that produces discrete data

Distribution

The possible values a variable can take on, and how often it does so

Double-blind study

The act of blinding both the subjects of an experiment and the researchers who work with the subjects

Empirical rule

For a bell-shaped (approximately normal) distribution, roughly 68% of values are within 1 standard deviation of the mean, roughly 95% of values are within 2 standard deviations of the mean, and roughly 99.7% of values are within 3 standard deviations of the mean
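
These percentages can be checked against an exactly normal distribution. The sketch below assumes a standard normal distribution and uses a hypothetical helper, `share_within`.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```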

Event

A single outcome, or subset of outcomes, of an experiment that you are interested in

Expected value

Mean of a random variable
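
For a discrete random variable, the expected value can be computed by weighting each value by its probability and summing. The sketch below assumes one roll of a fair six-sided die as the probability model; the names are illustrative.

```python
from fractions import Fraction

# Probability model for one roll of a fair six-sided die (illustrative)
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Weight each value by its probability, then add the results
expected_value = sum(value * prob for value, prob in pmf.items())
print(expected_value)
```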

Experimental unit

Any individual or object to be measured

Explanatory variable

The independent variable in an experiment; the value controlled by researchers

Extrapolation

The process of predicting outside of the observed x values

Factors

Variables in an experiment

Frequency

The number of times a value of the data occurs

Graphical descriptive methods

Organizing, summarizing, or presenting data visually in graphs, figures, or charts

Hypothesis testing

A decision-making procedure for determining whether sample evidence supports a hypothesis

Independent

The occurrence of one event has no effect on the probability of the occurrence of another event

Individuals

The person, animal, item, thing, place, etc. that we collect information about

Inferential statistics

The facet of statistics dealing with using a sample to generalize (or infer) about the population

Influential points

Observed data points that do not follow the trend of the rest of the data and have a large influence on the calculation of the regression line

Intersection (AND)

The shared or common outcomes of two events

Interval scale level

Quantitative data where the difference or gap between values is meaningful

Law of large numbers

As the number of trials in a probability experiment increases, the relative frequency of an event approaches the theoretical probability
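
A small simulation can illustrate the idea. The sketch below assumes a fair coin (theoretical probability of heads 0.5) and a fixed random seed; the function name is hypothetical.

```python
import random

def relative_frequency_of_heads(num_flips, seed=0):
    """Simulate fair coin flips and return the proportion that land heads."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# The relative frequency tends to settle closer to 0.5 as the number of flips grows
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```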

Levels

Certain values of variables in an experiment

Linear regression

A mathematical model of a linear association

Longitudinal study

Collecting data multiple times on the same individuals, usually at fixed increments, over a period of time

Lower class limit

The lower end of a bin or class in a frequency table or histogram

Margin of error (MoE)

How much a point estimate can be expected to differ from the true population value; made up of the standard error multiplied by the critical value
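
As a sketch of the calculation described above, the example below computes a margin of error for a sample mean. It assumes a 95% confidence level, a critical value taken from the standard normal distribution (a t critical value would be used in many settings), and made-up sample values.

```python
from math import sqrt
from statistics import NormalDist

confidence_level = 0.95   # hypothetical confidence level
sample_sd = 10.0          # hypothetical sample standard deviation
n = 100                   # hypothetical sample size

# Critical value: the z value that leaves (1 - confidence_level) / 2 in each tail
critical_value = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

standard_error = sample_sd / sqrt(n)
margin_of_error = critical_value * standard_error
print(round(margin_of_error, 2))
```

A confidence interval is then the point estimate plus or minus this margin of error.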

Matched pairs design

Very similar individuals (or even the same individual) receive two different treatments (or treatment vs. control), then the differences in results are compared

Mean (average)

A number that measures the central tendency of the data

Measures of location

A measure of an observation's standing relative to the rest of the dataset

Median

The middle number in a sorted list

Modality

How many peaks or clusters there appear to be in a quantitative distribution

Mode

The most frequently occurring value

Mutually exclusive (disjoint)

Two events that cannot happen at the same time; they share no common outcomes

Nominal scale level

Categorical data where the categories have no natural, intuitive, or obvious order

Normal (Gaussian) distribution

A commonly used symmetric, unimodal, bell-shaped, continuous probability distribution

Null hypothesis

The claim that is assumed to be true and is tested in a hypothesis test

Numerical descriptive methods

Numbers that summarize some aspect of a dataset, often calculated

Observational study

Data collection where no variables are manipulated

Ordinal scale level

Categorical data where the categories have a natural or intuitive order

Outcome

A particular result of an experiment

Outlier

An observation that stands out from the rest of the data significantly

P-value

The probability of observing a result at least as extreme as the one actually observed, assuming the null hypothesis is true

Parameter

A number that is used to represent a population characteristic and can only be calculated as the result of a census

Placebo

An inactive treatment that has no real effect on the response variable

Point estimate

The value that is calculated from a sample used to estimate an unknown population parameter

Point estimation

Using sample data to calculate a single statistic as an estimate of an unknown population parameter

Pooled proportion

Estimate of the common value of p1 and p2

Population

The whole group of individuals who can be studied to answer a research question

Population mean

The arithmetic mean, or average of a population

Population mean difference

The mean of the differences in a matched pairs design

Population proportion

The number of individuals that have a characteristic we are interested in divided by the total number in the population

Power

The probability of correctly rejecting a false null hypothesis

Probability

The study of randomness; a number between zero and one, inclusive, that gives the likelihood that a specific event will occur

Probability density function (PDF)

A function that defines a continuous random variable, and the likelihood of an outcome

Probability experiment

A random experiment where the result is not predetermined

Probability mass function (PMF)

A function that gives the probability that a discrete random variable is exactly equal to some value (x)

Probability model

A mathematical representation of a random process that lists all possible outcomes and assigns probabilities to each of them

Prospective study

Collecting information as events unfold

Qualitative (categorical) data

Data that describes qualities, or puts individuals into categories

Quantile

Points in a distribution that relate to the rank order of values in that distribution

Quantitative (numerical) data

Numerical data with a mathematical context

Quantitative continuous data

Data produced by a variable that takes on an uncountably infinite number of values

Quantitative discrete data

Data produced by a variable that takes on a countable number of values

Random variable

A variable whose numerical value is determined by the outcome of a random process; a representation of a probability model

Ratio scale level

Quantitative data where the difference or gap between values is meaningful AND has a true 0 value

Relative frequency

The percentage, proportion, or ratio of the frequency of a value of the data to the total number of outcomes

Repeated measures

When an individual goes through a single treatment more than once

Residual (error)

A residual measures the vertical distance between an observation and the predicted point on a regression line

Response variable

The dependent variable in an experiment; the value that is measured for change at the end of an experiment

Retrospective study

Collecting or using data after events have taken place

Robust

Not affected by violations of assumptions such as outliers

Sample

A subset of the population studied

Sample mean

The arithmetic mean, or average of a dataset

Sample proportion

The number of individuals that have a characteristic we are interested in divided by the total number in the sample, often found from categorical data

Sample space

The set of all possible outcomes of an experiment

Sampling bias

Bias resulting from all members of the population not being equally likely to be selected

Sampling distribution

The probability distribution of a statistic at a given sample size

Sampling variability

The idea that samples from the same population can yield different results

Shape

What a dataset looks like visually

Significance level

Probability that a true null hypothesis will be rejected; also known as the probability of a Type I error and denoted by α

Simple random sample (SRS)

Each member of the population is equally likely to be chosen for a sample of a given sample size and each sample is equally likely to be chosen

Slope

Tells us how the dependent variable (y) changes for every one unit increase in the independent (x) variable, on average

Spread

The level of variability or dispersion of a dataset; also commonly known as variation/variability

Standard deviation

The average distance (deviation) of each observation from the mean
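
A minimal sketch using Python's `statistics` module shows the calculation on a made-up dataset. It reports the population versions (dividing by n) alongside the sample standard deviation (dividing by n − 1).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up dataset with mean 5

population_variance = statistics.pvariance(data)   # mean squared deviation (divides by n)
population_sd = statistics.pstdev(data)            # square root of the population variance
sample_sd = statistics.stdev(data)                 # uses n - 1 in the denominator

print(population_variance, population_sd, sample_sd)
```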

Standard error

The standard deviation of a sampling distribution

Standard normal distribution (SND)

A normal random variable with a mean of 0 and standard deviation of 1 which z-scores follow; denoted N(0, 1)

Statistic

A number calculated from a sample

Statistical inference

Using information from a sample to answer a question, or generalize, about a population

Statistically significant

Finding sufficient evidence that the effect we see is not just due to variability, often from rejecting the null hypothesis

Stratified sampling

Dividing a population into groups (strata), and then using simple random sampling to identify a proportionate number of individuals from each

Systematic (probability) sampling

Using some sort of pattern or probability based method for choosing your sample

T-distribution

A family of distributions, dependent on degrees of freedom, similar to the normal distribution but with more variability built in

Test statistic

A measure of how far what you observed is from the hypothesized (or claimed) value

Treatment combinations (interactions)

Combinations of levels of variables in an experiment

Treatments

Different values or components of the explanatory variable applied in an experiment

Tree diagram

Diagram that helps calculate and organize the number of possible outcomes of an event or problem

Type I error

The decision is to reject the null hypothesis when, in fact, the null hypothesis is true

Type II error

The decision is to fail to reject the null hypothesis when, in fact, the null hypothesis is false

Uniform distribution

A probability distribution in which all outcomes are equally likely

Union (OR)

The set of all outcomes in two (or more) events

Upper class limit

The upper end of a bin or class in a frequency table or histogram

Values

Possible observations of the variable

Variable

A characteristic of interest for each person or object in a population

Variance

The square of the standard deviation; a computational step along the way to calculating the standard deviation

Variation

The level of variability or dispersion of a dataset; also commonly known as 'spread'

Venn diagram

A diagram that shows all possible relations between a collection of different sets

y-intercept

The value of y when x is 0 in your regression equation

z-score

A measure of location that tells us how many standard deviations a value is above or below the mean 
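
The calculation can be sketched as follows; the function name `z_score` and the exam values are hypothetical.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

# Hypothetical exam with mean 70 and standard deviation 10
print(z_score(85, 70, 10))   # a score above the mean gives a positive z-score
print(z_score(60, 70, 10))   # a score below the mean gives a negative z-score
```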