Glossary
 Alternative hypothesis

A working hypothesis that is contradictory to the null hypothesis
 Anecdotal evidence

Evidence that is based on personal testimony and collected informally
 Association

A relationship between variables
 Bernoulli trial

An experiment with the following characteristics:
 There are only two possible outcomes called “success” and “failure” for each trial
 The probability (p) of a success is the same for any trial (so the probability q = 1 − p of a failure is the same for any trial)
 Bimodal distribution

A distribution that has 2 modes
 Binomial distribution

The probability distribution of a random variable that counts the number of successes in a fixed number (n) of independent Bernoulli trials, each with probability of success p
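As a minimal sketch using only the Python standard library, binomial probabilities can be computed from the formula P(X = k) = C(n, k) p^k q^(n − k). The function name `binomial_pmf` and the coin-flip example are illustrative choices, not taken from any particular textbook or library.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # impossible number of successes
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: number of heads in 10 fair coin flips
n, p = 10, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(round(binomial_pmf(5, n, p), 4))  # 0.2461
print(round(sum(probs), 10))            # 1.0 (probabilities over all k sum to 1)
```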
 Bivariate data

Data consisting of two variables, often in search of an association
 Blinding

Not telling participants which treatment they are receiving
 Block design study

Grouping individuals based on a variable into "blocks" and then randomizing cases within each block to the treatment groups
 Case-control study

A study that compares a group that has a certain characteristic to a group that does not, often a retrospective study for rare conditions
 Center

The central tendency or most typical value of a dataset
 Central limit theorem (CLT)

States that if a population has mean μ and standard deviation σ and you take sufficiently large random samples of size n from it, then the sample means are approximately normally distributed, with mean μ and standard deviation σ/√n
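The theorem can be illustrated with a small simulation. This sketch assumes a Uniform(0, 1) population (deliberately not normal, with μ = 0.5 and σ = √(1/12)); the sample size 30, the 5000 repetitions, and the seed are arbitrary choices. It only checks the mean and spread of the sample means; a histogram of `means` would also look roughly bell-shaped.

```python
import random
import statistics

def sample_means(n: int, reps: int, seed: int = 1) -> list[float]:
    """Means of `reps` random samples of size n drawn from Uniform(0, 1)."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

# Uniform(0, 1) has mu = 0.5 and sigma = sqrt(1/12), about 0.2887
n = 30
means = sample_means(n, reps=5000)
print(statistics.fmean(means))  # close to mu = 0.5
print(statistics.stdev(means))  # close to sigma / sqrt(n), about 0.0527
```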
 Class midpoint

Found by adding the lower limit and upper limit, then dividing by 2
 Class width

The difference between consecutive lower class limits
 Cluster sampling

A method of sampling where the population is already sorted into groups (clusters); one or more clusters are randomly selected, and every individual in the chosen cluster(s) is used as the sample
 Coefficient of determination

A numerical measure of the percentage or proportion of variation in the dependent variable (y) that can be explained by the independent variable (x)
 Cohort study

A longitudinal study in which a group of people (typically sharing a common factor) is followed over time and data are collected on them for a specific research purpose
 Complement

The complement of an event consists of all outcomes in a sample space that are NOT in the event
 Completely randomized study

Dividing participants into treatment groups randomly
 Conditional probability

The likelihood that an event will occur given knowledge of another event
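One way to see conditional probability is the formula P(A | B) = P(A AND B) / P(B), evaluated by counting equally likely outcomes. This sketch assumes two fair six-sided dice; the helper names `prob` and `cond_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
space = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """P(event) = (# outcomes in the event) / (# outcomes in the sample space)."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def cond_prob(a, b) -> Fraction:
    """P(A | B) = P(A AND B) / P(B)."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

def sum_is_8(o):
    return o[0] + o[1] == 8

def first_is_3(o):
    return o[0] == 3

print(prob(sum_is_8))                   # 5/36
print(cond_prob(sum_is_8, first_is_3))  # 1/6: knowing the first die changes the probability
```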
 Confidence interval

An interval built around a point estimate for an unknown population parameter
 Confounding (lurking, conditional) variable

A variable that has an effect on a study even though it is neither an explanatory variable nor a response variable
 Contingency (two-way) table

A table in a matrix format that displays the frequency distribution of two (or more) categorical variables
 Continuity correction

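Adding or subtracting 0.5 to or from a value of a discrete random variable to improve the approximation of a discrete distribution (such as the binomial) by a continuous one (such as the normal)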
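To see the effect of the correction, this sketch compares an exact binomial probability with its normal approximation, with and without adding 0.5. The choice of n = 20, p = 0.5, and P(X ≤ 8) is arbitrary; the normal approximation uses mean np and standard deviation √(np(1 − p)).

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
# Exact binomial P(X <= 8)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(9))

# Normal approximation with mean np and standard deviation sqrt(np(1 - p))
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
without_cc = approx.cdf(8)     # no correction
with_cc = approx.cdf(8 + 0.5)  # continuity correction: P(X <= 8) is approximated by P(Y <= 8.5)

print(round(exact, 4), round(without_cc, 4), round(with_cc, 4))
```

With these numbers the corrected approximation lands much closer to the exact value than the uncorrected one.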
 Continuous random variable

A random variable (RV) whose possible values form an uncountably infinite set, such as an interval of real numbers
 Control group

A group in a randomized experiment that receives no (or an inactive) treatment but is otherwise managed exactly as the other groups
 Controlled (designed) experiment

Type of experiment where variables are manipulated; data is collected in a controlled setting
 Convenience sampling

Selecting individuals who are easily accessible; this may result in biased data
 Correlation coefficient

A numerical measure of the strength and direction of the linear association between the independent variable x and the dependent variable y
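A minimal sketch of the Pearson correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² Σ(y − ȳ)²). The data are made up for illustration. In simple linear regression, squaring r gives the coefficient of determination. (Python 3.10+ also ships `statistics.correlation`; the explicit version is shown so the formula is visible.)

```python
from math import sqrt
from statistics import fmean

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between paired lists x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3), round(r**2, 3))  # 0.775 0.6  (r, and r^2 = coefficient of determination)
```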
 Critical value

A point on a distribution that acts as a cutoff value for deciding whether to reject or fail to reject the null hypothesis
 Cross-sectional study

Data collection on a population at one point in time (often prospective)
 Cumulative distribution function (CDF)

A function that gives the probability that a random variable takes a value less than or equal to x
 Cumulative relative frequency

The sum of the relative frequencies for all values that are less than or equal to the given value
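The frequency, relative frequency, and cumulative relative frequency columns of a frequency table can be built in a few lines. This sketch uses made-up data and a hypothetical helper `frequency_table`; each row is (value, frequency, relative frequency, cumulative relative frequency).

```python
from collections import Counter

def frequency_table(data):
    """Rows of (value, frequency, relative frequency, cumulative relative frequency)."""
    counts = Counter(data)
    total = len(data)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        rel = counts[value] / total  # relative frequency
        cumulative += rel            # running total of relative frequencies
        rows.append((value, counts[value], rel, cumulative))
    return rows

# Made-up data: number of pets in 10 households
pets = [0, 1, 1, 2, 0, 1, 3, 2, 1, 0]
for value, freq, rel, cum in frequency_table(pets):
    print(value, freq, round(rel, 2), round(cum, 2))
```

The last cumulative relative frequency is always 1 (up to floating-point rounding), since every value has been counted.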
 Data

Actual values (numbers or words) that are collected from the variables of interest
 Data analysis process

Process of collecting, organizing, and analyzing data
 Degrees of freedom

The number of objects in a sample that are free to vary
 Descriptive statistics

Methods of organizing, summarizing, and presenting data
 Designed (controlled) experiment

Data collection where variables are manipulated in a controlled setting
 Difference in means

The difference in the means of two independent populations
 Discrete random variable

A random variable that produces discrete data
 Distribution

The possible values a variable can take on, and how often it takes each of them
 Double-blind study

The act of blinding both the subjects of an experiment and the researchers who work with the subjects
 Empirical rule

For data with an approximately normal (bell-shaped) distribution, roughly 68% of values are within 1 standard deviation of the mean, roughly 95% are within 2 standard deviations, and roughly 99.7% are within 3 standard deviations
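The three percentages come from areas under the normal curve, which can be checked with the standard normal distribution from Python's `statistics` module:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)  # area between -k and +k
    print(f"within {k} SD: {within:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```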
 Event

A single outcome, or subset of outcomes, of an experiment that you are interested in
 Expected value

The mean of a random variable, or its long-run average value; for a discrete random variable, E(X) = Σ x·P(x)
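For a discrete random variable, the expected value weights each value by its probability. This sketch represents the probability distribution as a dictionary from values to probabilities (an illustrative choice) and uses exact fractions.

```python
from fractions import Fraction

def expected_value(pmf: dict) -> Fraction:
    """E(X) = sum of x * P(X = x) over all values x of a discrete random variable."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

# A fair six-sided die: each face has probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expected_value(die))  # 7/2
```

Note that 3.5 is not a possible roll; the expected value is a long-run average, not a typical single outcome.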
 Experimental unit

Any individual or object to be measured
 Explanatory variable

The independent variable in an experiment; the value controlled by researchers
 Extrapolation

The process of predicting y for x values outside the range of the observed x values
 Factors

Variables in an experiment
 Frequency

The number of times a value of the data occurs
 Graphical descriptive methods

Organizing, summarizing, or presenting data visually in graphs, figures, or charts
 Hypothesis testing

A decision-making procedure for determining whether sample evidence supports a hypothesis
 Independent

The occurrence of one event has no effect on the probability of the occurrence of another event
 Individuals

The person, animal, item, thing, place, etc. that we collect information about
 Inferential statistics

The facet of statistics dealing with using a sample to generalize (or infer) about the population
 Influential points

Observed data points that do not follow the trend of the rest of the data and have a large influence on the calculation of the regression line
 Intersection (AND)

The shared or common outcomes of two events
 Interval scale level

Quantitative data where the difference or gap between values is meaningful
 Law of large numbers

As the number of trials in a probability experiment increases, the relative frequency of an event approaches the theoretical probability
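A quick simulation makes this concrete. This sketch assumes a fair coin simulated with Python's `random` module and a fixed seed; the printed relative frequency of heads should drift toward 0.5 as the number of flips grows, though the exact numbers depend on the seed.

```python
import random

rng = random.Random(2024)  # fixed seed so the run is reproducible
heads = 0
for n in range(1, 100_001):
    if rng.random() < 0.5:  # simulated fair coin flip lands heads
        heads += 1
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(n, heads / n)  # relative frequency of heads so far
```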
 Levels

The specific values of a variable (factor) that are used in an experiment
 Linear regression

A mathematical model of a linear association
 Longitudinal study

Collecting data multiple times on the same individuals, usually at fixed increments, over a period of time
 Lower class limit

The lower end of a bin or class in a frequency table or histogram
 Margin of error (MoE)

How much a point estimate can be expected to differ from the true population value; made up of the standard error multiplied by the critical value
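As an example of "critical value times standard error", this sketch computes the margin of error for a single sample proportion, assuming the usual large-sample normal approximation with standard error √(p̂(1 − p̂)/n). The poll numbers and the function name are hypothetical. The confidence interval is then the point estimate plus or minus the margin of error.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """MoE = critical value * standard error, for a single sample proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return z_star * standard_error

# Hypothetical poll: 520 of 1000 respondents answer "yes"
p_hat, n = 520 / 1000, 1000
moe = margin_of_error(p_hat, n)
print(round(moe, 4), (round(p_hat - moe, 4), round(p_hat + moe, 4)))  # MoE and interval
```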
 Matched pairs design

Very similar individuals (or even the same individual) receive two different treatments (or a treatment and a control), and then the differences in results are compared
 Mean (average)

A number that measures the central tendency of the data, found by adding the values and dividing by the number of values
 Measures of location

A measure of an observation's standing relative to the rest of the dataset
 Median

The middle number in a sorted list (the average of the two middle numbers when the list has an even number of values)
 Modality

How many peaks or clusters there appear to be in a quantitative distribution
 Mode

The most frequently occurring value
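The three measures of center above (mean, median, mode) are all available in Python's `statistics` module; the small dataset here is made up and has an even count, so the median averages the two middle values.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]
print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4.0 (even count: average of 3 and 5)
print(statistics.mode(data))    # 3
```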
 Mutually exclusive (disjoint)

Two events that cannot happen at the same time; they share no common outcomes
 Nominal scale level

Categorical data where the categories have no natural, intuitive, or obvious order
 Normal (Gaussian) distribution

A commonly used symmetric, unimodal, bell-shaped, continuous probability distribution
 Null hypothesis

The claim that is assumed to be true and is tested in a hypothesis test
 Numerical descriptive methods

Numbers that summarize some aspect of a dataset, often calculated
 Observational study

Data collection where no variables are manipulated
 Ordinal scale level

Categorical data where the categories have a natural or intuitive order
 Outcome

A particular result of an experiment
 Outlier

An observation that differs markedly from the rest of the data
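The glossary does not fix a rule for flagging outliers. One common convention (an assumption here, not the only one) is the 1.5 × IQR rule: flag values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. Quartile conventions vary between textbooks and software; this sketch uses the default method of `statistics.quantiles` on made-up data.

```python
import statistics

def iqr_outliers(data):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (the 1.5 x IQR rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if v < low or v > high]

# Made-up data with one unusually large value
data = [10, 12, 12, 13, 14, 15, 15, 16, 40]
print(iqr_outliers(data))  # [40]
```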
 P-value

The probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one actually observed
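For a test statistic that follows the standard normal distribution under the null hypothesis (a z test, assumed here), the P-value is a tail area. This sketch covers two-sided, right-tailed, and left-tailed alternatives; the function name and the `alternative` labels are illustrative.

```python
from statistics import NormalDist

def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """P-value for a z test statistic under a standard normal null distribution."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    if alternative == "greater":
        return 1 - std_normal.cdf(z)             # right tail
    if alternative == "less":
        return std_normal.cdf(z)                 # left tail
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

print(round(p_value_from_z(1.96), 4))             # about 0.05
print(round(p_value_from_z(1.96, "greater"), 4))  # about 0.025
```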
 Parameter

A number that represents a population characteristic; it can be calculated exactly only as the result of a census
 Placebo

An inactive treatment that has no real effect on the response variable
 Point estimate

The value that is calculated from a sample used to estimate an unknown population parameter
 Point estimation

Using sample data to calculate a single statistic as an estimate of an unknown population parameter
 Pooled proportion

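An estimate of the common value of p1 and p2, assuming they are equal (as under the null hypothesis p1 = p2); found by dividing the total number of successes in both samples by the total sample size, (x1 + x2)/(n1 + n2)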
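The pooled proportion is typically used in the standard error of the two-proportion z test of H0: p1 = p2. This sketch assumes that test and the large-sample normal approximation; the counts are hypothetical and the function names are illustrative.

```python
from math import sqrt

def pooled_proportion(x1: int, n1: int, x2: int, n2: int) -> float:
    """p_hat = (x1 + x2) / (n1 + n2): combined successes over combined sample size."""
    return (x1 + x2) / (n1 + n2)

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: p1 = p2, using the pooled proportion in the standard error."""
    p = pooled_proportion(x1, n1, x2, n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

# Hypothetical counts: 45 successes out of 100 vs. 30 out of 100
print(pooled_proportion(45, 100, 30, 100))  # 0.375
print(round(two_proportion_z(45, 100, 30, 100), 3))
```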
 Population

The whole group of individuals who can be studied to answer a research question
 Population mean

The arithmetic mean, or average, of a population
 Population mean difference

The mean of the differences in a matched pairs design
 Population proportion

The number of individuals that have a characteristic we are interested in divided by the total number in the population
 Power

The probability of correctly rejecting a false null hypothesis; equal to 1 minus the probability of a Type II error
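Power depends on the true parameter value, the sample size, the variability, and the significance level. This sketch assumes a right-tailed one-sample z test of H0: μ = μ0 with known σ; the numbers in the example are hypothetical. When the true mean equals μ0, the "power" reduces to α, the probability of a Type I error.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a right-tailed z test of H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # reject H0 when x-bar exceeds this
    # Probability that x-bar exceeds the cutoff when x-bar ~ N(mu_true, se)
    return 1 - NormalDist(mu_true, se).cdf(cutoff)

power = power_one_sided_z(mu0=100, mu_true=105, sigma=15, n=36)
print(round(power, 3))  # probability of correctly rejecting H0
```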
 Probability

The study of randomness; a number between zero and one, inclusive, that gives the likelihood that a specific event will occur
 Probability density function (PDF)

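A function that describes the distribution of a continuous random variable; the probability that the variable falls within an interval is the area under the function's curve over that interval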
 Probability experiment

A random experiment where the result is not predetermined
 Probability mass function (PMF)

A function that gives the probability that a discrete random variable is exactly equal to some value (x)
 Probability model

A mathematical representation of a random process that lists all possible outcomes and assigns probabilities to each of them
 Prospective study

Collecting information as events unfold
 Qualitative (categorical) data

Data that describes qualities, or puts individuals into categories
 Quantile

Points in a distribution that relate to the rank order of values in that distribution
 Quantitative (numerical) data

Numerical data that measure or count something, so that arithmetic on the values is meaningful
 Quantitative continuous data

Data produced by a variable that can take on an uncountably infinite number of values
 Quantitative discrete data

Data produced by a variable that takes on a countable number of values
 Random variable

A variable that assigns a numerical value to each outcome of a random process; a representation of a probability model
 Ratio scale level

Quantitative data where the difference or gap between values is meaningful AND has a true 0 value
 Relative frequency

The percentage, proportion, or ratio of the frequency of a value of the data to the total number of outcomes
 Repeated measures

When an individual goes through a single treatment more than once
 Residual (error)

A residual measures the vertical distance between an observation and the predicted point on a regression line
 Response variable

The dependent variable in an experiment; the value that is measured for change at the end of an experiment
 Retrospective study

Collecting or using data after events have taken place
 Robust

Not strongly affected by violations of assumptions or by unusual observations such as outliers
 Sample

A subset of the population studied
 Sample mean

The arithmetic mean, or average, of a sample
 Sample proportion

The number of individuals that have a characteristic we are interested in divided by the total number in the sample, often found from categorical data
 Sample space

The set of all possible outcomes of an experiment
 Sampling bias

Bias that results when not all members of the population are equally likely to be selected
 Sampling distribution

The probability distribution of a statistic at a given sample size
 Sampling variability

The idea that samples from the same population can yield different results
 Shape

What a dataset looks like visually
 Significance level

The probability that a true null hypothesis will be rejected, that is, the probability of a Type I error; denoted by α
 Simple random sample (SRS)

Each member of the population is equally likely to be chosen for a sample of a given sample size and each sample is equally likely to be chosen
 Slope

Tells us how the dependent variable (y) changes for every one unit increase in the independent (x) variable, on average
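The least-squares slope and y-intercept can be computed directly as b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. This sketch reuses made-up data and also computes the residuals (observed minus predicted y), which sum to zero for a least-squares line.

```python
from statistics import fmean

def least_squares_line(x, y):
    """Return (b0, b1) for y-hat = b0 + b1*x minimizing the sum of squared residuals."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx     # slope
    b0 = my - b1 * mx  # y-intercept: the line passes through (x-bar, y-bar)
    return b0, b1

# Made-up example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # observed minus predicted
print(round(b0, 2), round(b1, 2))  # 2.2 0.6
print([round(e, 2) for e in residuals])
```

Here the slope 0.6 means predicted y increases by 0.6 for each one-unit increase in x, on average.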
 Spread (variation, variability)

The level of variability or dispersion of a dataset; also commonly known as variation/variability
 Standard deviation

Roughly, the typical distance (deviation) of observations from the mean; formally, the square root of the variance
 Standard error

The standard deviation of a sampling distribution
 Standard normal distribution (SND)

The normal distribution with a mean of 0 and a standard deviation of 1, which the z-scores of normally distributed data follow; denoted N(0, 1)
 Statistic

A number calculated from a sample
 Statistical inference

Using information from a sample to answer a question, or generalize, about a population
 Statistically significant

Finding sufficient evidence that the effect we see is not just due to variability, often from rejecting the null hypothesis
 Stratified sampling

Dividing a population into groups (strata), and then using simple random sampling to identify a proportionate number of individuals from each
 Systematic (probability) sampling

Using some sort of pattern (for example, selecting every kth individual from a list after a random start) or other probability-based method for choosing your sample
 t-distribution

A family of distributions, indexed by degrees of freedom, that are similar to the standard normal distribution but have heavier tails (more variability built in)
 Test statistic

A measure of how far what you observed is from the hypothesized (or claimed) value
 Treatment combinations (interactions)

Combinations of levels of variables in an experiment
 Treatments

Different values or components of the explanatory variable applied in an experiment
 Tree diagram

Diagram that helps calculate and organize the number of possible outcomes of an event or problem
 Type I error

The decision is to reject the null hypothesis when, in fact, the null hypothesis is true
 Type II error

The decision is to fail to reject the null hypothesis when, in fact, the null hypothesis is false
 Uniform distribution

A probability distribution in which all outcomes are equally likely
 Union (OR)

The set of all outcomes in two (or more) events
 Upper class limit

The upper end of a bin or class in a frequency table or histogram
 Values

Possible observations of the variable
 Variable

A characteristic of interest for each person or object in a population
 Variance

The square of the standard deviation; a computational step along the way to calculating the standard deviation
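To show variance as a step toward the standard deviation, this sketch computes the sample variance (squared deviations from the mean divided by n − 1) and takes its square root, then checks against `statistics.variance` and `statistics.stdev`. The data are made up; for a whole population, `statistics.pvariance` divides by n instead.

```python
from math import sqrt
import statistics

data = [4, 8, 6, 5, 3]
mean = statistics.fmean(data)
# Sample variance: sum of squared deviations from the mean, divided by n - 1
var = sum((v - mean) ** 2 for v in data) / (len(data) - 1)
sd = sqrt(var)  # standard deviation = square root of the variance
print(round(var, 4), statistics.variance(data))
print(round(sd, 4), round(statistics.stdev(data), 4))
```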
 Variation (variability, spread)

The level of variability or dispersion of a dataset; also commonly known as 'spread'
 Venn diagram

A diagram that shows all possible relations between a collection of different sets
 y-intercept

The predicted value of y when x is 0 in your regression equation
 z-score

A measure of location that tells us how many standard deviations a value is above or below the mean
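The z-score is the value's distance from the mean divided by the standard deviation, z = (x − mean) / SD. The exam numbers below are hypothetical.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 8
print(z_score(82, 70, 8))  # 1.5 (1.5 SDs above the mean)
print(z_score(62, 70, 8))  # -1.0 (1 SD below the mean)
```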