Terms and Definitions

Simple statistic

A number that carries some information.

Descriptive Statistics

Development of numerical and graphical summaries of data.

Inferential Statistics

Uses data collected from a sample to make inferences about a population.

Population

A set of subjects of interest

Sample

A subset of a population on which observations are made

Data

The set of numerical information collected on variables of the subjects of a sample

Variable

A characteristic or property of an individual subject

Census

When all members of a population are included in a sample. Data consists of measurements of variables taken on every member of the population.

Reliability

A measure of the uncertainty of a statistical inference

Target population

The population one wants to make inferences about

Sampled population

The population that is actually sampled

Representative sample

A sample that reflects the characteristics of the target population

Sample of Convenience

A sample collected without a statistical design

Quota Sampling

Separate samples of convenience are collected within each stratum of a population. The sample size within a stratum is proportional to that stratum's prevalence in the population.

Strata (singular: stratum)

Subdivisions of a population

Random Sampling

Each member of the population has the same probability of being included in the sample
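A minimal sketch in Python, assuming a hypothetical population of 100 subject IDs: the standard library's `random.sample` draws members without replacement, so every member has the same chance (here 10/100) of ending up in the sample. The helper name `draw_random_sample` is illustrative only.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Draw n distinct members; each member is equally likely to be included."""
    rng = random.Random(seed)          # seeded generator so results are reproducible
    return rng.sample(population, n)   # sampling without replacement

population = list(range(1, 101))       # hypothetical subject IDs 1..100
sample = draw_random_sample(population, 10, seed=1)
print(sample)
```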

Probability Sampling

Another name for random sampling.

Sampling error

error resulting from the sampled population not being the same as the target population

Non-response error

Occurs when those who respond to the survey have different traits than those who don't respond. Results in sampling error.

Reporting error

occurs when respondents do not answer questions honestly or accurately

Volunteer error

occurs when respondents volunteer to participate in a poll or experiment and those respondents are not representative of the target population. Results in sampling error.

Observational study

A study in which the "treatment" is a trait the subject already has; the researcher does not assign it.

Confounding factors

Variables that are correlated with the treatment (or differ between treatment groups) and that affect the response variable.

Designed experiment

A study in which the researcher assigns the "treatment" to the subject.

Double blind study

A designed experiment in which neither the researcher nor the subjects know which subjects have received the treatment and which are in the control group.

Control

In a designed experiment, the group of untreated subjects to which the treated subjects are compared.

Placebo

A "fake" treatment that simulates the treatment but actually has no effect. Placebos make double blind studies possible. For instance, in a drug trial a sugar pill is used as the placebo for the control group.

Quantitative variable

Measurements recorded on a naturally occurring numerical scale, such as height, weight, or the number of people attending a protest.

Qualitative variable

Measurements that cannot be naturally measured on a numerical scale; they can only be classified into categories, such as sex (M/F), race (Caucasian, Hispanic, etc.), or satisfaction level (1-5, for instance).

Continuous variable

Any value within some interval is a possible outcome, for instance height.

Discrete variable

The possible outcomes are countable, such as the number of accidents at an intersection (0, 1, 2, 3, ...).

Ordinal variable

Qualitative data that can be naturally put into order, such as satisfaction level (1-5) or age category (under 21, 21-30, 31-40, over 40).

Nominal variable

Qualitative data that cannot be naturally put into order, such as sex (M/F) or race (Caucasian, Hispanic, etc).

Class

one of the categories into which a qualitative variable can be classified

Class frequency

number of observations in a dataset falling within a particular class

Class relative frequency

class frequency divided by the number of observations in a dataset
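Class frequencies and class relative frequencies can be tallied with Python's `collections.Counter`. The responses below are hypothetical satisfaction levels (1-5) invented for illustration; each relative frequency is the class frequency divided by the number of observations, so they sum to 1.

```python
from collections import Counter

# Hypothetical satisfaction levels (1-5) reported by 10 subjects
responses = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]

freq = Counter(responses)                          # class frequency
n = len(responses)
rel_freq = {c: f / n for c, f in freq.items()}     # class relative frequency

for c in sorted(freq):
    print(c, freq[c], rel_freq[c])
```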

Skewness

Tendency of a distribution or dataset to have one tail longer than the other. A left skewed distribution has a long left tail; a right skewed distribution has a long right tail.

Robustness

A statistic is robust if changing only a few observations in a dataset does not affect that statistic very much. For instance, the median is more robust than the mean.
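To illustrate robustness, the sketch below changes a single observation in a small hypothetical dataset (as if one value had been mis-recorded) and compares the mean and median before and after.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 5, 6, 7]
changed = data[:-1] + [70]      # hypothetical: the last value mis-recorded as 70

print(mean(data), median(data))        # mean about 4.29, median 4
print(mean(changed), median(changed))  # mean about 13.29, median still 4
```

One bad value moves the mean by 9 units, while the median does not move at all.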

Percentile

The pth percentile is a value for which at least p% of the data is less than or equal to it and at least (100 - p)% of the data is greater than or equal to it.

For instance, if 5 is the 25th percentile of a dataset, at least 25% of the data is less than or equal to 5 and at least 75% of the data is greater than or equal to 5.
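The percentile definition can be turned into a small function: sort the data and take the observation at rank ceil(p * n / 100), which guarantees that at least p% of the data is at or below it and at least (100 - p)% is at or above it. This is a sketch of one valid choice; the function name `percentile` and the dataset are hypothetical.

```python
import math

def percentile(data, p):
    """Return an observation x such that at least p% of the data is <= x
    and at least (100 - p)% of the data is >= x (0 <= p <= 100)."""
    xs = sorted(data)
    n = len(xs)
    k = max(1, math.ceil(p * n / 100))   # 1-based rank of the chosen observation
    return xs[k - 1]

data = [2, 5, 6, 7, 9, 11, 12, 15]     # hypothetical dataset, n = 8
print(percentile(data, 25))            # 5
```

When more than one value satisfies the definition (for example the 50th percentile of a dataset with an even number of observations), this sketch returns the smallest qualifying observation; the usual median instead averages the two middle values, and statistical software uses several interpolation conventions.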

Median

50th percentile

Lower quartile

The 25th percentile

Upper quartile

The 75th percentile

Inter-quartile range

Upper quartile - lower quartile, the spread of the middle 50% of the data

Range

max observation - min observation
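In most software the quartiles are computed as the 25th and 75th percentiles and the inter-quartile range is reported as their difference. A minimal sketch using Python's `statistics.quantiles` with `method="inclusive"`, which treats the minimum as the 0th and the maximum as the 100th percentile and interpolates between observations; the dataset is hypothetical.

```python
from statistics import quantiles

data = [2, 5, 6, 7, 9, 11, 12, 15]                     # hypothetical dataset
q1, q2, q3 = quantiles(data, n=4, method="inclusive")  # 25th, 50th, 75th percentiles
iqr = q3 - q1                                          # inter-quartile range
data_range = max(data) - min(data)                     # range
print(q1, q2, q3, iqr, data_range)                     # 5.75 8.0 11.25 5.5 13
```

The default `method="exclusive"` gives slightly different quartiles for the same data, which is one reason textbook and software answers sometimes disagree.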

Random variable

A variable that assumes a numerical value associated with the random outcome of an experiment. Each run of the experiment produces exactly one outcome, and therefore one value of the variable.

Chance

If an experiment is repeated many, many times (infinitely) the chance of a certain outcome is the percent of times that outcome would occur.

Probability

Chance divided by 100.
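The long-run definitions of chance and probability can be illustrated by simulating flips of a fair coin: as the number of flips grows, the proportion of heads settles near 0.5 (a probability), which corresponds to a chance of 50%. The simulation and the helper name `proportion_of_heads` are illustrative assumptions.

```python
import random

def proportion_of_heads(flips, seed=0):
    """Simulate fair coin flips and return the proportion that land heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

for flips in (10, 1_000, 100_000):
    prob = proportion_of_heads(flips)
    # probability = proportion of heads; chance = that proportion as a percent
    print(f"{flips:>7} flips: probability ~ {prob:.3f}, chance ~ {100 * prob:.1f}%")
```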

Independence

Two events are independent if the outcome of one does not affect the outcome of the other. For instance, coin flips are independent, since the probability of getting a head on any flip does not depend on the outcome of previous flips.

Probability distribution

A graph, table or formula that specifies the probability associated with each possible outcome or measurement.

Population parameter

An attribute of the population probability distribution function, usually the population mean or variance. Generally the value of the population parameter is unknown and we want to estimate it.

Standard Normal Distribution

Normal distribution with mean 0 and standard deviation (sigma) 1.
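Python's `statistics.NormalDist` can evaluate standard normal probabilities directly. The sketch also shows standardizing, z = (x - mu) / sigma, using a hypothetical normal distribution of heights (mean 170 cm, sigma 10 cm): the probability computed on the original scale matches the standard normal probability of the z-score.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)          # standard normal distribution
print(Z.cdf(1.96))                     # P(Z <= 1.96), about 0.975
print(Z.cdf(1) - Z.cdf(-1))            # P(-1 <= Z <= 1), about 0.683

# Standardizing: z = (x - mu) / sigma turns a normal variable into a standard normal one.
heights = NormalDist(mu=170, sigma=10)  # hypothetical heights in cm
x = 185
z = (x - heights.mean) / heights.stdev
print(z, heights.cdf(x), Z.cdf(z))     # z = 1.5; the two probabilities match
```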

Sample statistic

A statistic calculated from a sample. Generally an estimate of a population parameter (for instance the sample mean is an estimate of the population mean).

Sampling distribution

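The probability distribution (pdf) of a sample statistic: the distribution of values the statistic would take across repeated samples of the same size from the population.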
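A sampling distribution can be approximated by simulation: draw many samples of the same size, compute the statistic for each, and look at the resulting collection of values. The sketch below assumes a hypothetical right-skewed population (exponential with mean about 1) and uses the sample mean as the statistic; the helper `sample_means` is illustrative.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)
# Hypothetical right-skewed population of 100,000 values (exponential, mean about 1)
population = [rng.expovariate(1.0) for _ in range(100_000)]

def sample_means(pop, n, reps, rng):
    """Approximate the sampling distribution of the mean by repeated sampling."""
    return [mean(rng.sample(pop, n)) for _ in range(reps)]

means = sample_means(population, n=25, reps=2_000, rng=rng)
print(mean(population), mean(means))    # the two centers are close
print(stdev(population), stdev(means))  # the means vary far less than single values
```

With n = 25, the spread of the simulated means should come out near the population standard deviation divided by 5 (the square root of 25).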

Bias

A sample statistic is a biased estimate of a population parameter if the mean of its sampling distribution is not equal to the population parameter. For instance, suppose we want to estimate the variance of a population. If the average value of the sample variance is equal to the population variance, then the sample variance is an unbiased estimate of the population variance; otherwise it is a biased estimator. The sample variance computed with divisor n - 1 is unbiased, while the version computed with divisor n is biased (on average it underestimates the population variance).
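Bias can be checked by simulation: average an estimator over many samples and compare that average with the true parameter. This sketch assumes a hypothetical normal population with variance 4 and samples of size 5, and compares the variance computed with divisor n against divisor n - 1. The helper `variance_with_divisor` is illustrative.

```python
import random

def variance_with_divisor(xs, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / divisor

rng = random.Random(7)
sigma2 = 4.0                 # true variance of the hypothetical population N(0, 2^2)
n, reps = 5, 20_000
avg_div_n = avg_div_n1 = 0.0
for _ in range(reps):
    xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
    avg_div_n += variance_with_divisor(xs, n) / reps         # divisor n
    avg_div_n1 += variance_with_divisor(xs, n - 1) / reps    # divisor n - 1

# Theory: divisor n averages sigma2 * (n - 1) / n = 3.2; divisor n - 1 averages sigma2 = 4.0
print(avg_div_n, avg_div_n1)
```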

Standard error

The population standard error is the standard deviation of the sampling distribution of the sample mean, equal to sigma divided by the square root of n. The sample standard error is the estimate of the population standard error and is equal to the sample standard deviation divided by the square root of n.
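Computing a sample standard error takes one line once the sample standard deviation is known. The measurements below are hypothetical; `statistics.stdev` uses the divisor n - 1.

```python
import math
from statistics import mean, stdev

# Hypothetical sample of 8 measurements
sample = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7]
n = len(sample)
s = stdev(sample)               # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)           # sample standard error of the mean
print(mean(sample), s, se)
```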

Null Hypothesis

The status quo hypothesis. The hypothesis you are trying to disprove.

Alternative Hypothesis

The research hypothesis. The hypothesis you are trying to prove.

p-value

The probability of getting a value at least as extreme as the observed test statistic, in the direction of the alternative hypothesis, if the null hypothesis were actually true. Reject the null hypothesis when the p-value is less than the Type I error rate, alpha.
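As a worked sketch, a p-value for a one-sample z-test can be computed with `statistics.NormalDist`. This assumes the population standard deviation is known, and the numbers (sample mean 103, hypothesized mean 100, sigma 10, n = 36) are hypothetical. The test statistic is z = (xbar - mu0) / (sigma / sqrt(n)) = 1.8.

```python
import math
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n, alternative="greater"):
    """One-sample z-test, assuming the population standard deviation sigma is known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic
    Z = NormalDist()                            # standard normal
    if alternative == "greater":                # H1: mu > mu0
        p = 1 - Z.cdf(z)
    elif alternative == "less":                 # H1: mu < mu0
        p = Z.cdf(z)
    else:                                       # H1: mu != mu0 (both tails)
        p = 2 * (1 - Z.cdf(abs(z)))
    return z, p

alpha = 0.05                                    # Type I error rate
z, p = z_test_p_value(xbar=103.0, mu0=100.0, sigma=10.0, n=36)
print(z, p, "reject H0" if p < alpha else "do not reject H0")
```

Here the one-sided p-value is about 0.036, so H0 is rejected at alpha = 0.05; the two-sided p-value for the same data (about 0.072) would not lead to rejection.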

Type I error rate

The probability of rejecting the null hypothesis when the null hypothesis is true.

Type II error rate

The probability of not rejecting the null hypothesis when the null hypothesis is false.

Power of the test

The probability of rejecting the null hypothesis when the null hypothesis is false. It equals 1 minus the Type II error rate.
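The Type I error rate, Type II error rate, and power can be computed exactly for a one-sided z-test with known sigma. This sketch assumes H0: mu = mu0 against H1: mu > mu0 and a hypothetical true mean mu1; the numbers and the helper `one_sided_z_power` are illustrative. H0 is rejected when z exceeds the critical value z_crit with P(Z > z_crit) = alpha.

```python
import math
from statistics import NormalDist

def one_sided_z_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power and Type II error rate of the test of H0: mu = mu0 vs H1: mu > mu0
    when the true mean is mu1 (sigma assumed known)."""
    Z = NormalDist()
    z_crit = Z.inv_cdf(1 - alpha)                 # reject H0 when z > z_crit
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # where z is centered when mu = mu1
    power = 1 - Z.cdf(z_crit - shift)             # P(reject H0 | true mean mu1)
    beta = 1 - power                              # Type II error rate
    return power, beta

power, beta = one_sided_z_power(mu0=100, mu1=103, sigma=10, n=36)
print(power, beta)
```

Setting mu1 equal to mu0 makes the rejection probability equal to alpha, the Type I error rate; increasing n raises the power for a fixed mu1.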

 

E-mail Mr. Callahan at stat110@edcallahan.com with questions or comments about this web site or about the class itself.

This page was last modified on February 07, 2000.