Sampling Distributions
- What if you took a random sample of 100 Americans and recorded their blood pressure
level? We could calculate the sample mean, sample median, sample variance and many other
statistics from that sample. If we took repeated random samples from that population would
we always get the same values for those statistics? No. This is why we say that
sample statistics are random variables. We get different values from sample to sample.
- Any random variable has a probability distribution (a pdf). The pdf of a sample
statistic is called a sampling distribution.
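To see sample statistics behave like random variables, here is a minimal simulation sketch. The Normal(120, 15) "blood pressure" population, the seed, and the helper name `draw_sample` are assumptions made up for illustration; they are not real data.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

def draw_sample(n=100, mu=120.0, sigma=15.0):
    """Draw n values from a hypothetical Normal(mu, sigma) population (assumed, not real data)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Take five independent random samples and summarize each with three statistics.
summaries = []
for _ in range(5):
    sample = draw_sample()
    summaries.append((
        statistics.mean(sample),
        statistics.median(sample),
        statistics.variance(sample),  # sample variance, divides by n - 1
    ))

for mean, median, var in summaries:
    print(f"mean={mean:7.2f}  median={median:7.2f}  variance={var:7.2f}")
```

Each printed row comes from a different sample of 100, and the three statistics change from row to row.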
- The mean of the sampling distribution of the mean is labeled $\mu_{\bar{x}}$ and the
variance of the sampling pdf of the mean is labeled $\sigma^2_{\bar{x}}$. The population standard error
is labeled $\sigma_{\bar{x}}$.
- The standard error is a standard deviation. It is a special case. Standard deviations
are a measure of spread of any distribution. Standard errors are the same measure of
spread but of a sampling distribution.
- The standard error is a measure of variability of the sample mean; the standard
deviation is a measure of variability of the population. Understand this distinction!
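- Label the mean of the "parent population" (the population you are
sampling from, all Americans for example) as $\mu$ and its variance as $\sigma^2$. Regardless of the population you are
sampling from, if you take a random sample of size n then:
$\mu_{\bar{x}} = \mu$
$\sigma^2_{\bar{x}} = \sigma^2 / n$ (so the standard error is $\sigma_{\bar{x}} = \sigma / \sqrt{n}$)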
- Do you expect the mean calculated from a large sample to be more or less variable than
the mean calculated from a small sample? Which is more reliable?
- The standard error is a measure of reliability of sample means.
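One way to check the mean and standard error of the sampling distribution of the mean is by simulation. This is only a sketch: the Uniform(0, 10) parent population, the sample sizes, and the number of repetitions are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)

MU = 5.0                       # mean of the assumed Uniform(0, 10) parent
SIGMA = 10.0 / math.sqrt(12)   # standard deviation of Uniform(0, 10)

def sample_means(n, reps=5000):
    """Return the means of `reps` independent random samples of size n."""
    return [
        statistics.fmean([rng.uniform(0.0, 10.0) for _ in range(n)])
        for _ in range(reps)
    ]

results = {}
for n in (4, 25, 100):
    means = sample_means(n)
    results[n] = (
        statistics.fmean(means),   # estimate of the mean of the sampling distribution
        statistics.stdev(means),   # estimate of the standard error
        SIGMA / math.sqrt(n),      # theoretical standard error sigma / sqrt(n)
    )

for n, (avg, se_sim, se_theory) in results.items():
    print(f"n={n:3d}  mean of means={avg:.3f}  simulated SE={se_sim:.3f}  sigma/sqrt(n)={se_theory:.3f}")
```

The mean of the sample means should stay near 5 for every n, while the simulated standard error should shrink roughly like sigma/sqrt(n). Means from larger samples are less variable.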
- So we know the mean and variance of a sampling distribution (at least if we know the
mean and variance of the parent population), but we don't yet know the shape of the
sampling pdf. The sampling pdf depends on the parent pdf.
- If the parent distribution is N($\mu$, $\sigma^2$) then the sample mean is always
N($\mu_{\bar{x}}$, $\sigma^2_{\bar{x}}$) = N($\mu$, $\sigma^2/n$) as long as samples are randomly
selected.
- What if we don't know the pdf of the parent population? The Central Limit Theorem
gives us some useful information:
Central Limit Theorem: For large enough n, $\bar{x}$ is approximately normally distributed with mean
$\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
when the sample was collected randomly from any population with mean $\mu$ and standard deviation $\sigma$.
- How large is "large enough"? The book says n=30 is large. That's a lame answer
given by many introductory statistics textbooks. The truth is that if the parent
population is symmetric then the normal distribution can be a good approximation for the
sampling distribution for almost any sample size. For a highly skewed parent population the
normal might not be a good approximation for any value of n you are likely to encounter.
There are no easy answers here.
You can look at histograms of the data to assess how symmetric or skewed the population is
likely to be.
For the purposes of assignments, quizzes or exams I'll just tell you when to consider the
sample size "large".
- So, as long as the samples are randomly selected, the sampling distribution of $\bar{x}$ is
(at least approximately) the normal
distribution if:
- the parent population is normally distributed, or
- the sample size is "large"
- So, for large sample sizes, or when the parent population is normally distributed,
$\bar{x}$ ~ N($\mu_{\bar{x}}$, $\sigma^2_{\bar{x}}$). So,
we can use $z = (\bar{x} - \mu_{\bar{x}}) / \sigma_{\bar{x}}$ to find probabilities using our standard normal probability tables. We
use this formula for z when we're finding probabilities about a sample mean. We
use the original formula of z ($z = (x - \mu) / \sigma$) when we're finding a probability about x.
- Say we're sampling a population with population mean $\mu$ = 10 and population variance $\sigma^2$ = 100
with a sample size of n = 25. What is P($\bar{x}$ > 12)?
We'll assume that the parent population was normally distributed so that we can assume $\bar{x}$ ~ N($\mu_{\bar{x}}$, $\sigma^2_{\bar{x}}$).
This problem is solved just like every other normal probability problem. Convert to
z-scores and then use the standard normal tables I passed out in class to find the
probability. Since this is a question about $\bar{x}$ we use the formula $z = (\bar{x} - \mu_{\bar{x}}) / \sigma_{\bar{x}}$, with $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 10/5 = 2$.
P($\bar{x}$ > 12) =
P(Z > (12-10)/(10/5)) =
P(Z > 1) =
0.1587
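The same calculation can be checked without the printed tables. This sketch uses `NormalDist` from Python's standard `statistics` module for the standard normal cdf; the variable names are just labels chosen here.

```python
import math
from statistics import NormalDist

mu = 10.0        # population mean
sigma2 = 100.0   # population variance
n = 25           # sample size
x_bar = 12.0     # value of the sample mean we are asking about

std_error = math.sqrt(sigma2 / n)     # sigma / sqrt(n) = 10 / 5 = 2
z = (x_bar - mu) / std_error          # z-score for a sample mean
p_upper = 1.0 - NormalDist().cdf(z)   # upper-tail area P(Z > z)

print(f"z = {z:.2f}, P(x_bar > 12) = {p_upper:.4f}")
# prints: z = 1.00, P(x_bar > 12) = 0.1587
```

Dividing by the population standard deviation 10 instead of the standard error 2 would answer a question about a single x, not about the sample mean.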
- Bias: a sample statistic is unbiased for a parameter if the mean of its
sampling distribution equals that parameter.
- The sample mean is unbiased for the population mean
if $\mu_{\bar{x}}$ = $\mu$.
- Being unbiased does not mean that
$\bar{x}$ = $\mu$ in any one sample. It does mean that if you calculated $\bar{x}$ for each of many,
many samples, the mean of all those means would equal $\mu$. Any single $\bar{x}$ can miss $\mu$,
but the misses balance out on average.
- Similarly, the sample variance $s^2$ is unbiased for the population variance if
$\mu_{s^2}$ = $\sigma^2$.
- No matter what the pdf of the parent population is (binomial, normal or any other pdf),
the sample mean is unbiased for the population mean and the sample variance is unbiased
for the population variance if the samples are random.
- Remember that we wondered why the denominator of the formula for the sample variance was
(n-1) instead of n? We divide by (n-1) so that
$s^2$ will be unbiased for $\sigma^2$.
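The effect of the (n-1) denominator shows up clearly in a simulation. This is a sketch under assumed settings: a Normal(0, 2) parent (so the population variance is 4), samples of size 5, and 20000 repetitions. `statistics.variance` divides by n-1, while `statistics.pvariance` divides by n.

```python
import random
import statistics

rng = random.Random(3)

SIGMA = 2.0          # assumed population standard deviation, so sigma^2 = 4
N, REPS = 5, 20000   # small samples make the bias easy to see

divide_by_n_minus_1 = []
divide_by_n = []
for _ in range(REPS):
    sample = [rng.gauss(0.0, SIGMA) for _ in range(N)]
    divide_by_n_minus_1.append(statistics.variance(sample))   # denominator n - 1
    divide_by_n.append(statistics.pvariance(sample))          # denominator n

avg_unbiased = statistics.fmean(divide_by_n_minus_1)
avg_biased = statistics.fmean(divide_by_n)

print(f"population variance          = {SIGMA ** 2:.2f}")
print(f"average with n-1 denominator = {avg_unbiased:.2f}")
print(f"average with n denominator   = {avg_biased:.2f}  (theory: {(N - 1) / N * SIGMA ** 2:.2f})")
```

Averaged over many samples, the n-1 version should land near 4, while the n version should land near (n-1)/n times 4 = 3.2, so it underestimates on average.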
- The sample median is unbiased for the mean only if the parent pdf is symmetric.
- What makes a good estimator of a parameter? What is a better estimator of the population
mean, the sample mean or sample median? What if the parent population has a symmetric pdf?
What if the parent population has a non-symmetric pdf?
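One way to explore these questions is to simulate both estimators side by side. The sketch below is only a starting point: the Normal(0, 1) and Exponential(1) parents, the sample size 25, and the helper name `compare` are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(4)
N, REPS = 25, 5000

def compare(draw, label):
    """Simulate REPS samples of size N and summarize the sample mean and sample median."""
    means, medians = [], []
    for _ in range(REPS):
        sample = [draw() for _ in range(N)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    result = {
        "avg_mean": statistics.fmean(means),
        "sd_mean": statistics.stdev(means),
        "avg_median": statistics.fmean(medians),
        "sd_median": statistics.stdev(medians),
    }
    print(label, {k: round(v, 3) for k, v in result.items()})
    return result

# Symmetric parent with population mean 0.
normal = compare(lambda: rng.gauss(0.0, 1.0), "normal:")
# Right-skewed parent with population mean 1.
skewed = compare(lambda: rng.expovariate(1.0), "exponential:")
```

Compare where each estimator is centered (bias) and how spread out it is (variability) under each parent before deciding which one you would trust for estimating the population mean.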