
Sampling Distributions

 

  1. What if you took a random sample of 100 Americans and recorded their blood pressure level? We could calculate the sample mean, sample median, sample variance and many other statistics from that sample. If we took repeated random samples from that population would we always get the same values for those statistics?  No. This is why we say that sample statistics are random variables. We get different values from sample to sample.
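
A minimal sketch of this idea in Python, assuming a made-up population of blood pressure readings (the normal shape, the mean of 120 and the standard deviation of 15 are illustration values, not real data). Each random sample of 100 gives its own mean, median and variance.

```python
import random
import statistics

random.seed(110)  # fixed seed so the run is repeatable

# Hypothetical population: 50,000 systolic blood pressure readings (mmHg).
population = [random.gauss(120, 15) for _ in range(50_000)]

# Draw five independent random samples of size 100 and summarize each one.
summaries = []
for _ in range(5):
    sample = random.sample(population, 100)
    summaries.append((
        statistics.mean(sample),
        statistics.median(sample),
        statistics.variance(sample),  # sample variance, divides by n - 1
    ))

for i, (mean, median, var) in enumerate(summaries, start=1):
    print(f"sample {i}: mean={mean:.2f}  median={median:.2f}  variance={var:.2f}")
```

The five printed rows differ from one another, which is what it means for a sample statistic to be a random variable.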
     
  2. Any random variable has a probability distribution (a pdf). The pdf of a sample statistic is called a sampling distribution.
     
  3. The mean of the sampling distribution of the mean is labeled $\mu_{\bar{x}}$ and the variance of the sampling distribution of the mean is labeled $\sigma^2_{\bar{x}}$. The population standard error is labeled $\sigma_{\bar{x}}$.
     
  4. The standard error is a standard deviation. It is the special case where the distribution being described is a sampling distribution. A standard deviation measures the spread of any distribution; a standard error measures the spread of a sampling distribution.
     
  5. The standard error is a measure of the variability of the sample mean; the standard deviation is a measure of the variability of the population. Understand this distinction!
     
  6. Label the mean of the "parent population" (the population you are sampling from, all Americans for example) as $\mu$ and its variance as $\sigma^2$. Regardless of the population you are sampling from, if you take a random sample of size $n$ then:
     
    1. $\mu_{\bar{x}} = \mu$
       
    2. $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
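
The two results above can be checked by simulation. This is only a sketch: the exponential parent population with mean 10, the sample sizes and the number of repetitions are all assumptions chosen for illustration. Samples are drawn with replacement so that the draws are independent.

```python
import math
import random
import statistics

random.seed(110)

# Hypothetical skewed parent population: exponential with mean about 10.
population = [random.expovariate(1 / 10) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population SD (divides by N)

def simulate_sample_means(n, reps=4_000):
    """Return `reps` sample means, each from a random sample of size n."""
    # random.choices samples with replacement, so draws are independent.
    return [statistics.fmean(random.choices(population, k=n)) for _ in range(reps)]

results = {}
for n in (4, 25, 100):
    means = simulate_sample_means(n)
    results[n] = (
        statistics.fmean(means),   # estimate of mu_xbar
        statistics.stdev(means),   # estimate of sigma_xbar
        sigma / math.sqrt(n),      # standard error from the formula
    )

print(f"parent population: mu={mu:.3f}  sigma={sigma:.3f}")
for n, (mean_of_means, sd_of_means, se) in results.items():
    print(f"n={n:>3}: mean of means={mean_of_means:.3f}  "
          f"SD of means={sd_of_means:.3f}  sigma/sqrt(n)={se:.3f}")
```

The mean of the sample means stays near the population mean for every n, while the spread of the sample means shrinks as n grows, in line with sigma / sqrt(n).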
       
  7. Do you expect the mean calculated from a large sample to be more or less variable than the mean calculated from a small sample? Which is more reliable?
     
  8. The standard error is a measure of reliability of sample means.
     
  9. So we know the mean and variance of the sampling distribution of the mean (at least if we know the mean and variance of the parent population), but we don't yet know its pdf. The pdf of the sampling distribution depends on the parent pdf.
     
  10. If the parent distribution is $N(\mu, \sigma^2)$ then the sample mean is always $N(\mu, \sigma^2/n)$ as long as samples are randomly selected.
     
  11. What if we don't know the pdf of the parent population? The Central Limit Theorem gives us some useful information:
     
    Central Limit Theorem: For large enough $n$, $\bar{x}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ when the sample was collected randomly from any population with mean $\mu$ and standard deviation $\sigma$.
     
  12. How large is "large enough"? The book says n=30 is large. That's a lame answer given by many introductory statistics textbooks. The truth is that if the parent population is symmetric then the normal distribution can be a good approximation for the sampling distribution at just about any sample size. For a highly skewed parent population the normal might not be a good approximation for any value of n you are likely to encounter. There are no easy answers here.
     
    You can look at histograms of the data to assess how symmetric or skewed the population is likely to be.
     
    For the purposes of assignments, quizzes or exams I'll just tell you when to consider the sample size "large".
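
To see why there is no single "large enough", here is a rough simulation sketch. The two parent populations (uniform as the symmetric case, exponential as the skewed case), the sample sizes and the `skewness` helper are assumptions made for this illustration. Skewness near 0 means the sampling distribution of the mean is close to symmetric, as a normal distribution is.

```python
import random
import statistics

random.seed(110)

def skewness(values):
    """Moment-based skewness: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_mean_skewness(draw, n, reps=4_000):
    """Skewness of `reps` simulated sample means, each from n draws."""
    means = [statistics.fmean([draw() for _ in range(n)]) for _ in range(reps)]
    return skewness(means)

# Hypothetical parent populations: one symmetric, one right-skewed.
parents = {
    "uniform": lambda: random.uniform(0, 1),
    "exponential": lambda: random.expovariate(1.0),
}

skew = {name: {} for name in parents}
for name, draw in parents.items():
    for n in (2, 5, 30):
        skew[name][n] = sample_mean_skewness(draw, n)
        print(f"{name:<12} n={n:>2}  skewness of sample means = {skew[name][n]:.3f}")
```

With the symmetric parent the sample means look symmetric even at n=2. With the skewed parent the skewness shrinks as n grows but is still visible at n=30.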
     
  13. So, as long as the samples are randomly selected, the sampling distribution of $\bar{x}$ is normal (exactly in the first case below, approximately in the second) if:
     
    1. the parent population is normally distributed, or
    2. the sample size is "large"
       
  14. So, for large sample sizes, or when the parent population is normally distributed, $\bar{x} \sim N(\mu, \sigma^2/n)$. So, we can use $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ to find probabilities using our standard normal probability tables. We use this formula for z when we're finding probabilities about a sample mean. We use the original formula for z, $z = (x - \mu)/\sigma$, when we're finding a probability about x.
     
  15. Say we're sampling a population with population mean $\mu = 10$ and population variance $\sigma^2 = 100$, using a sample size of $n = 25$. What is $P(\bar{x} > 12)$? We'll assume that the parent population is normally distributed so that $\bar{x} \sim N(\mu, \sigma^2/n)$.
     
    This problem is solved just like every other normal probability problem. Convert to z-scores and then use the standard normal tables I passed out in class to find the probability. Since this is a question about $\bar{x}$ we use the formula $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$.
     
    $P(\bar{x} > 12)$ =
    P(Z > (12-10)/(10/5)) =
    P(Z > 1) =
    0.1587
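
The same calculation can be done in Python instead of with the table. This sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later); the variable names are just labels for the numbers in the problem. For contrast, it also computes the probability that a single observation x exceeds 12, which uses the other z formula.

```python
import math
from statistics import NormalDist

mu = 10.0         # population mean
sigma2 = 100.0    # population variance
n = 25            # sample size
threshold = 12.0

sigma = math.sqrt(sigma2)               # population SD = 10
standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n) = 10 / 5 = 2

# Probability about the sample mean: z = (xbar - mu) / (sigma / sqrt(n))
z_mean = (threshold - mu) / standard_error
p_mean = 1 - NormalDist().cdf(z_mean)   # upper-tail area

# Probability about one observation: z = (x - mu) / sigma
z_single = (threshold - mu) / sigma
p_single = 1 - NormalDist().cdf(z_single)

print(f"P(xbar > 12): z = {z_mean:.2f}, probability = {p_mean:.4f}")    # z = 1.00, 0.1587
print(f"P(x > 12):    z = {z_single:.2f}, probability = {p_single:.4f}")  # z = 0.20, 0.4207
```

A sample mean above 12 is much less likely than a single observation above 12 because the mean of 25 observations varies less than one observation does.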
     
  16. Bias: a sample statistic is unbiased for a parameter if the mean of its sampling distribution equals that parameter.
     
    1. The sample mean is unbiased for the population mean $\mu$ if $\mu_{\bar{x}} = \mu$.
       
    2. Being unbiased does not mean that $\bar{x} = \mu$ in any one sample. It does mean that if you calculated $\bar{x}$ for each of many, many samples, the mean of all those means would equal $\mu$. In other words, $\bar{x}$ is right on average, not in every sample.
       
    3. Similarly, the sample variance is unbiased for the population variance if $\mu_{s^2} = \sigma^2$.
       
    4. No matter what the pdf of the parent population is (binomial, normal or any other pdf), the sample mean is unbiased for the population mean and the sample variance is unbiased for the population variance if the samples are random.
       
    5. Remember that we wondered why the denominator of the formula for the sample variance was (n-1) instead of n? We divide by (n-1) so that $s^2$ will be unbiased for $\sigma^2$.
       
    6. The sample median is unbiased for the mean only if the parent pdf is symmetric.
       
    7. What makes a good estimator of a parameter? What is a better estimator of the population mean, the sample mean or sample median? What if the parent population has a symmetric pdf? What if the parent population has a non-symmetric pdf?
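
The bias statements above can be explored with a simulation. This is a sketch under stated assumptions: the parent population is taken to be exponential with mean 10 (so its variance is 100 and it is not symmetric), and the sample size and number of repetitions are arbitrary illustration values. It averages four statistics over many samples: the sample mean, the sample median, the variance with an (n-1) denominator, and the variance with an n denominator.

```python
import random
import statistics

random.seed(110)

# Hypothetical skewed parent: exponential with mean 10, so variance 100.
MU, SIGMA2 = 10.0, 100.0
n, reps = 10, 20_000

means, medians, var_n_minus_1, var_n = [], [], [], []
for _ in range(reps):
    sample = [random.expovariate(1 / MU) for _ in range(n)]
    xbar = statistics.fmean(sample)
    ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
    means.append(xbar)
    medians.append(statistics.median(sample))
    var_n_minus_1.append(ss / (n - 1))         # usual sample variance
    var_n.append(ss / n)                       # same sum, divided by n

avg_mean = statistics.fmean(means)
avg_median = statistics.fmean(medians)
avg_var_n_minus_1 = statistics.fmean(var_n_minus_1)
avg_var_n = statistics.fmean(var_n)

print(f"population mean = {MU}, population variance = {SIGMA2}")
print(f"average sample mean          = {avg_mean:.2f}")
print(f"average sample median        = {avg_median:.2f}")
print(f"average variance, (n-1) form = {avg_var_n_minus_1:.2f}")
print(f"average variance, n form     = {avg_var_n:.2f}")
```

In this setup the average sample mean and the average (n-1) variance land near the population values, the n-denominator variance comes out too small on average, and the sample median averages well below the population mean because the parent pdf is skewed.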
       

 

E-mail Mr. Callahan at stat110@edcallahan.com with questions or comments about this web site or about the class itself.

This page was last modified on October 20, 1999.