Discrete Probability Distributions
- We've talked about data and how to graph and summarize it. This is easy since we can
touch data. But how do we deal with populations, since we generally can't contact all
members of a population and so don't have data from all of them? We deal
with probabilities instead of counts. Instead of counting how many people in our sample have
heart attacks, we think of the probability that a member of the population will have a heart
attack.
- Each possible outcome of a discrete measurement has a probability. For instance, we
could pick an American and record his race (1=black, 2=Hispanic, 3=white, 4=other). Based
on 1999 US Census estimates the probability of getting a result of 1 is 0.121, which we
write as P(X=1) = 0.121. Also, P(X=2) = 0.115,
P(X=3) = 0.719 and P(X=4) = 0.045.
- Discrete probability distribution
- A discrete probability distribution is a graph, table or formula that specifies the
probability associated with each possible outcome or measurement
- All probabilities are at least zero and not more than 1
- The probabilities summed over all possible outcomes sum to one
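As a small sketch, the two conditions above can be checked in Python for the race example. The names race_distribution and is_valid_distribution are made up for this illustration, and the tolerance only allows for floating-point rounding.

```python
import math

# Probabilities from the race example above (1999 US Census estimates).
# Keys are the outcome codes: 1=black, 2=Hispanic, 3=white, 4=other.
race_distribution = {1: 0.121, 2: 0.115, 3: 0.719, 4: 0.045}


def is_valid_distribution(dist, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to one."""
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    sums_to_one = math.isclose(sum(dist.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one


print(is_valid_distribution(race_distribution))
```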
- Binomial Distribution
- an example of discrete probability distributions
- used to give the probability of, for example, the number of heads you would get if you
flipped a coin a bunch of times.
- n is the number of trials (number of coin flips)
- x is the number of "successes" (number of heads)
- p is the probability of getting a head on any one flip (0.5 if the coin is
fair)
- each trial can only have one of two outcomes ("success" or "failure")
- the trials are independent
- p is constant between trials
- The last two items in the above list are assumptions we need to make about the
data in order to be able to use the binomial distribution.
- Links to binomial tables are available here. An online
probability calculator is available here.
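For reference, binomial tables and calculators evaluate the standard binomial probability formula. These notes do not state it, so it is added here as background, using the same n, x and p as above:

```latex
P(X = x) = \binom{n}{x} \, p^{x} (1 - p)^{n - x}, \qquad x = 0, 1, \dots, n
```

Here \binom{n}{x} = n! / (x! (n - x)!) counts the number of ways to place x successes among n trials.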
- Example: A pen manufacturer sells its pens in packages of 10. The defect rate of its
pens is 1% (i.e. the probability of a pen being defective is 0.01). What is the
probability that a package you buy doesn't have any defective pens in it?
- n = 10
- p = 0.01
- x = 0
- From the tables, P(X=0) = 0.904
- What is the probability that your package has at most one defect?
P(X<=1) = P(X=0) + P(X=1) = 0.996
- What is the probability that your package has at least one defect?
P(X>=1) = 1 - P(X=0)
= 1 - 0.904 = 0.096
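If you don't have tables handy, the pen example can be reproduced with a few lines of Python. This is only a sketch: binom_pmf is a helper name made up for this illustration, and it evaluates the standard binomial formula directly rather than looking anything up.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for a binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.01  # 10 pens per package, 1% defect rate

p_none = binom_pmf(0, n, p)                  # no defective pens
p_at_most_one = p_none + binom_pmf(1, n, p)  # P(X<=1) = P(X=0) + P(X=1)
p_at_least_one = 1 - p_none                  # complement of "no defects"

print(round(p_none, 3), round(p_at_most_one, 3), round(p_at_least_one, 3))
# prints: 0.904 0.996 0.096
```

Note that "at least one" is computed as one minus the probability of zero defects, which avoids summing P(X=1) through P(X=10).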
- In the binomial distribution p is called a parameter.
In real applications the parameter is unknown and is something we want to estimate. For
instance, we might want to estimate the proportion of Americans that approve of the
President's job performance. If we sample 500 people and 300 approve of the President's
job performance, our estimate of the parameter p is 300/500, or 60%. We normally call the
parameter a "population parameter" to accentuate the fact that we are estimating
an attribute of a population, not of the sample.
- The population mean of the binomial distribution is np. The population variance
is np(1-p).
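One way to convince yourself of these two formulas is to compute the mean and variance directly from the probabilities and compare. The sketch below does this for the pen example (n = 10, p = 0.01); binom_pmf is again a helper name made up for this illustration.

```python
from math import comb, isclose


def binom_pmf(x, n, p):
    """P(X = x) for a binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.01

# Probability of each possible outcome x = 0, 1, ..., n.
probs = [binom_pmf(x, n, p) for x in range(n + 1)]

# Mean: each outcome weighted by its probability.
mean = sum(x * px for x, px in enumerate(probs))

# Variance: squared distance from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))

print(isclose(mean, n * p), isclose(variance, n * p * (1 - p)))
```

Both comparisons use isclose rather than == because the sums are floating-point arithmetic and can differ from np and np(1-p) in the last few digits.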
- There are many, many discrete probability distributions. Two common ones are:
- the multinomial (like the binomial but for when there are more than two possible
outcomes, like rolling a die instead of flipping a coin)
- the Poisson (for counts of rare events, like accidents at an intersection)
- we will only work with the binomial distribution in STAT 110