
Basic Statistics Lecture #2: An Example of Discrete and Continuous Probabilities

As promised last time, I will cover examples of discrete and continuous probability distributions.

Figure: top, discrete probability; middle, continuous probability; bottom, disjointed probability.
Disjointed probability won't be covered in this course.

Discrete Probability Model (example):
The following table represents the probabilities of people of certain age groups living alone, living with a spouse, or living with at least one person who is not a spouse.


Living arrangement   15-19   20-24   25-34   35-44
Alone                0.001   0.011   0.031   0.030
With Spouse          0.001   0.023   0.155   0.216
With others          0.169   0.132   0.142   0.089

Because there is a finite number of categories, each with a non-zero probability of occurring, this is considered a discrete probability model.  That said, I am assuming that it is a probability model at all.  Remember from the first lecture that, for a model to be a true probability model (or statistical model), all of the probabilities must add to exactly 1 (also called "unity", because 100% is the entire set), and each individual probability must be no less than 0 and no more than 1.  The latter condition is easy to check.  Since none of these numbers is negative, none of them is less than zero.  Since every one of them has a zero in the ones place, every one of them is less than 1.

The first criterion is just as easy to check, though more tedious: add all twelve numbers together.  Putting them through a calculator yields exactly 1 (row by row, 0.073+0.395+0.532=1.000).  Since the probabilities sum to 1, the table meets the first criterion and is a probability model.
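The two checks can also be done programmatically. The following is a minimal Python sketch, not part of the original lecture; the dictionary name `table` and the helper `is_probability_model` are hypothetical names chosen for illustration, and the small tolerance is an assumption made to absorb floating-point rounding.

```python
import math

# Joint probabilities from the table: living arrangement -> age group -> probability.
# (The name `table` is an illustrative assumption.)
table = {
    "Alone":       {"15-19": 0.001, "20-24": 0.011, "25-34": 0.031, "35-44": 0.030},
    "With Spouse": {"15-19": 0.001, "20-24": 0.023, "25-34": 0.155, "35-44": 0.216},
    "With others": {"15-19": 0.169, "20-24": 0.132, "25-34": 0.142, "35-44": 0.089},
}

def is_probability_model(table, tol=1e-9):
    """Hypothetical helper: check both criteria for a discrete probability model."""
    probs = [p for row in table.values() for p in row.values()]
    # Criterion: every individual probability lies between 0 and 1.
    each_in_range = all(0.0 <= p <= 1.0 for p in probs)
    # Criterion: all probabilities together add to 1 (within rounding tolerance).
    sums_to_one = math.isclose(math.fsum(probs), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one

print(is_probability_model(table))  # True
```

A tolerance is used instead of `== 1.0` because decimal values such as 0.001 are not stored exactly in binary floating point.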

Now if we wanted to find the probability of living alone, regardless of age, we would just add all of the probabilities in the "Alone" row: 0.001+0.011+0.031+0.030=0.073.  Converting this to a percent is merely a matter of multiplying by 100%: 0.073*100%=7.3%.  There is a 7.3% probability that an individual between the ages of 15 and 44 lives alone, regardless of which age range that person belongs to.  The same approach gives the total probability of living with a spouse and of living with people other than a spouse.
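The row totals can be sketched in Python as follows. This is an illustrative sketch only; `table` and `marginal` are hypothetical names, not something from the original lecture.

```python
import math

# Joint probabilities from the table: living arrangement -> age group -> probability.
table = {
    "Alone":       {"15-19": 0.001, "20-24": 0.011, "25-34": 0.031, "35-44": 0.030},
    "With Spouse": {"15-19": 0.001, "20-24": 0.023, "25-34": 0.155, "35-44": 0.216},
    "With others": {"15-19": 0.169, "20-24": 0.132, "25-34": 0.142, "35-44": 0.089},
}

def marginal(table, arrangement):
    """Hypothetical helper: total probability of one living arrangement across all ages."""
    return math.fsum(table[arrangement].values())

for arrangement in table:
    p = marginal(table, arrangement)
    # Show each row total as a decimal and as a percent.
    print(f"{arrangement}: {p:.3f} = {p * 100:.1f}%")
```

The three row totals come out to 0.073, 0.395, and 0.532.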

Let E be the event that a person is 25-34 OR lives alone OR lives with “others”, and find P(E).  This is given by 0.001+0.011+0.031+0.030+0.169+0.132+0.142+0.089+0.155=0.760.  Each entry is counted only once: 0.031 was already counted under living alone and 0.142 under living with others, so neither is counted again under the age range of 25-34.  Only 0.155 (25-34, living with a spouse) is added from that column.
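One way to make sure each entry is counted only once is to visit every cell of the table a single time and include it if it satisfies the OR condition. The sketch below is an illustration under that assumption; `table`, `prob_event`, and `in_event` are hypothetical names.

```python
import math

# Joint probabilities from the table: living arrangement -> age group -> probability.
table = {
    "Alone":       {"15-19": 0.001, "20-24": 0.011, "25-34": 0.031, "35-44": 0.030},
    "With Spouse": {"15-19": 0.001, "20-24": 0.023, "25-34": 0.155, "35-44": 0.216},
    "With others": {"15-19": 0.169, "20-24": 0.132, "25-34": 0.142, "35-44": 0.089},
}

def prob_event(table, predicate):
    """Hypothetical helper: sum the cells for which predicate(arrangement, age) is true.

    Each cell is visited exactly once, so no entry can be double counted.
    """
    return math.fsum(
        p
        for arrangement, row in table.items()
        for age, p in row.items()
        if predicate(arrangement, age)
    )

def in_event(arrangement, age):
    # E: aged 25-34 OR living alone OR living with others (inclusive OR).
    return age == "25-34" or arrangement in ("Alone", "With others")

print(f"P(E) = {prob_event(table, in_event):.3f}")  # P(E) = 0.760
```

Adding the three category totals instead (0.328 for ages 25-34, 0.073 for alone, 0.532 for others) would count 0.031 and 0.142 twice; subtracting those two overlaps gives the same 0.760.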

As a side note, pay careful attention to the logical words “AND” and “OR”; they have drastically different meanings in statistics than they do in common language.  The statement “Someone can be male OR female” has a different meaning than the statement “Someone can be male AND female.”  The statement “Someone can be male OR female” means either a male or a female or both.  The statement “Someone can be male AND female” means only both.  The latter is a subset of the former.

Continuous Probability Model (example):
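The bell curve is also called the normal distribution (the standard normal distribution is the special case with a mean of 0 and a standard deviation of 1).  For continuous models, the first rule of probability models holds as stated before, with the total area under the curve equal to 1, but the second one is given as “the area under the curve over the probability range”: the probability of an event is the area under the curve over the range of values that make up the event.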

Uniform continuous distribution: 
The time it takes to wait for a bus is random and follows the uniform continuous distribution model.  (Continuous means that the outcome can take any value in a range, so there is a graph, and therefore an equation, associated with the probability distribution.  Uniform means that the height of that graph is constant over the range.)

Figure: A general uniform continuous distribution.
Since it is a uniform continuous distribution, it is represented by a rectangle with a flat, horizontal top whose height is given by $\frac{1}{x_h-x_l}$: take the highest time, subtract from it the lowest time, and take the inverse of the answer (one over the difference in wait times).  This height makes the total area of the rectangle, width times height, equal to 1.

The probability of the bus arriving within any given time interval is given by the product of the base and height.  Suppose you somehow magically know that the bus will come at some time in the next 5 minutes; then the height is $\frac{1}{5-0}=0.20 \text{ min}^{-1}$.  Now consider the probability of the bus showing up sometime between 2 and 2.5 minutes, an interval with a width of 0.5 minutes.  The probability in this case is the width times the height, $P[2,2.5]=0.5 \text{ min} * 0.2 \text{ min}^{-1}=0.1$, or 10%.
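The width-times-height calculation can be sketched in Python as follows. This is a minimal illustration, not the author's code; the function name `uniform_prob` and its parameter names are hypothetical, and clipping the interval to the range of possible wait times is an added assumption so that times outside 0 to 5 minutes contribute no probability.

```python
def uniform_prob(a, b, low, high):
    """Hypothetical helper: P(a <= X <= b) for X uniform on [low, high]."""
    # Height of the rectangle: one over the difference in wait times.
    height = 1.0 / (high - low)
    # Clip the interval to the range where the bus can actually arrive.
    left = max(a, low)
    right = min(b, high)
    width = max(0.0, right - left)
    # Probability is the area: width times height.
    return width * height

# Bus arrives at some time in the next 5 minutes; interval from 2 to 2.5 minutes.
print(uniform_prob(2, 2.5, 0, 5))  # 0.1
```

Calling `uniform_prob(0, 5, 0, 5)` returns 1.0, which matches the requirement that the total area under the curve is 1.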

That's an example for each of the two types of distributions, discrete and continuous.  If you have any questions, please leave them in the comments and I'll do my best to answer in a timely manner.  Next time, I'll be covering other types of distributions, namely the normal distribution, the chi-squared distribution, the binomial distribution, and the Poisson distribution.  Until then, stay curious.

K. "Alan" Eister has his Bachelor of Science in Chemistry. He is also a tutor for Varsity Tutors.  If you feel you need any further help on any of the topics covered, you can sign up for tutoring sessions here.
