Basic Statistics Lecture #2: An Example of Discrete and Continuous Probabilities

As promised last time, I will cover examples of discrete and continuous probability distributions.

[Figure: three panels. Top: discrete probability. Middle: continuous probability. Bottom: disjointed probability.]
Disjointed probability won't be covered in this course.

Discrete Probability Model (example):
The following table represents the probabilities of people of certain age groups living alone, living with a spouse, or living with at least one person who is not a spouse.


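| Living arrangement | 15-19 | 20-24 | 25-34 | 35-44 |
|--------------------|-------|-------|-------|-------|
| Alone              | 0.001 | 0.011 | 0.031 | 0.030 |
| With Spouse        | 0.001 | 0.023 | 0.155 | 0.216 |
| With others        | 0.169 | 0.132 | 0.142 | 0.089 |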

Because there is a finite number of categories, each with a non-zero probability of occurring, this is considered a discrete probability model.  That assumes, of course, that it is a probability model at all.  Remember from the first lecture that, for a model to be a true probability model or statistical model, all of the probabilities must add to exactly 1 (also called "unity", because 100% is the entire set), and each individual probability must be no less than zero and no more than 1.  The latter condition is easy to check.  Since all of these numbers are positive, none of them is less than zero.  Since all of them have a zero in the ones place, all of them are less than one.

The first criterion is also easy to check, though it's pretty tedious: just add all of the numbers together.  Putting all twelve numbers through a calculator yields a total of exactly 1, so the table meets the first criterion for being a probability model.
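The two checks described above can be sketched in a few lines of Python. This is only an illustration: the name `is_probability_model` and the dictionary layout are my own choices, not something from the lecture, and the sum is compared with a small tolerance because decimal fractions are not stored exactly in floating point.

```python
import math

# Joint probabilities from the table above. Rows are living arrangements;
# the four entries in each row are the age groups 15-19, 20-24, 25-34, 35-44.
table = {
    "Alone":       [0.001, 0.011, 0.031, 0.030],
    "With Spouse": [0.001, 0.023, 0.155, 0.216],
    "With others": [0.169, 0.132, 0.142, 0.089],
}

def is_probability_model(rows):
    """Return True if every entry is in [0, 1] and all entries sum to 1."""
    values = [p for row in rows.values() for p in row]
    each_in_range = all(0.0 <= p <= 1.0 for p in values)
    # Tolerance-based comparison: floating-point sums are rarely exactly 1.
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=1e-9)
    return each_in_range and sums_to_one

print(is_probability_model(table))
```

A table that fails either check (an entry outside 0 to 1, or a total other than 1) would make the function return `False`.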

Now if we wanted to find the probability of living alone, regardless of age, we would just add all of the probabilities in the "Alone" row: 0.001+0.011+0.031+0.030=0.073.  Converting this to a percent is merely a matter of multiplying by 100%: 0.073*100%=7.3%.  There is a 7.3% probability that an individual between the ages of 15 and 44 lives alone, regardless of which age range that person belongs to.  The same concept applies to finding the total probability of living with a spouse, and of living with non-spouse people.
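As a rough sketch, the same row totals can be computed for all three living arrangements at once. The variable name `row_totals` is just for illustration, and the results are rounded to three decimals to hide floating-point noise.

```python
# Joint probabilities from the table above (age groups 15-19, 20-24, 25-34, 35-44).
table = {
    "Alone":       [0.001, 0.011, 0.031, 0.030],
    "With Spouse": [0.001, 0.023, 0.155, 0.216],
    "With others": [0.169, 0.132, 0.142, 0.089],
}

# Total probability of each living arrangement, regardless of age:
# add up every entry in that row.
row_totals = {name: round(sum(row), 3) for name, row in table.items()}

for name, p in row_totals.items():
    # The ".1%" format multiplies by 100 and appends a percent sign.
    print(f"{name}: {p} = {p:.1%}")
```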

For an event E of a person being 25-34 OR a person living alone OR a person living with “others”, find P(E).  This is given by 0.001+0.011+0.031+0.030+0.169+0.132+0.142+0.089+0.155=0.760.  Each entry is counted only once: since 0.031 was already counted under the category of living alone and 0.142 under living with others, neither is counted again under the age range of 25-34, which leaves only 0.155 (living with a spouse) to add from that column.
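One way to make sure each entry is counted only once is to treat the event as a set of table cells, since a set cannot hold the same cell twice. The sketch below does that; the names `cells` and `event` are illustrative only.

```python
ages = ["15-19", "20-24", "25-34", "35-44"]
table = {
    "Alone":       [0.001, 0.011, 0.031, 0.030],
    "With Spouse": [0.001, 0.023, 0.155, 0.216],
    "With others": [0.169, 0.132, 0.142, 0.089],
}

# Map each (living arrangement, age group) cell to its probability.
cells = {
    (living, age): p
    for living, row in table.items()
    for age, p in zip(ages, row)
}

# Event E: aged 25-34 OR living alone OR living with others.
# Building a set means a cell that satisfies several conditions appears once.
event = {
    (living, age)
    for (living, age) in cells
    if age == "25-34" or living in ("Alone", "With others")
}

p_event = sum(cells[cell] for cell in event)
print(len(event), round(p_event, 3))
```

The set contains nine cells: the four in the "Alone" row, the four in the "With others" row, and the single "With Spouse" cell for ages 25-34.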

As a side note, pay careful attention to the logical words “AND” and “OR”; they have drastically different meanings in statistics than they do in common language.  The statement “Someone can be male OR female” has a different meaning than the statement “Someone can be male AND female.”  The latter is a subset of the former.  The statement “Someone can be male OR female” means either a male or a female or both.  The statement “Someone can be male AND female” means only both.
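To see the difference in numbers, here is a small sketch using the age and living-arrangement table from the discrete example, with AND as a set intersection and OR as a set union. The choice of the two events ("living alone" and "aged 25-34") is mine, picked only for illustration.

```python
ages = ["15-19", "20-24", "25-34", "35-44"]
table = {
    "Alone":       [0.001, 0.011, 0.031, 0.030],
    "With Spouse": [0.001, 0.023, 0.155, 0.216],
    "With others": [0.169, 0.132, 0.142, 0.089],
}
prob = {
    (living, age): p
    for living, row in table.items()
    for age, p in zip(ages, row)
}

alone = {cell for cell in prob if cell[0] == "Alone"}
aged_25_34 = {cell for cell in prob if cell[1] == "25-34"}

both = alone & aged_25_34    # AND: cells that satisfy both conditions
either = alone | aged_25_34  # OR: cells that satisfy at least one condition

p_and = sum(prob[cell] for cell in both)
p_or = sum(prob[cell] for cell in either)

print(round(p_and, 3), round(p_or, 3))
print(both <= either)  # the AND cells are always a subset of the OR cells
```

Here AND picks out a single cell, while OR covers six cells, so the OR probability is much larger.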

Continuous Probability Model (example):
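The bell curve is also called a normal distribution (the standard normal distribution is the special case with a mean of 0 and a standard deviation of 1).  The first rule of probability models still holds as stated before, with the total area under the curve equal to 1, but the second one is restated: the probability of a range of values is “the area under the curve over the probability range”.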

Uniform continuous distribution: 
The time it takes to wait for a bus is random with respect to the uniform continuous distribution model.  (Continuous means that the outcome can take any value in a range, so there is a graph, and therefore an equation, associated with the probability distribution.  The term uniform means that the probability density is constant over that range.)

A general uniform continuous distribution.
Since it is a uniform continuous distribution, it is represented by a rectangle (a horizontal line at constant height) whose height is given by $\frac{1}{x_h-x_l}$: take the highest time, subtract the lowest time from it, and take the inverse of the answer (one over the difference in wait times).  This height makes the total area of the rectangle equal to 1.

The probability of the bus arriving within any given time interval is given by the product of the base and height.  Suppose you somehow magically know that the bus will come at some time in the next 5 minutes; the height is then $\frac{1}{5-0}=0.20 min^{-1}$.  To find the probability of the bus showing up sometime between 2 and 2.5 minutes, use a width of 0.5 minutes.  The probability in this case is the width times the height, $P[2,2.5]=0.5 min * 0.2 min^{-1}=0.1$, or 10%.
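The width-times-height calculation can be sketched as a small function. The name `uniform_probability` and its parameters are hypothetical, and clipping the interval to the allowed range is an added assumption so that times outside 0 to 5 minutes contribute no probability.

```python
def uniform_probability(a, b, low, high):
    """P(a <= X <= b) for X uniform on [low, high], as width times height."""
    if high <= low:
        raise ValueError("high must be greater than low")
    height = 1.0 / (high - low)  # constant density, in units of 1/minute
    # Only the part of [a, b] that lies inside [low, high] has any probability.
    left = max(a, low)
    right = min(b, high)
    width = max(0.0, right - left)
    return width * height

# Bus arrives at some time in the next 5 minutes; chance it comes
# between 2 and 2.5 minutes from now.
p = uniform_probability(2, 2.5, 0, 5)
print(f"{p:.2f} = {p:.0%}")
```

Asking for the whole range, `uniform_probability(0, 5, 0, 5)`, gives 1, which matches the rule that the total area under the curve is 1.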

Those are examples of each of the two types of distributions, discrete and continuous.  If you have any questions, please leave them in the comments and I'll do my best to answer in a timely manner.  Next time, I'll be covering other types of distributions, namely the normal distribution, the chi-squared distribution, the binomial distribution, and the Poisson distribution.  Until then, stay curious.

K. "Alan" Eister has his Bachelor of Science in Chemistry. He is also a tutor for Varsity Tutors.  If you feel you need any further help on any of the topics covered, you can sign up for tutoring sessions here.
