Posts

Showing posts from November, 2017

Basic Statistics Lecture #8: Anscombe's Quartet

"There are three kinds of lies: lies, damned lies, and statistics." - Unknown

As promised last time, I will be covering Anscombe's Quartet. It illustrates how statistics can be used to mislead people about a data set. It is a series of four data sets developed by statistician Francis Anscombe and published in the journal The American Statistician in 1973. I'm going to provide you with four sets of data. Do me a favor and apply what you know of statistical analysis to them. If the statistical results look weird to you, don't be scared; you may have done the analysis perfectly. Here are the four sets:

| I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
|---|---|---|---|---|---|---|---|
| 10 | 8.04 | 10 | 9.14 | 10 | 7.46 | 8 | 6.58 |
| 8 | 6.95 | 8 | 8.14 | 8 | 6.77 | 8 | 5.76 |
| 13 | 7.58 | 13 | 8.74 | 13 | 12.74 | 8 | 7.71 |
| 9 | 8.81 | 9 | 8.77 | 9 | 7.11 | 8 | 8.84 |
| 11 | 8.33 | 11 | 9.26 | 11 | 7.81 | | |

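As a minimal sketch of the kind of analysis the Anscombe's Quartet post asks for, the Python below computes the usual summary statistics (means, sample variances, correlation, and least-squares line) for one set of (x, y) pairs. It uses only the five complete rows of data set I that are visible in the excerpt, so the numbers it prints are not the statistics of the full data set. The function name `summarize` is my own.

```python
from math import sqrt

# Only the five complete rows of data set I visible in the excerpt,
# NOT the full data set.
x = [10, 8, 13, 9, 11]
y = [8.04, 6.95, 7.58, 8.81, 8.33]


def summarize(xs, ys):
    """Return means, sample variances, correlation, and the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),  # sample variance (divide by n - 1)
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),  # Pearson correlation coefficient
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


stats = summarize(x, y)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on each of the four sets and comparing the results side by side is one way to do the exercise.
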
Basic Statistics Lecture #7: Quantitative, Continuous, and Numerical Data

As promised last time, today I will cover basic calculations on real data.  Please note that all of the following is for the simple case of one group of data.  For two or more distinct groups of data, the calculations are similar but slightly more specific, because there are two or more distinct groups.  I will cover that in a later post, which will be labeled ANOVA.  As a side note, I'm a baseball fan, so I'm going to provide examples from MLB. The labels used here are those for the data of a sample, not the population.  The population is the set of all possible people or objects that fall under the category under study.  If we were studying the 2017 ERAs of pitchers, the population would be all MLB pitchers who pitched in 2017.  The sample is the subset of the population from which we get the data points.  If we want to look at the 8 teams who have made it to the Division Series, then the sample is the pitchers
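To make the sample-versus-population distinction concrete, here is a small sketch using Python's standard `statistics` module. The ERA values are made-up illustrative numbers, not real 2017 figures. The point is only that the sample standard deviation divides by n - 1 while the population standard deviation divides by n.

```python
import statistics

# Hypothetical ERA values for a handful of pitchers (illustrative only,
# not real MLB data).
sample_eras = [2.25, 3.10, 3.53, 4.07, 2.98]

sample_mean = statistics.mean(sample_eras)

# Sample standard deviation: divides by n - 1.
sample_sd = statistics.stdev(sample_eras)

# Population standard deviation: divides by n. Only appropriate if these
# five pitchers were the entire population under study.
population_sd = statistics.pstdev(sample_eras)

print(f"mean = {sample_mean:.3f}")
print(f"sample sd = {sample_sd:.3f}")
print(f"population sd = {population_sd:.3f}")
```
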

Basic Statistics Lecture #6: Types of Graphs

As mentioned last time, I am going to cover different types of graphs.

Pie charts vs. bar charts: Pie charts are graphs where the data is represented as slices of a circle.  We've all seen pie charts, and they are exactly what it says on the tin. Bar charts are another type of data visualization that most people have seen.  The data is represented as bars, where the x-axis is the independent variable and the y-axis is the dependent variable.

Pie charts have the advantages of displaying the categories within one set relative to one another and of working best with percentages.  They have the disadvantages that every category must be displayed for the chart to be read correctly (no truncation) and that all data values must be present.  Bar charts have the advantages of being very flexible (there are no distortions from truncating categories) and of being able to compare 2 or more data sets.  Their disadvantage is that they are not very useful for percentages.

Stem-

Basic Statistics Lecture #5: Bayes' Theorem

As promised last time, I am going to cover Bayes' Theorem. The tree diagram is the common way of picturing Bayes' Theorem.  Recall that conditional probability is given by $P(A \mid B) = \frac{P(A \wedge B)}{P(B)}$.   For tree diagrams, let's say that we have events $A, B_1, B_2, B_3, \ldots$ (we have multiple B's because they all belong to the same family of events) such that the events in the family of B are mutually exclusive and the sum of their probabilities is equal to 1. Then we have $$P(B_i \mid A)= \frac{P(B_i) \cdot P(A \mid B_i)}{\sum_{m=1}^{n}[P(B_m) \cdot P(A \mid B_m)]}$$  What this means depends on the tree diagram. If we are only looking at the sub-items of A, this is what the tree diagram would look like. If J has a probability of 100%, and P(C) and P(D) are not 0, then when we are trying to find the probability of any of the B's being true given that A is true, we have to set the probability of A to be the entir
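The formula for $P(B_i \mid A)$ can be sketched in a few lines of Python. The priors and likelihoods below are hypothetical numbers chosen only for illustration, and the function name `bayes_posterior` is my own. The denominator is the sum over the whole family of B's, which is $P(A)$.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(B_i | A) for every i, given P(B_i) and P(A | B_i)."""
    # The B_i are assumed mutually exclusive with probabilities summing to 1.
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("the priors P(B_i) must sum to 1")
    # Numerators: P(B_i) * P(A | B_i) for each branch of the tree.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: the sum over all branches, which equals P(A).
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical values: P(B_1), P(B_2), P(B_3) and P(A | B_1), P(A | B_2), P(A | B_3).
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.2, 0.4]

posteriors = bayes_posterior(priors, likelihoods)
for i, value in enumerate(posteriors, start=1):
    print(f"P(B_{i} | A) = {value:.4f}")
```

With these numbers the posteriors sum to 1, as they should, because A is being treated as the whole space once we condition on it.
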

Basic Statistics Lecture #4: Intro to Probability and Venn Diagrams

As promised last time, this lecture introduces the concept of Probability Theory.  Probability is necessary for all statistical calculations.  After all, statistical analysis depends on the concept of probability in order to get to the statistics, especially when we start dealing with regression analysis and analysis of variance (ANOVA). The concept of probability is commonly illustrated with Venn diagrams.  For those of you who don't know, Venn diagrams are diagrams that let us easily represent which elements two (or more) sets have in common and which elements they don't have in common. Seems about right. So yes, they can be used to analyze costs and the intersection of physical systems.  In fact, they can be used to analyze almost any type of system.  I'm currently using Venn diagrams to analyze known conspiracies such as the Tuskegee Syphilis Experiment, the NSA spying scandal, and the government's poisoning of the alcohol supply during Prohibition, all in or

Basic Statistics Lecture #3: Normal, Binomial, and Poisson Distributions

As I mentioned last time, the uniform continuous distribution is not the only type of distribution in statistics.  As promised, here are three of the most common distribution types: the normal distribution, which is continuous, and the binomial and Poisson distributions, which are discrete.  As a side note, all sampling distributions are relative to the arithmetic mean. Normal Distribution: I think most people are familiar with the concept of a normal distribution.  If you've ever seen a bell curve, you've seen the normal distribution.  If you've followed this lecture series from the first lecture, you've also seen the normal distribution. This type of distribution is one where the data points follow a continuous curve that is non-uniform, has a mean (arithmetic average) equal to the median (the exact middle value), falls from its highest probability at the mean to (for all practical purposes) zero as the x-values approach $\pm \infty$, therefore has an equal number of data points to the left and to the right of the mean, and has the domain of $(\pm \i
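The properties listed for the normal distribution can be checked numerically. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module, assuming the standard normal (mean 0, standard deviation 1) as the example.

```python
from statistics import NormalDist

# Standard normal distribution, chosen here only as an example.
standard = NormalDist(mu=0.0, sigma=1.0)

# The mean equals the median.
print(standard.mean, standard.median)

# The curve is symmetric about the mean.
left = standard.pdf(-1.5)
right = standard.pdf(1.5)
print(left, right)

# Half of the probability lies on each side of the mean.
half = standard.cdf(0.0)
print(half)

# Far from the mean, the density is zero for all practical purposes.
tail = standard.pdf(10.0)
print(tail)
```
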

Basic Statistics Lecture #2: An Example of Discrete and Continuous Probabilities

As promised last time, I will cover examples of discrete and continuous probability distributions.

Top: Discrete Probability. Middle: Continuous Probability. Bottom: Disjointed Probability. Disjointed won't be covered in this course.

Discrete Probability Model (example): The following table represents the probabilities of people of certain age groups living alone, living with a spouse, or living with at least one person who is not a spouse.

| Living arrangement | 15-19 | 20-24 | 25-34 | 35-44 |
|---|---|---|---|---|
| Alone | 0.001 | 0.011 | 0.031 | 0.030 |
| With Spouse | 0.001 | 0.023 | 0.155 | 0.216 |
| With others | 0.169 | 0.132 | 0.142 | 0.089 |

Because there is a finite number of categories that have a non-zero probability of occurring, this is considered to be a discrete probability model.  Then again, I am assuming that it is a probability model.  Remember from the first lecture that, i
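The assumption that the table is a probability model can be checked directly. The sketch below applies the standard requirements (every probability lies between 0 and 1, and all of them together sum to 1) to the twelve values from the table. The variable names are my own.

```python
import math

age_groups = ["15-19", "20-24", "25-34", "35-44"]

# Values copied from the table, one list per living arrangement.
table = {
    "Alone": [0.001, 0.011, 0.031, 0.030],
    "With Spouse": [0.001, 0.023, 0.155, 0.216],
    "With others": [0.169, 0.132, 0.142, 0.089],
}

# Every individual probability must lie between 0 and 1.
all_valid = all(0.0 <= p <= 1.0 for row in table.values() for p in row)

# All probabilities together must sum to 1 (compared with a small
# tolerance because of floating-point rounding).
total = sum(sum(row) for row in table.values())
is_model = all_valid and math.isclose(total, 1.0, abs_tol=1e-9)

# Marginal probability of each age group: sum down each column.
marginals = {
    group: sum(row[i] for row in table.values())
    for i, group in enumerate(age_groups)
}

print(f"total = {total:.3f}, valid probability model: {is_model}")
for group, p in marginals.items():
    print(f"P(age {group}) = {p:.3f}")
```
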

Basic Statistics Lecture #1: Basic Concepts

Probability is the foundation of everything we do in Statistics.  There are two types of probability: discrete and continuous.  Discrete probability is when there are distinct probabilities for different values (represented by bars), while continuous probability is when the probabilities change smoothly with the value (represented by a smooth curve).  This is better described by images.  These are two types of probability models and are therefore treated differently. Discrete Probabilities of Dice Rolls. Continuous Probability Distribution of Pounds of Waste Produced. We have what are called probability models.  Probability models have a sample space (denoted by S).  Probabilities are assigned to sample points in the sample space.  For the continuous Standard Normal Distribution, the x-axis is the sample point and the y-axis (denoted P(E)) is the probability axis.  In the graphs above, the values we want to get a probability for are on the horizontal axis (which is typically d
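As a small illustration of a discrete probability model, the sketch below assumes a single fair six-sided die (the figure in the post may show something different). It builds the sample space S, assigns a probability to each sample point, and computes P(E) for an event E by adding up the probabilities of the sample points in E.

```python
from fractions import Fraction

# Sample space S for one roll of a fair six-sided die (an assumption
# made for this example).
sample_space = [1, 2, 3, 4, 5, 6]

# Each sample point is assigned the same probability, 1/6.
probability = {point: Fraction(1, 6) for point in sample_space}


def event_probability(event):
    """Return P(E): the sum of the probabilities of the sample points in E."""
    return sum(probability[point] for point in event)


even_roll = {2, 4, 6}
p_even = event_probability(even_roll)
p_total = event_probability(sample_space)

print(f"P(even roll) = {p_even}")
print(f"P(S) = {p_total}")
```
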