
Basic Statistics Lecture #6: Types of Graphs

As mentioned last time, I am going to cover different types of graphs.


Pie charts  vs. bar charts:
Pie charts are graphs in which the data is represented as slices of a circle.  We've all seen pie charts, and they are exactly what it says on the tin.



Bar graphs are another type of data visualization which most people have seen.  The data is represented as bars, where the x-axis shows the independent variable and the y-axis shows the dependent variable.


Pie charts have the advantages of showing the categories within one data set relative to one another and of working best with percentages.  Their disadvantage is that every category must be displayed for the chart to be read correctly (no truncation), so all data values must be present.  Bar charts have the advantages of being very flexible (there are no distortions from truncating categories) and of being able to compare two or more data sets.  Their disadvantage is that they are not very useful for percentages.
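To make the "no truncation" point concrete, here is a minimal sketch of how pie slices are computed as percentages.  The function name `pie_shares` and the category counts are made up for illustration; they are not from any particular data set.

```python
def pie_shares(counts):
    """Convert category counts into percentage shares of the whole."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need a positive total to compute shares")
    return {category: 100.0 * n / total for category, n in counts.items()}

# Hypothetical counts for three categories.
counts = {"A": 10, "B": 30, "C": 60}
shares = pie_shares(counts)

# Dropping a category changes every remaining slice, which is why
# a pie chart needs all data values to be present.
truncated = pie_shares({"A": 10, "B": 30})

print(shares)
print(truncated)
```

With all three categories, A is 10% of the pie.  With C left out, the same count for A becomes 25%, so the truncated chart tells a different story.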



Stem-and-leaf plot vs. histogram:
Stem-and-leaf plots are graphs which show the non-ones-place digits as the stem and the ones-place digit as the leaf.  What you do is put the stems in one column, and the leaves of the corresponding data points next to their appropriate stems.  Since there is no way to explain it well without an example, I'll use a picture.

The low is 20 and the high is 102.
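The stem-and-leaf construction can also be sketched in a few lines of code.  This is only an illustration under some assumptions: the values are non-negative whole numbers, the function names `stem_and_leaf` and `format_plot` are my own, and the data below is hypothetical (chosen only so that the low is 20 and the high is 102).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits except the ones place) to its sorted leaves."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 102 -> stem 10, leaf 2
        groups[stem].append(leaf)
    # Keep empty stems so that gaps in the data stay visible.
    lo, hi = min(groups), max(groups)
    return {stem: groups.get(stem, []) for stem in range(lo, hi + 1)}

def format_plot(plot):
    """Render one 'stem | leaves' row per stem."""
    return "\n".join(
        f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in plot.items()
    )

# Hypothetical data: low of 20, high of 102.
data = [20, 23, 35, 35, 47, 58, 61, 64, 79, 88, 96, 102]
plot = stem_and_leaf(data)
print(format_plot(plot))
```

Each row starts with a stem, and every data point shows up as exactly one leaf, so the original values can be read straight back off the plot.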

Histograms are a special type of bar graph.  Each bar represents a range of x-values (the independent variable), and its height represents the frequency of data points between the two x-values.
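Here is a minimal sketch of the counting behind a histogram.  It assumes equal-width intervals that include their left edge and exclude their right edge; the function name `histogram_counts`, the interval width of 20, and the data are all hypothetical choices for illustration.

```python
def histogram_counts(values, low, width, n_bins):
    """Count the values in each interval [low + i*width, low + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < n_bins:  # values outside the covered range are ignored
            counts[i] += 1
    return counts

# Hypothetical data, binned into intervals of width 20 starting at 20.
data = [20, 23, 35, 35, 47, 58, 61, 64, 79, 88, 96, 102]
counts = histogram_counts(data, low=20, width=20, n_bins=5)

# A text version of the bars: one '#' per data point in the interval.
for i, c in enumerate(counts):
    left = 20 + 20 * i
    print(f"[{left:>3}, {left + 20:>3}) {'#' * c}")
```

The bar heights only record how many points landed in each interval, so the individual values are no longer recoverable from the graph.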


For stem-and-leaf plots, the only advantage is that each data point is seen in the graph.  The cons are that they are inflexible and not suitable for a large range of data points.  Histograms are very flexible, good for seeing the theoretical distribution, and useful for spotting outliers.  The only con is that you cannot see individual data points.  Histograms are used to show the frequency of data points within a given interval.

Scatter Plot vs. Line Graph:
A scatter plot puts a dot at each x-y pair, where x is the independent variable and y is the dependent variable.  This is the most common graph type when performing statistical regression analysis.

Line graphs have only one data point corresponding to each x-value.  They are similar to scatter plots, but each point represents the average of the y-values at that x-value.
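The step from scatter-plot data to line-graph data can be sketched as follows.  This is just one way to do the averaging; the function name `mean_y_per_x` and the x-y pairs are hypothetical.

```python
from collections import defaultdict

def mean_y_per_x(pairs):
    """Collapse x-y pairs to one (x, mean of y) point per distinct x, sorted by x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return [(x, sum(ys) / len(ys)) for x, ys in sorted(groups.items())]

# Hypothetical scatter-plot data with repeated x-values.
points = [(1, 2.0), (1, 4.0), (2, 3.0), (2, 5.0), (2, 7.0), (3, 6.0)]
line = mean_y_per_x(points)
print(line)
```

Six scatter points become three line-graph points, one per x-value, which is easier to read but hides the spread of the original y-values.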


The advantage of a scatter plot is that it shows all data points, and we can see a general trend if one is to be found.  We can also see how strong the trend is.  The disadvantage is that it can get cluttered very quickly, and it is not practical to look at each data point individually.  Line graphs have the advantage that each individual point is much easier to read.  The disadvantage is that they do not display each data point, just the average of the y-values at a given x-value.

These are the common types of non-probability graphs of data used in statistics.  Next time, I'll be covering a more detailed look into data sets.  Until then, stay curious.
K. "Alan" Eister has his Bachelor of Science in Chemistry. He is also a tutor for Varsity Tutors.  If you feel you need any further help on any of the topics covered, you can sign up for tutoring sessions here.
