Basics of Statistics Lecture #10: Hypothesis Testing of Proportions

As promised last time, I am going to cover hypothesis testing of proportions.  This is conceptually similar to hypothesis testing with a mean and standard deviation, but the calculations are going to be different.

A proportion is the fraction of successes in a sample or population, reported in decimal form.  This means that if heads comes up 50% of the time, it is reported as p=0.50.  Because of this, the calculations differ from the mean-and-standard-deviation case from last time.  For instance, when we have proportion data, we don't know the standard deviation of either the sample or the population, so the standard error cannot be calculated in the usual way.  We must instead use a proportion-specific version, given by $\sigma_{E}=\sqrt{\frac{p(1-p)}{n}}$, where p is the proportion and n is the sample size.  If we have the sample size and the number of successes, we can calculate the proportion as $p=\frac{n_s}{n}$, where $n_s$ is the number of successes. This will be no greater than 1 (since the number of successes within the sample cannot exceed the sample size) and no less than 0 (since the number of successes cannot be negative).
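These two formulas can be sketched in a few lines of Python.  The coin-flip numbers here are illustrative, not from any real data set:

```python
from math import sqrt

def proportion(successes: int, n: int) -> float:
    """Sample proportion p = n_s / n; always between 0 and 1."""
    return successes / n

def standard_error(p: float, n: int) -> float:
    """Proportion-specific standard error: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

# Example: 55 heads in 100 coin flips
p = proportion(55, 100)       # 0.55
se = standard_error(p, 100)   # ~0.0497
```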

Now that we have our basic terms for proportions, we can go through hypothesis testing of proportions.  The steps are the same as before.

Step 1: State your hypotheses

As with the mean and standard deviation case, we need to state the null and alternate hypotheses.  The null hypothesis is the statement where the equals sign is present, or where the proportions being equal is present.  This means one of three things: $\widehat{p}=p_0$, $\widehat{p} \le p_0$, or $\widehat{p} \ge p_0$, where $\widehat{p}$ is the sample proportion and $p_0$ is the hypothesized (promised) proportion. Notice that the first case is pure equality, the second case includes equality as the maximum value, and the third case includes equality as the minimum value; equality is involved in all three cases.  The alternate hypothesis is the statement where equality is not involved.  Remember that no numbers are put in the general statement of the hypotheses; only the general statements of equality or inequality are included at this step.

Step 2: Setting the Significance Level α

This is the same for proportions as it is for the mean and standard deviation case.  The significance level is the complement of the confidence level at which you want to state your conclusions (α = 1 − confidence); it is the level of uncertainty allowed in the hypothesis test.  The closer α is to zero, the less uncertainty there is in the conclusion of the hypothesis test.

Step 3: Determining the Critical Value

Like the mean and standard deviation case, the critical value is the hypothetical z-value for the confidence level (1-α) of your hypothesis.  I am going to label this $z_{tab}$, for the tabulated z-value at a given percent confidence.  As with the mean and standard deviation case, this depends on the tail under consideration.  If the test is for either the "greater than (>)" case or the "less than (<)" case, then it is a one-tailed situation, and the total alpha value is encompassed under one "tail", or one extreme of the data set.  If, however, the test is for the "not equal (≠)" case, then it is a two-tailed test, and the value for alpha is split between the two tails.
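The table lookup for $z_{tab}$ can be done directly with Python's standard library, using the inverse of the standard normal CDF.  A minimal sketch:

```python
from statistics import NormalDist

def critical_z(alpha: float, two_tailed: bool) -> float:
    """Tabulated (critical) z-value for significance level alpha."""
    if two_tailed:
        # alpha is split between two tails, so look up 1 - alpha/2
        return NormalDist().inv_cdf(1 - alpha / 2)
    # one tail holds all of alpha, so look up 1 - alpha
    return NormalDist().inv_cdf(1 - alpha)

critical_z(0.05, two_tailed=False)  # ~1.645
critical_z(0.05, two_tailed=True)   # ~1.960
```

These match the familiar table values for 95% confidence: 1.645 for one tail, 1.960 for two.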
[Figure: a one-tailed test, where the entire α = 0.0250 sits in a single tail.]
[Figure: a two-tailed test, where α = 0.0250 is split evenly between the two tails.]

Step 4: Calculate the Sample Z-Value

This is the same concept as Step 4 from the mean and standard deviation case, but since we have neither the mean nor the standard deviation, we cannot use the calculation from last time to find the z-value.  What to do?  Since we are dealing with proportions, there is an equivalent.  In this case, it is the ratio of the difference between the proportions to the standard error of the proportion, $z_{calc}=\frac{\widehat{p}-p_{0}}{\sigma_{E}}$.
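As a sketch of this calculation, here the standard error is computed from the hypothesized proportion $p_0$, the usual convention when working under the null hypothesis:

```python
from math import sqrt

def z_calc(p_hat: float, p0: float, n: int) -> float:
    """Calculated z: difference between proportions over the standard error.

    The standard error uses the hypothesized proportion p0, the
    conventional choice under the null hypothesis."""
    se = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

# 55 heads in 100 flips, tested against a fair coin (p0 = 0.5)
z_calc(0.55, 0.5, 100)  # 1.0
```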

Step 5: Compare the Two Z-Values

Now we have the critical z-value from Step 3 and the calculated z-value from Step 4, and we just need to compare them in order to determine whether we can reject the null hypothesis.  If the magnitude of the calculated z-value exceeds the critical z-value $(|z_{calc}|>z_{tab})$, then the null hypothesis is rejected and the alternative hypothesis is accepted.  Otherwise $(|z_{calc}|<z_{tab})$, we fail to reject the null hypothesis.  Why is this so?  It stems from the fact that the critical value marks the boundary of the rejection region: a calculated z-value beyond it means the sample result would occur with probability less than α if the null hypothesis were true, which we have decided is too unlikely to attribute to chance.
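Putting Steps 1 through 5 together, here is a minimal end-to-end sketch of the whole test, again using only the standard library and an illustrative coin example:

```python
from math import sqrt
from statistics import NormalDist

def reject_null(successes: int, n: int, p0: float,
                alpha: float = 0.05, two_tailed: bool = True) -> bool:
    """Run the five steps; return True if the null hypothesis is rejected."""
    p_hat = successes / n                          # sample proportion
    se = sqrt(p0 * (1 - p0) / n)                   # standard error under H0
    z = (p_hat - p0) / se                          # Step 4: calculated z
    tail = alpha / 2 if two_tailed else alpha
    z_tab = NormalDist().inv_cdf(1 - tail)         # Step 3: critical z
    return abs(z) > z_tab                          # Step 5: compare

# 55 heads in 100 flips: not enough evidence the coin is biased
reject_null(55, 100, p0=0.5)    # False
# 550 heads in 1000 flips: same proportion, larger sample, so we reject
reject_null(550, 1000, p0=0.5)  # True
```

Note how the larger sample shrinks the standard error, turning the same observed proportion into a rejection.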

Alternate Step 5: Obtain P-value From Calculated Z-Value

In order to obtain the p-value from the z-value, we must look to the z-table as we have before.  For any z-value of the form x.yz, where x, y, and z are single-digit integers, we look for x.y in the far-left column and 0.0z in the top row, as we did last time.  The tail probability beyond the calculated z-value (one minus the table entry, for a right-tailed test) is the p-value.  If the p-value is less than α, then the sample proportion statistically cannot be equal to the proposed proportion, because the difference is completely contained within the tail probability.
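The table lookup can also be done in code.  A small sketch of the p-value calculation, doubling the tail area for a two-tailed test:

```python
from statistics import NormalDist

def p_value(z: float, two_tailed: bool = True) -> float:
    """Tail probability beyond |z| under the standard normal curve."""
    tail = 1 - NormalDist().cdf(abs(z))
    return 2 * tail if two_tailed else tail

p_value(1.0)    # ~0.317, greater than alpha = 0.05: fail to reject
p_value(3.16)   # ~0.0016, less than alpha = 0.05: reject
```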

[Figure: a p-value less than α, equivalently $z_{calc}>z_{tab}$, puts the calculated value inside the rejection tail.]


That is it for proportion hypothesis testing.  If you have any questions, please leave them in the comments.  Next time, I will be covering confidence intervals.  Until then, stay curious.

K. "Alan" Eister has his Bachelor of Science in Chemistry. He is also a tutor for Varsity Tutors.  If you feel you need any further help on any of the topics covered, you can sign up for tutoring sessions here.
