As promised last time, I will introduce the concept of hypothesis testing. This is finally getting to the meat and potatoes of statistical analysis.
There are three things that need to be true in order for the hypothesis test described here to be valid:
- The data needs to be sampled randomly. A random sample is exactly what it says on the tin: a group from the population where each individual has an equal chance of being picked.
- We need to know the sample mean and standard deviation.
- One or both of the following:
  - The data needs to come from a normal distribution.
  - There needs to be a large sample size (for a lot of cases, at least 30, but the more the merrier).
Hypothesis testing can be done by either of two methods, each with 5 steps. Statisticians typically distinguish between whether or not we know the population standard deviation. Realistically, this means whether the standard deviation we have is for the population (everyone) or for the sample (the small portion of everyone under study). For practical purposes with a large sample, the calculation shown here is the same in both cases; the difference is whether the standard deviation being used is for the population as a whole or for the sample. (Strictly speaking, when only the sample standard deviation is known, a t-test is the more exact choice, but for large samples the z-based result is a close approximation.)
There is a substantial difference in calculation between the proportion case and the non-proportion case. Here, I will cover hypothesis testing of means (the non-proportion case). The first method is to compare the critical value with the calculated z-value, and the second method is to compare the α value with the calculated p-value.
Method 1: Comparing Critical Value with Calculated Z-Value
1. Stating the Hypothesis to be Tested
The first thing to know is that two hypotheses need to be stated: the null hypothesis H0 and the alternative hypothesis Ha. The hypotheses typically need to be in the form of an equality or inequality. It is often easier to state them in common language first and then convert that language into the lingo of mathematics. For example, if we wanted to test whether the average ERA of pitchers on teams who made it to the Division Series was less than the average ERA of pitchers on teams who didn't, we would state it in common language just as I have ("the average ERA of pitchers on teams who made it to the Division Series was less than the average ERA of pitchers on teams who didn't"). The mathematical lingo is much more concise than the common language. The alternative hypothesis would simply be Ha: ERAave,DS < ERAave,non-DS, while the null hypothesis is the opposite of the alternative, H0: ERAave,DS ≥ ERAave,non-DS. The null hypothesis will pretty much always be the statement which has the equals sign somewhere in it.
2. Selecting a Significance Level
The significance level, usually denoted by the Greek letter α (alpha), is the complement of the confidence level which we wish to have: α = 1 - confidence level. To put it another way, it is the probability that the data will cause us to reject the null hypothesis purely by chance when the null hypothesis is actually true. The smaller the significance level, the stronger the evidence needed to reject the null hypothesis, and so the higher our confidence in a rejection. Strictly speaking, the significance level, which is arbitrarily chosen, needs to be between 0 and 1, but in practice it should be no greater than 0.2. The higher the significance level, the less a rejection can be taken at face value. Typically, α=0.05 is chosen, since this corresponds to a 95% confidence level. Also, while we would like to have a significance level of 0, it would not work out numerically. We'll see why in a moment. For example, with the ERA example, I would like to test the hypothesis H0: ERAave,DS ≥ ERAave,non-DS at 99.9% confidence, so I should set the significance level at α=100%-99.9%=0.1%, or α=0.001 in decimal notation.
3. Find the Critical Value
The critical value is the z-value which marks the edge of the rejection region for the chosen significance level. That is to say, the value of z should be chosen such that α is the fraction of the standard normal distribution lying on the other side of the z-point from the mean. These z-values can be obtained from the Z-Table, where the cumulative probability you would be looking for is 1-α (or α itself for a lower-tail test). When dealing with a "one-tail test" (the test where we are only considering whether the value is greater than the mean or less than the mean, but not both), the z-value corresponds to the full value of alpha. With a two-tailed test (where we are considering both less than and greater than the mean, or simply the "not equal" case), the critical value corresponds to half of the significance level, or $z_{cv}=\pm z_{\alpha/2}$, because the alpha region is split between the two tails. This is also why a significance level of 0 does not work out numerically: no finite z-value leaves zero area in the tail.
Both have a total α value of 0.05, but for calculation, the two-tailed test uses α=0.025 per tail.
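The table lookup in step 3 can also be done numerically. The following is a minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the function name `critical_value` and its `tails` parameter are hypothetical names chosen for illustration, not something from a particular textbook.

```python
from statistics import NormalDist

def critical_value(alpha, tails=1):
    """Return the positive critical z-value for a significance level.

    tails=1 puts all of alpha in one tail; tails=2 splits it evenly
    between both tails (alpha/2 each).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Look up the z-value whose cumulative probability is
    # 1 - alpha (one tail) or 1 - alpha/2 (two tails).
    return NormalDist().inv_cdf(1 - alpha / tails)

# The critical value moves farther from the mean as alpha shrinks.
for alpha in (0.05, 0.001):
    one_tail = critical_value(alpha, tails=1)
    two_tail = critical_value(alpha, tails=2)
    print(alpha, round(one_tail, 3), round(two_tail, 3))
```

For a lower-tail test (such as the ERA example), the critical value is the negative of the number returned here. Note how the function refuses α=0: there is no finite z-value to return.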
4. Calculate the Sample Z-Value
There is the critical value of z, which comes from the significance level, and there is the calculated value of z, which comes from the sample. These two values will be compared for statistical analysis. The z-value of the sample is the difference of means measured in units of standard error. The calculation for the z-value of the data from the sample is $z_{calc}=\frac{\overline{x_1}-\overline{x_2}}{s_E}$, where $\overline{x_1}$ is the sample mean (the value being tested), $\overline{x_2}$ is the population mean it is compared against, and $s_E$ is the standard error of the sample. The standard error is given by $s_E=\frac{s}{\sqrt{n}}$, where s is the standard deviation and n is the sample size.
5. Compare Z-Critical and Z-Calculated
If the calculated value from part 4 is closer to the mean than the critical value from part 3 (if the calculated z-value lies within the unshaded area of the bell curves above), then we "fail to reject the null hypothesis", which means that the data are consistent with the equality statement. To put it another way, the sample mean and the population mean are not statistically distinguishable at the chosen significance level. If, however, the calculated value from part 4 is at or beyond the critical value from part 3 (if the calculated z-value is in the shaded region), then the null hypothesis is rejected in favor of the alternate hypothesis. This means that the difference between the sample mean and the population mean is statistically significant.
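Steps 4 and 5 can be sketched together in a few lines. This is an illustrative sketch only: the function names `z_calculated` and `reject_null` are hypothetical, and the numbers fed in at the bottom are made up for demonstration (they are not real ERA data).

```python
from math import sqrt
from statistics import NormalDist

def z_calculated(sample_mean, reference_mean, s, n):
    """Step 4: z = (x1 - x2) / s_E, with s_E = s / sqrt(n)."""
    standard_error = s / sqrt(n)
    return (sample_mean - reference_mean) / standard_error

def reject_null(z_calc, alpha, alternative):
    """Step 5: True if z_calc falls in the rejection (shaded) region.

    alternative is 'less', 'greater', or 'two-sided', matching the
    direction of the alternative hypothesis Ha.
    """
    dist = NormalDist()
    if alternative == "less":
        return z_calc <= dist.inv_cdf(alpha)
    if alternative == "greater":
        return z_calc >= dist.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z_calc) >= dist.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Made-up numbers for illustration only.
z = z_calculated(sample_mean=3.60, reference_mean=4.00, s=0.80, n=36)
print(round(z, 2))
print(reject_null(z, alpha=0.05, alternative="less"))
print(reject_null(z, alpha=0.001, alternative="less"))
```

With these invented inputs the same z-value is rejected at α=0.05 but not at α=0.001, which shows how a stricter significance level demands stronger evidence.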
Method 2: Comparing α with Calculated P-Value
There is another related method of hypothesis testing. This depends on converting the calculated z-value from step 4 of Method 1 into a p-value. In order to do this, perform the following steps:
1. State the Null Hypothesis.
As above, this is stating the equality as the null hypothesis and the inequality as the alternate hypothesis. The null hypothesis will always be what you are trying to reject.
2. Selecting a Significance Level.
As above, this is the significance level chosen for the test. This will be the α level which we will use for comparison.
3. Calculate the Sample Z-Value.
This is where we deviate from Method 1. Notice that this is step 4 in Method 1 but step 3 here in Method 2. This is because we are not finding the critical value from alpha, but rather going from z-calculated to a p-value (as described in the next step). For now, calculate z-calculated as was done in step 4 of Method 1.
4. Obtain the P-Value from Z-Calculated
Here, we will use the z-value from step 3 to find the p-value of the sample. In order to do this, first round the z-value to the second digit to the right of the decimal point. Second, find the ones and tenths digits in the left-hand column of the z-table. Then find the hundredths digit in the top row of the table. The p-value will be the number where the two intersect. (The table gives the area to the left of z, which is the p-value for a lower-tail test; for an upper-tail test use 1 minus that number, and for a two-tailed test double the area in one tail.)
For example, for z=-1.96 the row for -1.9 and the column for 0.06 intersect at p=0.0250.
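The same lookup can be done without a printed table. Here is a small sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist`; the function name `p_value` and the `alternative` labels are hypothetical choices for illustration.

```python
from statistics import NormalDist

def p_value(z_calc, alternative):
    """Convert a calculated z-value into a p-value.

    alternative is 'less', 'greater', or 'two-sided', matching the
    direction of the alternative hypothesis Ha.
    """
    dist = NormalDist()  # standard normal: mean 0, standard deviation 1
    if alternative == "less":
        return dist.cdf(z_calc)            # area to the left of z
    if alternative == "greater":
        return 1 - dist.cdf(z_calc)        # area to the right of z
    if alternative == "two-sided":
        return 2 * dist.cdf(-abs(z_calc))  # both tails combined
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Reproduces the table lookup for z = -1.96 in a lower-tail test.
print(round(p_value(-1.96, "less"), 4))  # 0.025
```

Because the function works on the unrounded z-value, its result can differ from the table in the last digit or so.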
5. Make conclusion
For the conclusion, if the p-value is less than or equal to the value of α, then the null hypothesis is rejected in favor of the alternate hypothesis. On the other hand, if p>α, then we fail to reject the null hypothesis, and the two values are not statistically distinguishable at that significance level.
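Since the two methods are related, they should always reach the same conclusion. The sketch below checks that for a handful of z-values, assuming a lower-tail test like the ERA example; the function names and the sample z-values are invented for illustration.

```python
from statistics import NormalDist

def decide_by_critical_value(z_calc, alpha):
    """Method 1 (lower-tail test): reject H0 when z_calc is at or
    beyond the critical value."""
    return z_calc <= NormalDist().inv_cdf(alpha)

def decide_by_p_value(z_calc, alpha):
    """Method 2 (lower-tail test): reject H0 when the p-value is
    less than or equal to alpha."""
    return NormalDist().cdf(z_calc) <= alpha

alpha = 0.05
for z in (-3.0, -1.7, -1.6, 0.5):  # made-up z-values
    method_1 = decide_by_critical_value(z, alpha)
    method_2 = decide_by_p_value(z, alpha)
    print(z, "reject H0" if method_1 else "fail to reject H0", method_1 == method_2)
```

Which method to use is a matter of convenience: Method 1 needs only one table lookup per significance level, while Method 2 reports how strong the evidence is, not just which side of the threshold it falls on.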
And that's the basics of hypothesis testing. If you have any questions, please leave them in the comments. Next time, I am going to cover hypothesis testing of proportions. Until then, stay curious.
K. "Alan" Eister has his Bachelor of Science in Chemistry. He is also a tutor for Varsity Tutors. If you feel you need any further help on any of the topics covered, you can sign up for tutoring sessions here.