Part III: One & Two Sample Parametric Tests
Introduction | Type I Error | Type II Error | Definition of Power | Calculating Power (2 Examples) | Influences on Power
When we perform a significance test, we generally compare the probability of getting our observed value to some preset criterion probability. This criterion is alpha, and is typically set at .05. More often than not, people do not even think about probability in these situations. Rather, they look at their observed value of a statistic and compare it to a critical value (where in most cases one rejects the null if the observed value is greater than the critical value). This is fine, and by no means incorrect. But thinking of your decision of whether to reject or fail to reject the null in terms of probability makes it easier to understand the nature of power and decision errors.

Much as we would like to think we make correct decisions when we either reject or fail to reject (FTR) the null, there is no way of knowing for sure, and it only makes sense that occasionally we will be wrong. Thus, we can only focus on the probability of making such errors - we will never know for sure whether we have made an error or not. There are two forms of error whose probabilities we can estimate, and both are readily calculable.

Type I error (Alpha) is the probability of rejecting the null hypothesis when the null is true; it is the probability of rejecting the null when you should really have failed to reject (FTR); it's saying there is a significant effect when there truly is none.

For example, say you have a null hypothesis that the mean height for Wayne State men is 68 inches (Ho: u = 68). You go out, get a sample of 50 WSU men, measure each of them, and find that their mean height is 76 inches. You calculate your statistic (say z = 2.98) and compare it to a critical value (z = 1.96). You'd reject the null here (because the observed value is bigger than the critical value), and state that this sample is significantly taller than 68", or that this sample comes from a population different from the one the null specifies. But what if this sample happens to be basketball, volleyball, and football players? These men are from the WSU population, right? It just happened that we got a highly atypical sample in terms of height. We rejected the null, but we really should not have. This is TYPE I error.

How often will TYPE I error occur? If alpha is set at .05, we will make a TYPE I error 5 times out of a hundred when the null is true (out of 100 true null hypotheses that we test, we can expect about 5 to be incorrectly rejected). Thus we have control over TYPE I error in that we can set alpha to whatever we want. If you really did not want to make a TYPE I error, you could set alpha really low (.0000001). The critical value associated with this alpha would be huge, making it virtually impossible to find an observed score greater than this value. If you cannot reject the null, you cannot commit a TYPE I error. However, this conservatism can lead to another type of error.
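The claim that alpha is the long-run rate of TYPE I error can be checked with a small simulation. The sketch below repeatedly draws samples of 50 heights from a population where the null (u = 68) really is true and counts how often a two-tailed z test rejects anyway. The population standard deviation of 3 inches, the number of trials, and the function name are assumptions made up for illustration; they do not come from the example above.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, mu_null=68.0, sigma=3.0, n=50,
                      trials=5000, seed=1):
    """Fraction of samples drawn from a TRUE null that the z test rejects.

    sigma = 3.0 is an assumed population SD, used only for illustration.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    se = sigma / n ** 0.5                         # standard error of the mean
    rejections = 0
    for _ in range(trials):
        # The null is true here: every sample comes from a population with mean 68.
        sample = [rng.gauss(mu_null, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu_null) / se
        if abs(z) > z_crit:
            rejections += 1                       # a TYPE I error
    return rejections / trials

print(type_i_error_rate(alpha=0.05))    # should land close to .05
print(type_i_error_rate(alpha=0.001))   # a stricter alpha gives far fewer false rejections
```

Every rejection counted here is an error, because the samples were generated with the null true. Lowering alpha shrinks that count, which is the sense in which we control TYPE I error.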

Type II error (Beta) is the probability of failing to reject the null when the null is not true; it is the probability of failing to reject the null when you really should; it's the probability of saying there is no significant effect when there really is one.

For example, say you have a null hypothesis that the mean IQ for the general population is 100 (Ho: u = 100). Say you sample 50 people who happen to be from MENSA, and their mean IQ is 120. You calculate your statistic (z = 1.88), and compare it to a critical value (z = 1.96). Here you would fail to reject the null, and conclude that the MENSA population is not significantly different from the general population. But in actuality, the mean IQ in the population of MENSA members is 140. We just happened to get a sample of "lower" ability MENSA members. We thus made an error - we FTR when we should have rejected. Thus, low alpha levels (which make it very hard to reject the null) could lead to more instances where one erroneously FTR the null.

How often does TYPE II error occur? We'll get to that in a minute. But for now, remember this: the only way one can make a TYPE II error is to fail to reject; if you reject the null, you cannot make a TYPE II error.

The diagram below represents four outcomes of the decisions we make, in terms of whether or not the null is true, and whether we reject the null or not.

As you see, FTR the null when the null is true is a correct decision. However, we're usually interested in trying to find true differences, and therefore look to reject null hypotheses. Rejecting the null when it is really not true is a correct decision as well. More specifically, the probability that a test will do this is referred to as power. Power may be defined as the probability of correctly rejecting the null hypothesis. In other words, it is the probability of rejecting the null hypothesis given that the null is incorrect. Some people also refer to power as precision or sensitivity. Power is actually a function of TYPE II error, or Beta. More specifically, Power is equal to 1 - Beta. How these values are calculated is shown in examples below.

When you calculate power, you need to identify two distributions - the distribution under the null, and the distribution under the alternative hypothesis. Remember, we're talking about power, which means we're talking about correctly rejecting the null. If we are rejecting the null, we are therefore assuming our sample comes from a population different from the one specified under the null.

Let's assume that we have a null hypothesis that is really true, but we reject it. In this case we have made a TYPE I error, right? Incorporating two distributions, we would draw this as follows. Notice the area shaded - it is the area beyond our criterion, or alpha. The area beyond the criterion under the null is the region (probability) for making TYPE I error (as shaded below):

Let's now assume that we have a null hypothesis that is really false, but we FTR. Now we have made a TYPE II error. In this case the alternative is true, so that is the distribution we're working with. Now, the area to the left of the original criterion under the alternative is our probability of error - TYPE II (light gray shading below). If we rejected the null as we should have, we would have made a correct decision. The probability of doing so is the area to the right of the criterion under the alternative. More specifically, this probability is power (dark gray shading below).
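The two shaded areas can also be written in symbols. This is only a sketch for the picture described here, where the criterion sits in the upper tail of the null distribution. The notation is introduced for illustration and is not part of the original notes: \mu_0 and \mu_1 are the means under the null and the alternative, \sigma_{\bar{X}} is the standard error, c is the criterion, and \Phi is the standard normal cumulative distribution function.

```latex
\begin{align*}
c &= \mu_0 + z_{1-\alpha}\,\sigma_{\bar{X}}
  && \text{criterion: cuts off area } \alpha \text{ under the null} \\
\beta &= P(\bar{X} < c \mid \mu = \mu_1)
       = \Phi\!\left(\frac{c - \mu_1}{\sigma_{\bar{X}}}\right)
  && \text{TYPE II error: area left of } c \text{ under the alternative} \\
\text{Power} &= 1 - \beta
       = 1 - \Phi\!\left(\frac{c - \mu_1}{\sigma_{\bar{X}}}\right)
  && \text{area right of } c \text{ under the alternative}
\end{align*}
```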
|
To calculate power you need 1) the population means under both the null and the alternative; 2) the standard error; and 3) alpha.

Example 1: Say you are interested in how many episodes of the Brady Bunch the population of PSY 715 students watch. In your null hypothesis you state that the mean number of episodes watched in the general population is 50, with a standard error of 5. However, the true mean for the population of PSY 715 students is really 60 episodes.

Example 2: Say you are given the following information: the mean under the null is 50 episodes; the mean under the alternative is 30 episodes (maybe this is a population of chemistry graduate students); and the standard error is still 5. In this case alpha is .01. What is the power? What is TYPE II error?
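Both examples can be worked through with a few lines of Python. The sketch below is one way to do the calculation, not necessarily the way it was done in class. It makes two assumptions that the examples do not state: Example 1 uses alpha = .05 (no alpha is given for it), and the test is two-tailed (to match the critical value of 1.96 used earlier). The function name power_z and its parameters are hypothetical.

```python
from statistics import NormalDist

def power_z(mu_null, mu_alt, se, alpha=0.05, two_tailed=True):
    """Return (power, beta) for a z test, given the three ingredients listed above.

    Assumption: for a one-tailed test, the rejection region is placed on the
    side of the null mean where the alternative mean lies.
    """
    tail = alpha / 2 if two_tailed else alpha
    z_crit = NormalDist().inv_cdf(1 - tail)   # critical z under the null
    upper = mu_null + z_crit * se             # criterion in raw-score units
    lower = mu_null - z_crit * se
    alt = NormalDist(mu_alt, se)              # distribution under the alternative
    if two_tailed:
        # Area under the alternative that falls beyond either criterion.
        power = (1 - alt.cdf(upper)) + alt.cdf(lower)
    elif mu_alt >= mu_null:
        power = 1 - alt.cdf(upper)
    else:
        power = alt.cdf(lower)
    return power, 1 - power                   # beta (TYPE II) = 1 - power

# Example 1: null mean 50, true mean 60, standard error 5, alpha ASSUMED to be .05
print(power_z(50, 60, 5, alpha=0.05))

# Example 2: null mean 50, alternative mean 30, standard error 5, alpha = .01
print(power_z(50, 30, 5, alpha=0.01))
```

Under these assumptions, power comes out at roughly .52 for Example 1 (TYPE II error of about .48) and roughly .92 for Example 2 (TYPE II error of about .08). Passing two_tailed=False gives the one-tailed answers, which are somewhat higher.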
|
We typically want as much power as we can get. Cohen (the Statistical God of Power) generally recommends levels of at least .8 for power (so TYPE II error is at most .2). Here are some variables that affect power.
|