Part III: One & Two Sample Parametric Tests
Decision Errors and Power

 
 
When we perform a significance test, we generally compare the probability of getting our observed value to some preset criterion probability. This criterion is alpha, and is typically set at .05. More often than not, people do not even think about probability in these situations. Rather, they look at the observed value of a statistic and compare it to a critical value (in most cases, one rejects the null if the observed value is greater than the critical value). This is fine, and by no means incorrect. But thinking of your decision to reject or fail to reject the null in terms of probability makes it easier to understand the nature of power and decision errors.

Much as we would like to think we make correct decisions when we either reject or fail to reject (FTR) the null, there is no way of knowing for sure. It only makes sense that occasionally we will be wrong. Thus, we can only focus on the probability of making such errors - we will never know for sure whether we have made an error or not. There are two kinds of error whose probabilities we can estimate, and both are readily calculable.

 
 
Type I error (Alpha) is the probability of rejecting the null hypothesis when the null is true; it is the probability of rejecting the null when you should really have failed to reject (FTR); it's saying there is a significant effect when there truly is none.

For example, say you have a null hypothesis that the mean height for Wayne State men is 68 inches (Ho: u = 68). You go out, get a sample of 50 WSU men, measure each of them, and find that their mean height is 76 inches. You calculate your statistic (say z = 2.98) and compare it to a critical value (z = 1.96). You'd reject the null here (because the observed value is bigger than the critical value), and state that this sample is significantly taller than 68 inches, or that this sample comes from a population different from the one the null specifies.

But what if this sample happens to be basketball, volleyball, and football players? These men are from the WSU population, right? It just happened that we got a highly atypical sample in terms of height. We rejected the null, but we really should not have. This is TYPE I error.

How often will this TYPE I error occur? If alpha is set at .05, we will make a TYPE I error 5 times out of a hundred when the null is true (out of 100 true null hypotheses that we test, about 5 will be incorrectly rejected). Thus we have control over TYPE I error, in that we can set alpha to whatever we want. If you really did not want to make a TYPE I error, you could set alpha really low (.0000001). The critical value associated with this alpha would be huge, making it virtually impossible to find an observed score greater than this value. If you cannot reject the null, you cannot commit a TYPE I error. However, this conservatism can lead to another type of error.
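The claim that alpha is the long-run TYPE I error rate can be checked with a small simulation. The sketch below draws many samples of 50 heights from a population in which the null (mean = 68) really is true and counts how often a two-tailed z test at alpha = .05 rejects anyway. The population standard deviation of 3 inches, the number of trials, and the function name are assumptions made for illustration only; they are not part of the example above.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, n=50, trials=10_000, seed=1):
    """Proportion of TRUE null hypotheses rejected by a two-tailed z test."""
    rng = random.Random(seed)
    mu_null = 68.0   # mean stated by the null, and also the true mean here
    sigma = 3.0      # hypothetical population standard deviation (assumption)
    se = sigma / n ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05

    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu_null, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu_null) / se
        if abs(z) > z_crit:      # observed value beyond the critical value
            rejections += 1      # the null is true, so this is a TYPE I error
    return rejections / trials

rate = type_one_error_rate()
print(f"Proportion of true nulls rejected: {rate:.3f}")
```

The proportion printed should land close to alpha, and rerunning with a smaller alpha should shrink it accordingly.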

 
 
Type II error (Beta) is the probability of failing to reject the null when the null is not true; it is the probability of failing to reject the null when you really should; it's the probability of saying there is no significant effect when there really is one.

For example, say you have a null hypothesis that the mean IQ for the general population is 100 (Ho: u = 100). Say you sample 50 people who happen to be from MENSA, and their mean IQ is 120. You calculate your statistic (z = 1.88), and compare it to a critical value (z = 1.96). Here you would fail to reject the null, and conclude that the MENSA population is not significantly different from the general population. But in actuality, the mean IQ in the population of MENSA members is 140. We just happened to get a sample of "lower" ability MENSA members. We thus made an error - we failed to reject when we should have rejected.

Thus, very low alpha levels (which make it nearly impossible to reject the null) could lead to more instances where one erroneously fails to reject the null. How often does TYPE II error occur? We'll get to that in a minute. But for now, remember this: The only way one can make a TYPE II error is to fail to reject; if you reject the null, you cannot make a TYPE II error.

 
 
The diagram below represents four outcomes of the decisions we make, in terms of whether or not the null is true, and whether we reject the null or not.

                     Truth of Null
Decision         True         Not True
Reject Null      TYPE I       POWER
FTR Null         CORRECT      TYPE II

As you see, failing to reject the null when the null is true is a correct decision. However, we're usually interested in trying to find true differences, and therefore look to reject null hypotheses. Rejecting the null when it is really not true is a correct decision as well. More specifically, the probability that a test will do this is referred to as power. Power may be defined as the probability of correctly rejecting the null hypothesis. In other words, it is the probability of rejecting the null hypothesis given that the null is incorrect. Some people also refer to power as precision or sensitivity.

Power is actually a function of TYPE II error, or, Beta. More specifically, Power is equal to 1 - Beta. How these values are calculated is shown in examples below.

When you calculate power, you need to identify two distributions - the distribution under the null, and the distribution under the alternative hypothesis. Remember, we're talking about power, which means we're talking about correctly rejecting the null. If we are rejecting the null, we are therefore assuming our sample comes from a population different from the one specified under the null.

Let's assume that we have a null hypothesis that is really true, but we reject it. In this case we have made a TYPE I error, right? Incorporating two distributions, we would draw this as follows. Notice the area shaded - it is the area beyond our criterion, or alpha. The area beyond alpha is the region (probability) for making TYPE I error (as shaded below):

[Figure: null and alternative distributions, with the area beyond the criterion (alpha) shaded as the TYPE I error region]

Let's now assume that we have a null hypothesis that is really false, but we FTR. Now we have made a TYPE II error. In this case the alternative is true, so that is the distribution we're working with. Now, the area to the left of the original alpha under the alternative is our probability of error - TYPE II (light gray shading below). If we rejected the null as we should have, we would have made a correct decision. This would be defined as the probability to the right of the criterion under the alternative. More specifically, this probability is power (dark gray shading below).

[Figure: null and alternative distributions; light gray shading marks TYPE II error (Beta), dark gray shading marks power]

 
 
To calculate power you need 1) the population means under both the null and the alternative; 2) the standard error; and 3) alpha.
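As a compact summary of how these three ingredients combine, the calculation can be written as the short derivation below. This is a sketch for the case where the alternative mean lies to the right of the null mean; the symbols (mu_0 for the null mean, mu_1 for the alternative mean, SE for the standard error, z_alpha for the critical z at the chosen alpha) are notation introduced here for illustration and do not appear in the original notes.

```latex
\begin{aligned}
X_{crit} &= \mu_0 + z_{\alpha}\, SE \\
z_{alt}  &= \frac{X_{crit} - \mu_1}{SE} \\
\text{Power} &= P(Z > z_{alt}), \qquad \beta = 1 - \text{Power}
\end{aligned}
```

When the alternative lies to the left of the null, z_alpha is negative and power becomes the area to the left of z_alt instead.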

Example 1: Say you are interested in how many episodes of the Brady Bunch the population of PSY 715 students watch. In your null hypothesis you state that the mean number of episodes watched in the general population is 50, with a standard error of 5. However, the true mean for the population of 715 students is really 60 episodes.

Step 1: Draw the diagram. By looking at the null and alternative mean values, you would draw the alternative to the right of the null. We are interested in where the distributions overlap, so we will be working in the right tail of the null.

[Figure: null distribution (mean 50) with the alternative distribution (mean 60) drawn to its right]

Step 2: Identify where the critical value is for alpha under the null.

Step 3: Find the raw score value at alpha. Since we have a one-tailed test and are using alpha of .05 we know that z = 1.65 (using your z table and looking for the z value for which .05 of the area is in the tail). We also know the mean under the null is 50, and the standard error is 5. Thus, we simply plug the numbers into the z formula and solve for X:

1.65 = (X - 50)/5

X = 58.25

Step 4: Find the z score for X under the alternative hypothesis. X is 58.25, and has a z score of 1.65 under the null. But we need to find the z score for this value under the alternative. Again using the basic z formula, we have the following:

z = (58.25 - 60)/5

z = -.35

Step 5: Find the probability for the z score under the alternative. The probability to the left of z = -.35 is .36, and to the right is .64.

[Figure: alternative distribution, with the areas to the left (.36) and right (.64) of z = -.35]

Step 6: Solve for Power or Beta. It is very important to remember what you are looking for - it is easy to confuse which probabilities are associated with Power and which with Beta. The best way to prevent mistakes is to draw the picture. If I were asked to solve for power, I would look at the area to the right of z; if solving for Beta, I would look to the left. Do you see why this is so? All we are doing is putting numbers on the general diagram of the null and alternative hypotheses, where the alternative is true - the diagram that delineates Power and Beta. Thus, here, Power = .64, and Beta = .36 (remember, Power = 1 - Beta).

[Figure: Beta (.36) to the left of the criterion and Power (.64) to the right, under the alternative]
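The six steps of Example 1 can also be traced in a few lines of Python. This is only a sketch that mirrors the hand calculation: it uses the rounded table value z = 1.65 from Step 3 rather than an exact quantile, and the variable names are chosen here for illustration.

```python
from statistics import NormalDist

# Values given in Example 1
mu_null, mu_alt, se = 50.0, 60.0, 5.0
z_crit = 1.65   # table value for a one-tailed test at alpha = .05 (Step 3)

# Step 3: raw score at the criterion under the null
x_crit = mu_null + z_crit * se

# Step 4: z score of that same raw score under the alternative
z_alt = (x_crit - mu_alt) / se

# Step 5: areas to the left and right of z_alt on the standard normal curve
area_left = NormalDist().cdf(z_alt)
area_right = 1 - area_left

# Step 6: the alternative lies to the right of the null, so power is the
# area to the right of the criterion and Beta is the area to the left
beta, power = area_left, area_right

print(f"X at criterion = {x_crit:.2f}")
print(f"z under the alternative = {z_alt:.2f}")
print(f"Power = {power:.2f}, Beta = {beta:.2f}")
```

Swapping in an exact critical value (for instance from `NormalDist().inv_cdf`) changes the result only slightly, since 1.65 is itself a rounded table entry.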

Example 2: Say you are given the following information: the mean under the null is 50 episodes; the mean under the alternative is 30 episodes (maybe this is a population of chemistry graduate students); and the standard error is still 5. In this case alpha is .01. What is the power? What is the TYPE II error?

Step 1: Notice that this time, the alternative is to the left of the null. Thus, we'll be working in the left tail.

[Figure: null distribution (mean 50) with the alternative distribution (mean 30) drawn to its left]

Step 2: The critical value is now -2.33, since we are using a one-tailed test at alpha = .01 (the z value for which .01 of the area is in the tail). Why is it negative now? Because we know we're working to the left of the mean (where raw score values are less than the mean), thus z's will be negative.

Step 3: Find raw score for alpha under the Ho: -2.33 = (X - 50)/5; X = 38.35.

Step 4: Find z score for X, under the alternative hypothesis: z = (38.35 - 30)/5; z = 1.67

Step 5: Find the probabilities for z = 1.67: they are .95 (to the left) and .05 (to the right)

[Figure: alternative distribution, with the areas to the left (.95) and right (.05) of z = 1.67]

Step 6: Here, Power is .95 and Beta is .05. Power is always the area under the alternative on the side of the criterion that lies away from the null mean, whether that is to the left or to the right.

[Figure: Power and Beta regions under the alternative for Example 2]

 
 
We typically want as much power as we can get. Cohen (the Statistical God of Power) generally recommends levels of at least .8 for power (so that TYPE II error is at most .2). Here are some variables that affect power.
  • Alpha: Raising alpha (going from .01 to .05) raises power. A larger alpha corresponds to a smaller (less extreme) critical value, thereby making it easier to reject the null, thereby giving you more power.
  • Directional Hypotheses: Having a directional as opposed to a non-directional hypothesis increases power. A one-tailed test will have a lower absolute critical value, thereby making it easier to reject the null, thereby giving you more power.
  • Big Sample Size: Having a larger sample size will increase power. A large sample size gives a smaller standard error, which makes the observed statistic larger, thereby making it easier to reject the null, thereby giving you more power.
  • Little Measurement Error: Having little error will increase power. Less error will yield more accurate measurement, which will decrease the overlap between the null and alternative distributions, thereby making it easier to reject the null, thereby giving you more power.
  • Large Effect Sizes: The further the distance between the null and the alternative, the greater the power. The further the distance, the less overlap, thereby making it easier to reject the null, thereby giving you more power.
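Each of the factors listed above can be seen numerically by changing one input at a time in a small power function. The sketch below assumes a z test in which the alternative mean lies above the null mean, and it models "less measurement error" simply as a smaller population standard deviation. The function name and every number in the baseline scenario are hypothetical values chosen for illustration, not figures from these notes.

```python
from math import sqrt
from statistics import NormalDist

def power_z(mu_null, mu_alt, sigma, n, alpha=0.05, two_tailed=False):
    """Power of a z test when the alternative mean is above the null mean."""
    se = sigma / sqrt(n)                        # standard error of the mean
    tail_area = alpha / 2 if two_tailed else alpha
    x_crit = mu_null + NormalDist().inv_cdf(1 - tail_area) * se
    # Area under the alternative beyond the upper criterion. For a two-tailed
    # test the tiny area beyond the lower criterion is ignored in this sketch.
    return 1 - NormalDist(mu_alt, se).cdf(x_crit)

# Hypothetical baseline scenario (all values are assumptions)
base = dict(mu_null=50, mu_alt=52, sigma=10, n=25, alpha=0.05)

scenarios = {
    "baseline (one-tailed, alpha = .05)": power_z(**base),
    "smaller alpha (.01)": power_z(**{**base, "alpha": 0.01}),
    "non-directional (two-tailed)": power_z(**base, two_tailed=True),
    "bigger sample (n = 100)": power_z(**{**base, "n": 100}),
    "less error (sigma = 5)": power_z(**{**base, "sigma": 5}),
    "larger effect (mu_alt = 56)": power_z(**{**base, "mu_alt": 56}),
}

for label, value in scenarios.items():
    print(f"{label:38s} power = {value:.2f}")
```

Compared with the baseline, the smaller alpha and the two-tailed test should show lower power, while the bigger sample, the smaller standard deviation, and the larger effect should all show higher power.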