Part V: Analysis of Variance (ANOVA)
Post Hoc Comparisons

 
 
Post hoc comparisons are generally performed only after obtaining a significant omnibus F. Then we look at all possible pairwise or all possible pairwise and otherwise comparisons. Here, we are focusing on the largest difference between levels of the IV, but you are still sifting through the data in hopes of finding something significant. Because of this, there is a very real problem that you will be capitalizing on chance findings, since although you implicitly test all comparisons, you only look at the largest differences (the ones most likely to be significant). You probably would not look at or care about comparisons that do not differ. The largest differences, whatever they may be, are the most likely to be significant due to chance.

With post hoc techniques, no specific comparisons have been planned, so we make all of them. The net effect is that we require a LARGER difference between means to call them significant. There are two varieties of post hoc techniques: 1) pairwise; and 2) pairwise and otherwise. In pairwise techniques, every comparison involves only two means or totals at a time. In the pairwise and otherwise techniques, at least one comparison involves more than two conditions (e.g., groups 1 + 2 vs. 3). In an experiment with "a" levels of an IV, there are a(a-1)/2 pairwise comparisons and 1 + [(3^a - 1)/2] - 2^a pairwise and otherwise comparisons. The major difference between the two procedures is as follows:

  • All pairwise techniques are based on comparing the obtained difference between two means with the critical difference required for significance (like the critical value used in the t-test). If the obtained difference is greater than or equal to the critical difference, it is significant. Otherwise there is no significant difference between the pair of means. What differentiates the pairwise tests from each other is the critical difference each one requires to call a pair of means or totals significantly different.
  • In pairwise and otherwise techniques, obtain the F for each comparison and compare it with the adjusted critical F.
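The two counts can be checked by brute force. The sketch below is illustrative only: the function names are made up for this example, and it assumes a "comparison" is defined purely by which groups sit on each side (weights are ignored, and swapping the two sides gives the same comparison). Under that assumption the closed form 1 + (3^a - 1)/2 - 2^a counts every comparison, the pairwise ones included.

```python
from itertools import product

def count_pairwise(a):
    """Number of pairwise comparisons among a groups: a(a-1)/2."""
    return a * (a - 1) // 2

def count_all_comparisons(a):
    """Closed form for pairwise and otherwise comparisons: 1 + (3^a - 1)/2 - 2^a."""
    return 1 + (3**a - 1) // 2 - 2**a

def enumerate_comparisons(a):
    """Brute-force count: put each group on side -1, side +1, or leave it out (0).

    A valid comparison needs at least one group on each side. Swapping the
    two sides gives the same comparison, so each one is stored only once.
    """
    seen = set()
    for signs in product((-1, 0, 1), repeat=a):
        if 1 in signs and -1 in signs:
            mirrored = tuple(-s for s in signs)
            seen.add(min(signs, mirrored))
    return len(seen)

for a in (3, 4, 5):
    print(a, count_pairwise(a), count_all_comparisons(a), enumerate_comparisons(a))
```

With a = 4, for instance, 6 of the comparisons are pairwise and the remainder involve three or four groups.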
 
 
Generally, the flow of the analysis is as follows:
  • Compute omnibus F.
  • If and only if the omnibus F is significant, make up a table of differences between totals or means. For the table, rank the groups from lowest to highest on their group means, with the lowest group placed in position 1 and the group with the highest mean placed in the last position (position 4 in the illustration below).
     1    2    3    4
1    -    4   10   30
2         -    6   26
3              -   20
4                   -

Each entry represents the absolute difference between each pair of groups using either totals or means depending on what you decide to enter in the table.

  • Calculate the critical difference, CD: the value that an entry in the above table must equal or exceed for the two groups to be called significantly different. In the following calculations, n is the number of subjects in a single group.
CD(LSD): [Equation image not reproduced] The same CD is used for each cell.
CD(Newman-Keuls): [Equation image not reproduced] Different cells have different CDs. The r value is the number of steps between the two ranked means being compared.
CD(Tukey A): [Equation image not reproduced] The same CD is used for each cell. The q value is identical to the largest q from the Newman-Keuls (the one for r = a).
CD(Tukey B): Uses the average q from Tukey A and Newman-Keuls. Different cells have different CDs.
CD(Scheffe'): [Equation image not reproduced] For pairwise comparisons, the same CD is used for each cell.
Least Stringent                      ---------->  Most Stringent
Most Powerful                        ---------->  Least Powerful
Most Likely to Make a Type I Error   ---------->  Least Likely to Make a Type I Error

Fisher LSD -> Newman-Keuls -> Tukey B -> Tukey A -> Scheffe'

  • Compare the values in the table with the calculated CD. If the table values are greater than or equal to the CD, the 2 groups are significantly different. Otherwise they are not significantly different.
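For reference, the critical differences are commonly written as follows when the table holds differences between means. This is a hedged summary of the standard textbook forms, not a transcription of the original equations; the symbols t, q_r, q_a, F, and df_error (the error degrees of freedom) are notation assumed here.

```latex
\begin{aligned}
CD_{LSD} &= t_{\alpha,\,df_{error}} \sqrt{\frac{2\,MS_{error}}{n}} \\
CD_{Newman\text{-}Keuls} &= q_{r,\,df_{error}} \sqrt{\frac{MS_{error}}{n}} \\
CD_{Tukey\,A} &= q_{a,\,df_{error}} \sqrt{\frac{MS_{error}}{n}} \\
CD_{Tukey\,B} &= \tfrac{1}{2}\left(CD_{Newman\text{-}Keuls} + CD_{Tukey\,A}\right) \\
CD_{Scheffe'} &= \sqrt{(a-1)\,F_{\alpha;\,a-1,\,df_{error}}}\;\sqrt{\frac{2\,MS_{error}}{n}}
\end{aligned}
```

When the table holds differences between totals instead, each CD is multiplied by n, so that the square-root term for the q-based tests becomes the square root of n times MS error.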
 
 
Consider an SRD where a = 4 (number of treatment groups), n = 6 (number of Ss per group), MS error = 25, and alpha = .05. The mean values on the DV for the 4 groups are as follows: 15, 23, 17, 28 (respectively).

Assume that the null (all population means are equal) was rejected at the alpha=.05 level. Since we rejected the null, we conclude that a significant difference lies somewhere among these means; WE DO NOT, HOWEVER, KNOW WHICH MEANS ARE SIGNIFICANTLY DIFFERENT WITHOUT THE AID OF FURTHER TESTING. Therefore it is appropriate to do post hoc comparisons. Post hoc comparisons will allow us to find out which treatment levels are significantly different from each other.

Step 1: Rearrange the means from lowest to highest.

1 3 2 4
15 17 23 28

Step 2: Form a table which contains either differences between means or absolute difference between totals (= group mean * n).

Using Means                     Using Totals
     1    3    2    4                1    3    2    4
1    -    2    8   13                -   12   48   78
3         -    6   11                     -   36   66
2              -    5                          -   30
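Steps 1 and 2 can be sketched in a few lines. The helper name `difference_table` is made up for this illustration; the means and n come from the example.

```python
means = {1: 15, 2: 23, 3: 17, 4: 28}  # group number -> mean on the DV
n = 6                                  # subjects per group

# Step 1: rank the groups from lowest to highest mean.
order = sorted(means, key=means.get)   # group numbers in rank order

def difference_table(values, order):
    """Absolute differences for every pair, keyed by (lower-ranked, higher-ranked) group."""
    table = {}
    for i, low in enumerate(order):
        for high in order[i + 1:]:
            table[(low, high)] = abs(values[high] - values[low])
    return table

# Step 2: one table on means, one on totals (total = group mean * n).
mean_diffs = difference_table(means, order)
totals = {g: m * n for g, m in means.items()}
total_diffs = difference_table(totals, order)

for pair in mean_diffs:
    print(pair, mean_diffs[pair], total_diffs[pair])
```

Every entry in the totals table is n times the matching entry in the means table, so either table leads to the same decisions as long as the critical value is on the same scale.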

You will compare each of the differences in the table with critical values to see if the differences are significant. There are four ways of obtaining critical values. Each method is based on the same givens. These are: alpha, the error degrees of freedom, n, MS error, and the number of 'steps' between the means under comparison.

From the above information, we can use tables in the book to find q, the Studentized Range Statistic. The critical values in our table (R) are a function of q. Specifically,

[Equation image not reproduced]

Each of the four methods below obtains q in a different way, thus yielding a different critical value R. The lower the R, the easier it is for an obtained difference to reach significance at the chosen alpha. Techniques which yield lower R values are more 'liberal' than those which yield higher R values (they are also more powerful). It should be pointed out that a set of planned comparisons on the same data is more liberal (powerful) than any of the post hoc tests. Formulas for computing critical values (R):

  • Newman-Keuls:

[Equation image not reproduced]

  • Tukey A:

[Equation image not reproduced]

  • Tukey B:

[Equation image not reproduced]

  • Scheffe':

[Equation image not reproduced]

Step 3: Compute the following:

[Equation image not reproduced]

Therefore, R = q(12.25). We only need to find values of q to substitute into this equation to get the critical values of R for each kind of test.
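The multiplier 12.25 is what the totals form of the critical range gives when MS error is 25. Reading the step this way is an assumption, since the original equation is not available here, but it reproduces the value used in the rest of the example:

```latex
R = q\,\sqrt{n \cdot MS_{error}}, \qquad \sqrt{n \cdot MS_{error}} = \sqrt{6 \times 25} = \sqrt{150} \approx 12.25
```

Because the multiplier is on the scale of totals, the resulting R values are compared with the differences between totals rather than the differences between means.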

Step 4: Find values of q and R for each method.

  • Newman-Keuls:

(1) Determine the number of steps. Rank order the means from the lowest to the highest. For any comparison, the number of steps (r) = (the higher rank - the lower rank) + 1. For example, for the B-C comparison, # steps = (3-2)+1 = 2. The table below gives the number of steps for each comparison. Note that r changes from comparison to comparison when using the Newman-Keuls method.

     A    C    B    D
A    -    2    3    4
C         -    2    3
B              -    2

(2) Compute df= na-a = 6(4) - 4=20

(3) Using the table in Keppel for the Studentized Range Statistic we find q for each value of r (# steps), df =20, and alpha=.05. To obtain R, multiply q by 12.25.

r (# of steps)   2         3         4
q                2.95      3.58      3.96
R                36.1375   43.855    48.51

(4) Compare these values of R to the differences between the comparison pairs for the respective groups (the first table of differences we did earlier).

  • Tukey A: This test uses a, the number of treatment conditions, as the number of steps for all comparisons. Note that this is also the largest number of steps that we used in Newman-Keuls. Alpha and df are the same as above.
r 4
q 3.96
R 48.51
  • Tukey B: The average of the R values from N-K and Tukey A is taken as a 'compromise' value of the critical R. The number of steps is the same as for N-K.
r             2          3         4
R (N-K)       36.1375    43.855    48.51
R (Tukey A)   48.51      48.51     48.51
R (Tukey B)   42.32375   46.1825   48.51
  • Scheffe': In this test, q is defined as above, and with n=6 and a=4, we obtain a critical F of 3.10 for alpha=.05. The entire expression for the Scheffe' q and for the calculation of the critical R is:

[Equation image not reproduced]

As in Tukey A, this value is used for all comparisons.
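The q-based critical values in Step 4 can be reproduced with a short script. This is a sketch under stated assumptions: the q values are the tabled ones quoted in the text (df = 20, alpha = .05), MS error is taken to be 25 because that is the value that yields the multiplier 12.25 when differences between totals are used, and the variable names are made up for this example. The Scheffe' value is left out because its full expression is not shown in the text.

```python
import math

n, a = 6, 4
ms_error = 25  # assumption: reproduces the multiplier 12.25 used in the text

# Multiplier for differences between totals, rounded as in the text.
multiplier = round(math.sqrt(n * ms_error), 2)

# Tabled Studentized Range values for df = 20, alpha = .05, keyed by number of steps r.
q_table = {2: 2.95, 3: 3.58, 4: 3.96}

# Newman-Keuls: a separate critical R for each number of steps.
r_nk = {r: q * multiplier for r, q in q_table.items()}

# Tukey A: always uses the q for r = a, whatever the actual number of steps.
r_tukey_a = {r: q_table[a] * multiplier for r in q_table}

# Tukey B: the average of the Newman-Keuls and Tukey A values at each r.
r_tukey_b = {r: (r_nk[r] + r_tukey_a[r]) / 2 for r in q_table}

for r in sorted(q_table):
    print(r, round(r_nk[r], 4), round(r_tukey_a[r], 4), round(r_tukey_b[r], 4))
```

For any r, the Newman-Keuls value is never larger than the Tukey B value, which in turn is never larger than the Tukey A value, matching the liberal-to-conservative ordering of the three tests.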

Step 5: The next procedure is to make a set of tables of critical values for each test which can be compared to the matrix of observed data. These values are shown below.

Newman-Keuls                              Tukey A
     A      C      B      D                    A      C      B      D
A    -  36.14  43.86  48.51               A    -  48.51  48.51  48.51
C           -  36.14  43.86               C           -  48.51  48.51
B                  -  36.14               B                  -  48.51

Tukey B                                   Scheffe'
     A      C      B      D                    A      C      B      D
A    -  42.32  46.18  48.51               A    -  52.06  52.06  52.06
C           -  42.32  46.18               C           -  52.06  52.06
B                  -  42.32               B                  -  52.06

Step 6: If the obtained difference exceeds the critical value for a given comparison, then that comparison is significant. Significant differences are marked with an * on the table below. From looking at these tables can you order the tests from the most liberal (powerful) to most conservative?

     A    C     B          D
A    -    12    48*/**     78*/**/***/****
C         -     36         66*/**/***/****
B               -          30

*=sig for N-K **=sig for Tukey B ***=sig for Tukey A ****=sig for Scheffe'
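The comparison in Step 6 can be sketched as a lookup against the critical values. In this illustration the function and variable names are invented, the Newman-Keuls and Tukey A values are the ones computed in Step 4, the Tukey B value is recomputed as their average, and the Scheffe' value 52.06 is simply taken as given from the table of critical values. A difference counts as significant when it is greater than or equal to the critical value.

```python
from itertools import combinations

order = ["A", "C", "B", "D"]                       # groups ranked by mean
totals = {"A": 90, "C": 102, "B": 138, "D": 168}   # group mean * n, with n = 6

nk = {2: 36.1375, 3: 43.855, 4: 48.51}   # Newman-Keuls R by number of steps
tukey_a = 48.51                          # one R for every comparison
scheffe = 52.06                          # taken as given, not recomputed here

def significant_tests(diff, steps):
    """Names of the tests whose critical value the difference meets or exceeds."""
    criticals = {
        "N-K": nk[steps],
        "Tukey B": (nk[steps] + tukey_a) / 2,
        "Tukey A": tukey_a,
        "Scheffe'": scheffe,
    }
    return [name for name, cd in criticals.items() if diff >= cd]

results = {}
for i, j in combinations(range(len(order)), 2):
    low, high = order[i], order[j]
    diff = abs(totals[high] - totals[low])
    steps = (j - i) + 1                  # (higher rank - lower rank) + 1
    results[(low, high)] = significant_tests(diff, steps)

for pair, tests in results.items():
    print(pair, tests)
```

Counting how many pairs each test flags gives one way to order the tests from most liberal to most conservative.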

 
 
The Scheffe' and the Dunnett tests are pairwise and otherwise comparison techniques. The Scheffe' is used when you wish to make implicitly ALL possible pairwise and otherwise comparisons, whereas the Dunnett is used when you wish to restrict the number of pairwise and otherwise comparisons to some of the total set. The Dunnett procedure is akin to doing planned comparisons with the benefits of post hoc procedures. These techniques are applicable if at least one comparison in a set of comparisons involves three or more conditions.
  • Scheffe':
  1. Conduct the omnibus F. Proceed if and only if it is significant.
  2. Perform the Scheffe' procedure. Obtain the SS, MS, and F for your contrast using the planned contrast formulas.
  3. Calculate the critical F for Scheffe' using: FScheffe' = (a - 1) FCrit where FCrit is the critical F used from the omnibus F test.
  4. If the F ratio obtained in the contrast is greater than the critical value for Scheffe', the contrast is significant.
  5. Repeat for all pairwise and otherwise comparisons of interest.
  • Dunnett: Used when you want to restrict the number of comparisons you make to an a priori number. You pay an error rate penalty only for those pairwise and otherwise comparisons that you make. You are not paying a penalty for making all possible pairwise and otherwise comparisons. Thus, this test is less stringent than the Scheffe'.
  1. Compute omnibus F. Proceed if and only if it is significant.
  2. Obtain the SS, MS, and F for your contrast using the planned contrast formulas.
  3. Calculate FDunnett, which is found by taking a tabled value q (specifically tabled for the Dunnett procedure based on familywise alpha, the number of comparisons you wish to make, and dfS/A) and squaring it.
  4. If the obtained F for the contrast is larger than FDunnett, the contrast is significant.
  5. You may repeat this procedure the number of times you specified a priori.
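The Scheffe' steps can be sketched with the example data. Several things here are assumptions for illustration: the contrast F uses the standard single-df formula on means (SS = n * psi^2 / sum of squared weights, F = SS / MS error), MS error is taken to be 25, the two sets of contrast weights are arbitrary examples, and the function names are made up. The critical F of 3.10 is the omnibus value quoted earlier for alpha = .05.

```python
means = [15, 23, 17, 28]   # groups 1 to 4, from the example
n, a = 6, 4
ms_error = 25              # assumption for this illustration
f_crit = 3.10              # omnibus critical F quoted in the text (alpha = .05)

def contrast_f(weights, means, n, ms_error):
    """F for a single-df contrast on means.

    psi = sum(c_i * mean_i); SS = n * psi^2 / sum(c_i^2); F = SS / MS_error.
    """
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    psi = sum(c * m for c, m in zip(weights, means))
    ss = n * psi**2 / sum(c**2 for c in weights)
    return ss / ms_error

def dunnett_critical_f(q_tabled):
    """Dunnett critical F: the tabled Dunnett q, squared (q must come from a Dunnett table)."""
    return q_tabled**2

# Scheffe' critical F: (a - 1) times the omnibus critical F.
f_scheffe = (a - 1) * f_crit

# Two example contrasts: groups 1 + 3 vs. 2 + 4, and groups 1 + 2 vs. 3.
f_13_vs_24 = contrast_f([1, -1, 1, -1], means, n, ms_error)
f_12_vs_3 = contrast_f([1, 1, -2, 0], means, n, ms_error)

print(f_scheffe, f_13_vs_24, f_13_vs_24 > f_scheffe)
print(f_scheffe, f_12_vs_3, f_12_vs_3 > f_scheffe)
```

The Dunnett steps differ only in the critical value: the same contrast F would be compared with `dunnett_critical_f(q)`, where q is read from a Dunnett table for the chosen familywise alpha, number of comparisons, and dfS/A.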