Part VIII: Analysis of Covariance (ANCOVA)
Applications of ANCOVA

 
 
In any ANOVA-type design, participants are assigned to treatment and control groups. ANCOVA is then used as the statistical technique to eliminate irrelevant variance in the dependent variable y. In this situation, y can be predicted from the covariate z via the usual regression prediction formula.

However, ANCOVA uses the pooled correlation between y and z across all experimental groups in the design. This pooled correlation represents the "average" of the regression slopes within the groups, rather than an attempt to predict all the scores regardless of experimental condition.

Thus, the Y' scores yielded by the regression are the portion of y that is predictable from z. By the same token, the residual scores (Y - Y') are the portion of y that cannot be predicted from z. ANCOVA, then, is simply an ANOVA performed on the residual scores.
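As a rough illustration of the residual-score idea, the sketch below computes the pooled within-groups slope and the scores that remain once the part of y predictable from z is removed. It assumes NumPy is available. The function names and the six data points are made up for illustration, and Y' is taken to be b(z - mean of z), which is one common convention rather than something these notes specify.

```python
import numpy as np

def pooled_within_slope(y, z, groups):
    """Pooled within-groups regression slope of y on z."""
    y, z, groups = np.asarray(y, float), np.asarray(z, float), np.asarray(groups)
    sp_within = 0.0  # cross-products of deviations from each group's own means
    ss_within = 0.0  # squared z deviations from each group's own mean
    for g in np.unique(groups):
        m = groups == g
        dz = z[m] - z[m].mean()
        dy = y[m] - y[m].mean()
        sp_within += np.sum(dz * dy)
        ss_within += np.sum(dz ** 2)
    return sp_within / ss_within

def residual_scores(y, z, groups):
    """Y - Y', with Y' = b * (z - grand mean of z) (assumed convention)."""
    b = pooled_within_slope(y, z, groups)
    z = np.asarray(z, float)
    return np.asarray(y, float) - b * (z - z.mean())

# Hypothetical data: two groups of three participants each.
z = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 5, 7, 9]
groups = ["a", "a", "a", "b", "b", "b"]

slope = pooled_within_slope(y, z, groups)
residuals = residual_scores(y, z, groups)
print(slope, residuals)
```

An ordinary ANOVA on `residuals` would then compare the groups with the covariate's contribution taken out.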

Because ANCOVA estimates an additional population parameter (the pooled population correlation), another degree of freedom is lost from the error term. The degrees of freedom can be compared for the simple randomized ANOVA and the simple randomized ANCOVA:

Test          Error df      Treatment df    Total df
SRD ANOVA     N - k         k - 1           N - 1
SRD ANCOVA    N - k - 1     k - 1           N - 1
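To see where the N - k - 1 error degrees of freedom enter, the following sketch obtains a simple randomized ANCOVA F ratio by comparing two least-squares models: one with a separate intercept per group plus a common slope on z, and one with the covariate only. This is one standard route to the test, not necessarily how any particular package computes it. NumPy is assumed, and the function names and data are hypothetical.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from a least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def ancova_f(y, z, groups):
    """F ratio and degrees of freedom for a one-way ANCOVA with one covariate."""
    y, z, groups = np.asarray(y, float), np.asarray(z, float), np.asarray(groups)
    levels = np.unique(groups)
    n, k = len(y), len(levels)
    dummies = (groups[:, None] == levels[None, :]).astype(float)  # one column per group
    full = np.column_stack([dummies, z])        # group intercepts + common slope
    reduced = np.column_stack([np.ones(n), z])  # single intercept + slope, no groups
    df_treat = k - 1
    df_error = n - k - 1                        # one df fewer than ANOVA's N - k
    sse_full = sse(full, y)
    f = ((sse(reduced, y) - sse_full) / df_treat) / (sse_full / df_error)
    return f, df_treat, df_error

# Hypothetical data: two groups of four participants each.
z = [1, 2, 3, 4, 1, 2, 3, 4]
y = [2, 3, 5, 6, 5, 6, 8, 10]
groups = ["a"] * 4 + ["b"] * 4

f_stat, df_treat, df_error = ancova_f(y, z, groups)
print(f_stat, df_treat, df_error)
```

With N = 8 and k = 2, the error term has 5 degrees of freedom here, against 6 for the corresponding ANOVA.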
 
 
A very controversial use of ANCOVA is to correct for initial group differences on y (present prior to assignment to x) that exist among several intact, state-variable groups. For example, we may only be able to use two intact, separate classrooms to evaluate the relative efficacy of two teaching methods. Any pre-existing classroom differences (e.g., in reading comprehension) will be confounded with our x variable. In this use of ANCOVA, we use some covariate (which predicts y to some extent) to make the treatment groups more equivalent to each other by removing from y the portion predictable from z. In short, we attempt to eliminate initial group differences on y that are confounded with x, so that if a treatment effect does occur, we can be more confident that it was not simply the result of pre-existing differences.

But how can the groups be made more equivalent on y? Earlier, the pooled within-groups regression line of y on z was used to adjust the y scores of all participants; in this application, the same pooled within-groups line is used to reduce initial differences. As can be seen in the first diagram below, groups 1 and 2 differ in terms of their means. Next, the pooled within-groups line is added to equate these groups (the second diagram). Comparisons between the groups are then made in relation to the pooled within-groups line (i.e., their distance from the regression line rather than their true means on y), so that any differences between the groups' scores should be due solely to the treatment effect and not to initial differences.
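The adjustment described above can be sketched numerically. The example below computes covariate-adjusted group means as each group's mean on y minus the pooled within-groups slope times that group's distance from the grand mean on z. NumPy is assumed. The function name and the two hypothetical "classrooms" are illustrative only, with the data chosen so that the initial difference is fully accounted for by z.

```python
import numpy as np

def adjusted_means(y, z, groups):
    """Group means on y adjusted to the grand mean of z via the pooled within-groups slope."""
    y, z, groups = np.asarray(y, float), np.asarray(z, float), np.asarray(groups)
    levels = np.unique(groups)
    masks = [groups == g for g in levels]
    sp = sum(np.sum((z[m] - z[m].mean()) * (y[m] - y[m].mean())) for m in masks)
    ss = sum(np.sum((z[m] - z[m].mean()) ** 2) for m in masks)
    b_within = sp / ss
    return {
        str(g): float(y[m].mean() - b_within * (z[m].mean() - z.mean()))
        for g, m in zip(levels, masks)
    }

# Hypothetical classrooms that differ on both the covariate z and the outcome y.
z = [1, 2, 3, 3, 4, 5]
y = [2, 4, 6, 6, 8, 10]
groups = ["class1"] * 3 + ["class2"] * 3

adj = adjusted_means(y, z, groups)
print(adj)  # raw means are 4 and 8; both adjusted means come out at 6
```

In real data the adjusted means will rarely coincide like this; whatever difference remains is what the ANCOVA then tests.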

This procedure does, however, have two main problems:

  • Equivalency: This adjustment still does not mean that the two groups are equivalent. Because we are dealing with intact groups, they may still differ on any number of characteristics which may cause unequal group means on y that are confounded with any treatment effects.
  • Regression Slope: ANCOVA, remember, operates on the pooled within-groups regression line of y on z. This line must be compared to the between-groups regression line (which is obtained by drawing a line through the centers of the group distributions). When this is done, two possibilities emerge:

(1) The pooled within-groups and the between-groups regression lines have the same sign for slope. In this situation, ANCOVA corrections will tend to make the groups more equivalent on y.

(2) The slope of the pooled within-groups line differs in sign from that of the between-groups line. In this case the correction will increase the initial group differences, adding to the confounding with x.
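A quick check of which case applies can be sketched as follows. The code computes the pooled within-groups slope and a between-groups slope, the latter taken here as the size-weighted regression of group means on y against group means on z, which is one reasonable reading of "a line through the centers of the group distributions". NumPy is assumed, and the function name and data are hypothetical.

```python
import numpy as np

def within_and_between_slopes(y, z, groups):
    """Return (pooled within-groups slope, between-groups slope) of y on z."""
    y, z, groups = np.asarray(y, float), np.asarray(z, float), np.asarray(groups)
    masks = [groups == g for g in np.unique(groups)]

    # Pooled within-groups slope: deviations taken from each group's own means.
    sp_w = sum(np.sum((z[m] - z[m].mean()) * (y[m] - y[m].mean())) for m in masks)
    ss_w = sum(np.sum((z[m] - z[m].mean()) ** 2) for m in masks)

    # Between-groups slope: group means regressed on each other, weighted by
    # group size. Undefined if every group has the same mean on z.
    n_g = np.array([m.sum() for m in masks], float)
    dz = np.array([z[m].mean() for m in masks]) - z.mean()
    dy = np.array([y[m].mean() for m in masks]) - y.mean()
    return float(sp_w / ss_w), float(np.sum(n_g * dz * dy) / np.sum(n_g * dz ** 2))

# Hypothetical data for case (2): y rises with z inside each group,
# but the group that is higher on z is lower on y.
z = [1, 2, 3, 5, 6, 7]
y = [6, 7, 8, 1, 2, 3]
groups = ["a"] * 3 + ["b"] * 3

b_within, b_between = within_and_between_slopes(y, z, groups)
same_sign = np.sign(b_within) == np.sign(b_between)
print(b_within, b_between, same_sign)  # 1.0, -1.25, False
```

For these data the raw group means on y are 7 and 2, while adjusting with the within-groups slope of 1 gives 9 and 0, so the correction widens the gap, as case (2) describes.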

 
 
Remember that ANCOVA is simply a statistical method of accounting for a third variable; in experimental studies, the randomized blocks design (RBD) is used to accomplish the same thing. Thus, a dilemma occurs: should the third variable be included as a covariate, despite the power lost along with the degree of freedom, or should it be used as a blocking variable instead?

If the correlation between the dependent variable and the covariate is greater than .6, it is best to use ANCOVA, as it best removes the effects of a strong covariate despite the loss of a degree of freedom. If the correlation is less than .4, it is best to use the randomized blocks design, because retaining the degree of freedom will give you more power. If the correlation is between .4 and .6, a decision must be made based on the benefits of each technique.
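The rule of thumb above is simple enough to write down as a small helper. This is only a sketch of the guideline as stated: the function name and return labels are made up, and using the absolute value of the correlation is an assumption, since the notes do not address negative correlations.

```python
def suggest_design(r, low=0.4, high=0.6):
    """Apply the .4/.6 rule of thumb to the correlation r between y and the covariate."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    strength = abs(r)  # assumption: only the size of the correlation matters
    if strength > high:
        return "ANCOVA"
    if strength < low:
        return "RBD"
    return "judgment call"  # between .4 and .6: weigh the benefits of each

for r in (0.7, 0.3, 0.5):
    print(r, suggest_design(r))
```

The middle range deliberately returns no recommendation, since the choice there depends on the considerations that follow.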

Lastly, both RBD and ANCOVA have specific benefits. If the covariate is known ahead of time, RBD may be best. ANCOVA should be used for interval-level covariates. If you are interested in the interaction, trends, or both, use RBD. Finally, ANCOVA can deal with multiple covariates.