Part IV: Correlation and Regression |
| | Calculating the Regression Line | Interpretation of the Correlation | |
| Suppose we are trying to predict some continuous Y variable from X and we obtain the scatterplot to the right. We wish to construct the "best fitting" prediction line, one that captures the linear relationship between X and Y and allows us to predict Y from a given X score. We want this line to keep our errors of prediction as small as possible. An error in prediction is defined as the difference between a subject's actual Y score and the predicted Y score obtained from X via the prediction line.
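As a minimal sketch of this definition, the snippet below computes each subject's prediction error for a candidate line and totals the squared errors, which is the criterion conventionally minimized in least-squares regression. The five data points and the function names `prediction_errors` and `sum_squared_errors` are made up for illustration and are not part of the original notes.

```python
def prediction_errors(xs, ys, a, b):
    """Return actual Y minus predicted Y for the candidate line Y' = a + b*X."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_errors(xs, ys, a, b):
    """Total squared prediction error for one candidate line."""
    return sum(e ** 2 for e in prediction_errors(xs, ys, a, b))


# Hypothetical scores for five subjects (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Compare two candidate lines; the smaller total indicates the better fit.
sse_flat = sum_squared_errors(xs, ys, a=4.0, b=0.0)  # horizontal line at the mean of Y
sse_tilt = sum_squared_errors(xs, ys, a=2.2, b=0.6)  # least-squares line for these data
print(sse_flat, sse_tilt)
```

For these data the tilted line leaves a smaller total squared error than the flat line at the mean of Y, which is what "best fitting" means under this criterion.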
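The regression line that minimizes the squared errors of prediction for the whole sample has the following formula, with the slope (b) and the intercept (a) broken out:

$$\hat{Y} = bX + a, \qquad b = r_{xy}\,\frac{s_Y}{s_X}, \qquad a = \bar{Y} - b\,\bar{X}$$

Here $\bar{X}$ and $\bar{Y}$ are the sample means and $s_X$ and $s_Y$ are the standard deviations of X and Y.

Note that the slope of the regression line is directly tied to the correlation coefficient. Thus, we can obtain the predicted Y for each X value via:

$$\hat{Y} = r_{xy}\,\frac{s_Y}{s_X}\,(X - \bar{X}) + \bar{Y}$$

Furthermore, if X and Y are expressed as standard scores (i.e., we have converted raw scores for X and Y to z scores), then a little algebraic massaging gives:

$$\hat{z}_Y = r_{xy}\, z_X$$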
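The relations in this section can be checked numerically. The following is a minimal sketch, assuming the standard least-squares results that the slope equals the correlation times the ratio of standard deviations (s_Y / s_X) and that the intercept equals mean(Y) minus the slope times mean(X). The data and the helper names `pearson_r`, `regression_line`, and `z_scores` are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation: sample covariance divided by the product of sample SDs."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))


def regression_line(xs, ys):
    """Return (a, b) with b = r * s_Y / s_X and a = mean(Y) - b * mean(X)."""
    b = pearson_r(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


def z_scores(values):
    """Convert raw scores to standard scores using the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical scores for five subjects (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
a, b = regression_line(xs, ys)

# Raw-score predictions from the fitted line.
predicted = [a + b * x for x in xs]

# In z-score form, the predicted z for Y is r times the z score of X.
predicted_z = [r * zx for zx in z_scores(xs)]

print(f"r = {r:.4f}, slope b = {b:.4f}, intercept a = {a:.4f}")
```

Standardizing the raw-score predictions (subtracting the mean of Y and dividing by its standard deviation) should reproduce `predicted_z`, which is one way to confirm that the two forms of the line agree.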
|
| The correlation coefficient is an index of the magnitude or strength of the linear relationship between X and Y. The coefficient ranges from -1.00 to +1.00, where a correlation of +1.00 means a perfect positive linear relationship between X and Y, and a correlation of -1.00 means a perfect negative linear relationship. Note: When the absolute value of the correlation is 1.00, all Y values fall on the regression line. If we square $r_{xy}$, we get the proportion of Y variance that is explained or accounted for by X. Another way to express this is:

$$r_{xy}^2 = \frac{\text{variance in } Y \text{ explained by } X}{\text{total variance in } Y} = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$
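To illustrate the "proportion of variance explained" reading, here is a small sketch that compares the squared correlation with the ratio of explained to total sums of squares. It assumes the line is the ordinary least-squares fit; the data and all variable names are hypothetical.

```python
from statistics import mean

# Hypothetical scores for five subjects (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mx, my = mean(xs), mean(ys)

# Sums of squares and cross-products around the means.
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Correlation and least-squares line.
r = sxy / (sxx * syy) ** 0.5
b = sxy / sxx
a = my - b * mx
predicted = [a + b * x for x in xs]

# Partition the Y variation into explained and error parts.
ss_total = syy
ss_explained = sum((p - my) ** 2 for p in predicted)
ss_error = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = r ** 2
print(r_squared, ss_explained / ss_total, 1 - ss_error / ss_total)
```

All three printed values should agree up to floating-point rounding, since the explained and error sums of squares add up to the total for a least-squares line.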
|