Note that in scatterplots, the X and Y axes are equal in length and thus this type of graph does not obey the 3/4 high rule.
II. Range of a Correlation Coefficient
The range of a correlation coefficient is best illustrated with examples:
[Figures not reproduced: example scatterplots, three of them captioned "More realistic example"]
| r = | ± | (0 ↔ 1) |
|---|---|---|
| | Sign | Magnitude |
| | Gives direction | Gives strength |
| X | ZX | ZX² | Y | ZY | ZXZY |
|---|---|---|---|---|---|
| 3 | -1.42 | 2.02 | 1 | -1.42 | 2.02 |
| 5 | -.71 | .50 | 2 | -.71 | .50 |
| 7 | 0 | 0 | 3 | 0 | 0 |
| 9 | .71 | .50 | 4 | .71 | .50 |
| 11 | 1.42 | 2.02 | 5 | 1.42 | 2.02 |
| s=2.82 | | ΣZX²=5=N | s=1.41 | | ΣZXZY=ΣZX² |
| m=7 | | | m=3 | | |
| N=5 | | | | | |
If the relative position of the scores on the two variables is the same (as in the present case), then the z scores of each of the variables will be the same and Σ(ZXZY) will be equal to ΣZX². As we saw above, ΣZX² is equal to N, and thus r equals N/N, or 1.
Now for the perfect negative relationship between X & W.

| X | ZX | ZX² | W | ZW | ZXZW |
|---|---|---|---|---|---|
| 3 | -1.42 | 2.02 | 5 | 1.42 | -2.02 |
| 5 | -.71 | .50 | 4 | .71 | -.50 |
| 7 | 0 | 0 | 3 | 0 | 0 |
| 9 | .71 | .50 | 2 | -.71 | -.50 |
| 11 | 1.42 | 2.02 | 1 | -1.42 | -2.02 |
| s=2.82 | | ΣZX²=5=N | | | ΣZXZW=-5 |
| m=7 | | | | | |
| N=5 | | | | | |
The scores again have the same relative position, but this time the relationship is indirect (inverse). In this case, Σ(ZXZW) equals -N, and r equals -N/N, or -1.
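The two tables above can be checked with a short script. This is a minimal sketch, assuming r is defined as Σ(ZXZY)/N and that z scores use the standard deviation computed with N in the denominator (which matches s=2.82 and s=1.41); the function names `z_scores` and `pearson_r_from_z` are illustrative and not part of the original notes.

```python
from math import sqrt

def z_scores(values):
    """Convert raw scores to z scores using the SD computed with N (not N - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r_from_z(xs, ys):
    """Assumed z-score formula: r = sum(ZX * ZY) / N."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

X = [3, 5, 7, 9, 11]
Y = [1, 2, 3, 4, 5]   # same relative positions as X
W = [5, 4, 3, 2, 1]   # reversed relative positions

r_xy = pearson_r_from_z(X, Y)
r_xw = pearson_r_from_z(X, W)
print(round(r_xy, 2), round(r_xw, 2))  # 1.0 -1.0
```

Because the script carries full precision, the squared z scores come out as exactly 2 and .5 rather than the rounded 2.02 and .50 shown in the tables.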
Example: Scores on 20-point math and science quizzes. [Minitab]
| Person | Math (X) | Science (Y) |
|---|---|---|
| A | 11 | 11 |
| B | 13 | 10 |
| C | 18 | 17 |
| D | 12 | 13 |
| E | 16 | 14 |
| N=5 | | |
First step would be to create a scatterplot:

Since the scatterplot looks promising (suggests a strong positive relationship), create the necessary grid for the computations.
| Person | Math (X) | Science (Y) | XY | X² | Y² |
|---|---|---|---|---|---|
| A | 11 | 11 | 121 | 121 | 121 |
| B | 13 | 10 | 130 | 169 | 100 |
| C | 18 | 17 | 306 | 324 | 289 |
| D | 12 | 13 | 156 | 144 | 169 |
| E | 16 | 14 | 224 | 256 | 196 |
| N=5 | ΣX=70 | ΣY=65 | ΣXY=937 | ΣX²=1014 | ΣY²=875 |
As was suggested by the scatterplot, there is indeed a strong positive correlation between the math and science scores.
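The grid sums and the resulting r can be reproduced in a few lines. This sketch assumes the raw-score (computational) form of Pearson's r, r = (NΣXY - ΣXΣY) / sqrt([NΣX² - (ΣX)²][NΣY² - (ΣY)²]), which is what the XY, X², and Y² columns of the grid suggest; the variable names are illustrative.

```python
from math import sqrt

math_scores = [11, 13, 18, 12, 16]     # X, persons A-E
science_scores = [11, 10, 17, 13, 14]  # Y, persons A-E

n = len(math_scores)
sum_x = sum(math_scores)
sum_y = sum(science_scores)
sum_xy = sum(x * y for x, y in zip(math_scores, science_scores))
sum_x2 = sum(x * x for x in math_scores)
sum_y2 = sum(y * y for y in science_scores)

# Assumed raw-score formula for Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 70 65 937 1014 875
print(round(r, 2))  # 0.85
```

A value of about .85 agrees with the strong positive relationship described above.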
A variant of Pearson's r that is used with rank data is called Spearman's Rho ($r_s$). This correlation coefficient is appropriate when either of the following two conditions is met:
In either case, both scales must be converted to ranks. Computing Pearson's r on the ranked data gives Spearman's Rho. However, for computations by hand, there is a simpler formula:

$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where D = Rank of X - Rank of Y (i.e., a difference score).
| Person | Beauty | Sociability |
|---|---|---|
| A | 3 | 3 |
| B | 1=most | 2 |
| C | 2 | 1=most |
| D | 5 | 4 |
| E | 4 | 5 |
| N=5 | | |
First step would be to create a scatterplot.

Since the scatterplot looks promising (suggests a strong positive relationship), create the necessary grid for the computations.
| Person | Beauty | Sociability | D | D² |
|---|---|---|---|---|
| A | 3 | 3 | 0 | 0 |
| B | 1=most | 2 | -1 | 1 |
| C | 2 | 1=most | 1 | 1 |
| D | 5 | 4 | 1 | 1 |
| E | 4 | 5 | -1 | 1 |
| N=5 | | | ΣD=0 | ΣD²=4 |
Then perform the computations:
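For the Beauty and Sociability ranks, the computation can be sketched as follows, assuming the usual hand-computation formula for untied ranks, rs = 1 - 6ΣD² / (N(N² - 1)). The function name `spearman_rho` is illustrative and not part of the original notes.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of untied ranks, via difference scores D."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

beauty = [3, 1, 2, 5, 4]       # persons A-E, 1 = most
sociability = [3, 2, 1, 4, 5]  # persons A-E, 1 = most

rho = spearman_rho(beauty, sociability)
print(round(rho, 2))  # 0.8
```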
| Person | Beauty | Beauty (reranked) | Science | Science (ranked) |
|---|---|---|---|---|
| A | 3 | 3 | 11 | 2 |
| B | 1=most | 5=most | 10 | 1 |
| C | 2 | 4 | 17 | 5=most |
| D | 5 | 1 | 13 | 3 |
| E | 4 | 2 | 14 | 4 |
| N=5 | | | | |
Then we would create a scatterplot of the ranked scores.

The data do not look very promising, but let's prepare the grid for the computations anyway.
| Person | Beauty (reranked) | Science (ranked) | D | D² |
|---|---|---|---|---|
| A | 3 | 2 | 1 | 1 |
| B | 5=most | 1 | 4 | 16 |
| C | 4 | 5=most | -1 | 1 |
| D | 1 | 3 | -2 | 4 |
| E | 2 | 4 | -2 | 4 |
| N=5 | | | ΣD=0 | ΣD²=26 |
Then perform the computations:
As the scatterplot indicated, there wasn't much of a correlation.
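As a check on the statement that Pearson's r computed on ranks gives Spearman's Rho, the sketch below computes both for the reranked Beauty and ranked Science scores. It assumes the difference-score formula rs = 1 - 6ΣD² / (N(N² - 1)) and a deviation-score form of Pearson's r; `pearson_r` is an illustrative helper, not part of the original notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviation scores: SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

beauty_reranked = [3, 5, 4, 1, 2]  # persons A-E, 5 = most
science_ranked = [2, 1, 5, 3, 4]   # persons A-E, 5 = most

n = len(beauty_reranked)
sum_d2 = sum((b - s) ** 2 for b, s in zip(beauty_reranked, science_ranked))
rho_from_d = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
rho_from_pearson = pearson_r(beauty_reranked, science_ranked)

print(sum_d2)  # 26
print(round(rho_from_d, 2), round(rho_from_pearson, 2))  # -0.3 -0.3
```

Both routes give about -.30, a weak negative value that fits the unpromising scatterplot.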
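Note that tied scores each receive the average of the ranks they would otherwise occupy, for example:

| Person | X | Y | Y (rank) |
|---|---|---|---|
| A | 3 | 11 | 4.5 |
| B | 1 | 11 | 4.5 |
| C | 2 | 17 | 1 |
| D | 5 | 13 | 3 |
| E | 4 | 14 | 2 |
| N=5 | | | |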
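Averaging tied ranks can be sketched as below. The helper `average_ranks` and its `highest_first` flag are hypothetical names introduced for illustration; with `highest_first=True` the largest score gets rank 1, as in the example above.

```python
def average_ranks(scores, highest_first=True):
    """Rank scores, giving tied scores the mean of the rank positions they occupy."""
    ordered = sorted(scores, reverse=highest_first)
    positions = {}
    for position, score in enumerate(ordered, start=1):
        positions.setdefault(score, []).append(position)
    return [sum(positions[s]) / len(positions[s]) for s in scores]

y = [11, 11, 17, 13, 14]  # persons A-E
print(average_ranks(y))   # [4.5, 4.5, 1.0, 3.0, 2.0]
```

The two scores of 11 would occupy ranks 4 and 5, so each receives (4 + 5) / 2 = 4.5.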
V. Important Issues With Correlation
These are the reasons why it is important to create a scatterplot.
Example of a curvilinear (or nonmonotonic) relationship:
In general, curvilinearity in a relationship will result in an r that underestimates the true relationship.
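A small invented data set makes the point concrete. In this sketch Y is completely determined by X through an inverted-U rule, yet Pearson's r comes out to zero. The data and the helper `pearson_r` (raw-score form) are assumptions for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via the raw-score (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

# Hypothetical data: Y rises with X up to X = 5, then falls symmetrically.
x = list(range(1, 10))
y = [-(v - 5) ** 2 for v in x]

r = pearson_r(x, y)
print(r)  # 0.0
```

Here r badly understates a relationship that is in fact perfect, which is why the scatterplot should be inspected first.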
Example of a Restricted Range - Foot size and age in 6 year olds:
Example of a Truncated Range - ACT scores and GPA in college students:
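The effect of restricting the range can be illustrated with invented numbers (not the foot-size or ACT data, which are not given here). The sketch below computes r over the full range of a hypothetical X and again over a narrow slice of it; `pearson_r` is an illustrative helper.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via the raw-score (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

# Hypothetical data: Y rises with X, plus a fixed, repeating error pattern.
x = list(range(1, 21))
error = [2, -2, -2, 2] * 5
y = [xi + e for xi, e in zip(x, error)]

r_full = pearson_r(x, y)

# Keep only the cases with X between 9 and 12 (a restricted range).
restricted = [(xi, yi) for xi, yi in zip(x, y) if 9 <= xi <= 12]
x_r = [xi for xi, _ in restricted]
y_r = [yi for _, yi in restricted]
r_restricted = pearson_r(x_r, y_r)

print(round(r_full, 2), round(r_restricted, 2))  # 0.94 0.49
```

The underlying rule linking X and Y is identical in both cases; only the spread of X differs, and the smaller spread yields the smaller r.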
Possible causal relationships between X and Y if they are correlated include:
| Possibility | Symbols | Explanation |
|---|---|---|
| a. | X → Y | X causes Y |
| b. | X ← Y | Y causes X |
| c. | X ← A → Y | A causes both X & Y |
| d. | B → C → X, B → Y | Etc. |

The main point is that correlation doesn't tell us much about causality. Inferring causality from a correlation is an error that is all too common.
3. Some Specific Uses of Correlation
- Determining Reliabilities
  Compare two raters' (interobserver) or the same rater's (intraobserver) observations of behavior to see if they agree.
- Determining Validities
  If ACT scores are highly correlated with GPAs, then we can say that ACT scores are a valid predictor of GPA.
- For Prediction
  A set of procedures similar to correlation, called regression, is used for predicting one variable from one or more other variables.