DATA ANALYSIS: CORRELATION
A. BACKGROUND
1. CORRELATION
Correlation means that two variables (sets of data) have some type of association with each other, such that as one variable increases, the other also increases (a positive correlation) or decreases (a negative correlation).
2. CAUSE AND EFFECT
It is tempting to assume that when two variables are positively correlated, one causes the other (i.e., the variables have a "cause and effect" relationship), but this is not always the case.
The purpose of today's lecture is to learn how to establish cause and effect relationships from correlations, and why this can be a difficult task.
B. TYPES OF CORRELATIONS
In geology there are several different types of correlations that can be used to help establish cause and effect relationships:
1. SPATIAL PATTERNS
Example: the distribution of landslides and topography (U.S.)
Spatial correlations could be coincidental, so there needs to be a reasonable causal mechanism to explain why the relationship reflects cause and effect.
2. TEMPORAL PATTERNS
Example: trends in mean sea level through time
Note that time is not the cause of temporal trends, but temporal trends suggest that the variable which changes through time is not behaving randomly (i.e., there is a reason for the trend).
3. PHYSICAL PROPERTIES
Example: sediment grain size and permeability
Geologists look for correlations between the physical properties of earth materials when they already suspect that there is a reason for such a relationship to exist. Thus the correlation becomes a test of the hypothesized cause and effect relationship.
4. PHYSICAL PROCESSES
Example: rainfall and surface water runoff
Geologists look for correlations between earth processes when they suspect that there is a reason for such a relationship to exist. Thus the correlation becomes a test of the hypothesized cause and effect relationship.
C. DIFFICULTIES
1. COINCIDENTAL CORRELATION
Just because two variables are correlated does not necessarily mean that one causes the other (e.g., the behavior of the stock market and its relationship to the winner of the NFL Super Bowl).
There always needs to be a reasonable causal mechanism that explains why the correlation reflects cause and effect.
2. APPARENT TRENDS
Trends that are not statistically significant, either because they are too weak or because there are too few data, should not be used as evidence for cause and effect (e.g., trying to establish mean sea level trends with only one year of data).
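One standard check is to convert r from n points into the t statistic t = r * sqrt((n - 2) / (1 - r^2)) and compare it with the critical value of Student's t distribution for n - 2 degrees of freedom. The sketch below hard-codes a few two-tailed 5% critical values from standard t tables; the r and n values tested are hypothetical. It shows that a strong-looking r from three points can fail the test while a weaker r from twenty points passes.

```python
# Is a correlation coefficient r from n points statistically significant?
# t = r * sqrt((n - 2) / (1 - r**2)) with n - 2 degrees of freedom,
# compared against two-tailed 5% critical values.
import math

# Two-tailed critical t values at the 0.05 level (standard t tables).
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 5: 2.571, 10: 2.228, 18: 2.101}

def t_statistic(r, n):
    # Assumes |r| < 1 (r = +/-1 would make t infinite).
    return r * math.sqrt((n - 2) / (1 - r * r))

def is_significant(r, n):
    # Assumes n - 2 is one of the degrees of freedom in the table above.
    return abs(t_statistic(r, n)) > T_CRIT_05[n - 2]

# A strong-looking r from only three points ...
print(is_significant(0.9, 3))    # t ≈ 2.06 < 12.706 -> False
# ... versus a weaker r from twenty points.
print(is_significant(0.5, 20))   # t ≈ 2.45 > 2.101 -> True
```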
3. CORRELATED EFFECTS
Two variables might be correlated because both are effects of the same cause (e.g., the worldwide distributions of volcanoes and earthquakes are highly correlated, because both occur at subduction zone plate boundaries).
4. THRESHOLDS
Cause and effect relationships may not become apparent until after a certain threshold has been reached (e.g., the effect of slope angle on landslide velocity).
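If data are sampled only below a threshold, the effect looks absent. A simple way to locate a threshold is to fit a "hinge" model, v = b * max(0, angle - t), for each candidate t and keep the candidate with the smallest squared error. The sketch below applies this to hypothetical landslide velocities generated with an assumed threshold of 25 degrees; the hinge model is an illustrative choice, not a method taken from these notes.

```python
# Locating a threshold with a hinge model v = b * max(0, angle - t).
# Velocities are hypothetical: zero below the assumed critical angle,
# rising linearly above it.
angles = list(range(0, 41))                           # degrees
TRUE_THRESHOLD = 25                                   # hypothetical
velocity = [0.0 if a < TRUE_THRESHOLD else 0.4 * (a - TRUE_THRESHOLD)
            for a in angles]                          # hypothetical units

def hinge_sse(t):
    """Sum of squared errors of the best hinge fit with its kink at t."""
    h = [max(0, a - t) for a in angles]
    hh = sum(hi * hi for hi in h)
    if hh == 0:
        return float("inf")                           # no data above t
    b = sum(hi * v for hi, v in zip(h, velocity)) / hh    # least-squares slope
    return sum((v - b * hi) ** 2 for hi, v in zip(h, velocity))

best_t = min(angles, key=hinge_sse)
print(f"estimated threshold: {best_t} degrees")
```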
5. MULTIPLE FACTORS
Relationships might be obscured by the fact that there is more than one factor controlling the "effect" (e.g., the occurrence of landslides is controlled by slope angle and cohesion).
6. NON-LINEAR CORRELATION
Correlations that are not linear require more data to be defined accurately (e.g., river discharge and flood levels).
D. ESTABLISHING CAUSE AND EFFECT
In summary, to establish that a cause and effect relationship exists between correlated variables, one should ask the following:
1. ARE THERE ENOUGH DATA?
Correlations based on two or three points and correlations that are not statistically significant do not establish cause and effect.
2. IS THERE A CAUSAL MECHANISM?
A correlation is more suggestive of cause and effect if there is a
causal mechanism which can explain the relationship.
3. HAS THE MODEL BEEN TESTED?
A correlation is more suggestive of cause and effect if it can be used to predict a future event or condition.
However, one must always be cautious about extrapolation!
EXAMPLE: The Rocky Mountain Arsenal
When the arsenal began injecting liquid wastes into the subsurface in 1962, there was a sudden increase in the number of earthquakes near Denver. Geologists suspected that high pressures associated with waste injection might be causing these earthquakes, and they were able to test this hypothesis to establish a cause and effect relationship.