THE IMPORTANCE OF DATA DISTRIBUTIONS
A. INTRODUCTION
When data are collected to formulate or to test cause-and-effect
hypotheses, there is typically some variability in the values that
are measured. The range of values and how frequently specific values
occur are important to a proper interpretation of the data.
B. FREQUENCY DISTRIBUTIONS
There is an almost infinite variety of data frequency distributions,
but we shall consider only three in our efforts to understand how
frequency distributions affect the interpretation of data.
1. NORMAL DISTRIBUTIONS
The normal distribution (sometimes called the "bell curve") is
perhaps the best known. Data are symmetrically distributed to
either side of a central value, so the entire population can be
represented equally well by the mean, the median, or the mode.
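As a minimal sketch of this point, the following Python snippet
computes the three measures for a small symmetric sample. The sample
values are hypothetical and hand-made for illustration; they are not
data from these notes.

```python
import statistics

# Hypothetical sample, symmetric about 5 (illustrative values only).
symmetric = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 8]

mean = statistics.mean(symmetric)      # arithmetic average
median = statistics.median(symmetric)  # middle value of the sorted data
mode = statistics.mode(symmetric)      # most frequent value

# For symmetric, single-peaked data the three measures coincide.
print(mean, median, mode)
```

All three measures land on the central value, so any one of them
describes this sample equally well.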
2. SKEWED DISTRIBUTIONS
Skewed distributions are similar to the normal distribution, but
the data are not distributed symmetrically. In these cases, the
entire population is better represented by either the median or
the mode than by the mean, because the mean is pulled toward the
long tail of extreme values.
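The effect of skew on the mean can be sketched with a small
right-skewed sample. The values below are hypothetical and chosen
only to make the arithmetic easy to follow.

```python
import statistics

# Hypothetical right-skewed sample: mostly small values, one large one.
skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 40]

mean = statistics.mean(skewed)      # pulled upward by the single 40
median = statistics.median(skewed)  # middle of the sorted data
mode = statistics.mode(skewed)      # most frequent value

# Count how many observations lie below the mean.
below_mean = sum(1 for value in skewed if value < mean)

print(mean, median, mode, below_mean)
```

Here eight of the ten observations fall below the mean, which is why
the median or the mode gives a better picture of a typical value.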
3. UNIFORM DISTRIBUTION
A uniform distribution means that there is no central value but
that every value has the same likelihood of occurring. In such
cases, the mean, median, and mode are not meaningful summaries of
the data, because no value is more typical than any other.
C. PROBABILITY DISTRIBUTIONS
Probability distributions show the likelihood that a given value will
occur as an increasing number of data points are collected. The best
way to illustrate this is to contrast the probability distributions
for two of the frequency distributions described above:
1. UNIFORM DATA
When data have a uniform frequency distribution, it means that
they are a result of random variations; and randomness means
that no simple cause-and-effect relationship exists between the
observed values and some other variable.
Example: probability of rolling a six with a six-sided die.
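The die example can be sketched in Python. The snippet assumes a fair
die, and the seed and number of rolls are arbitrary choices made for
illustration. It compares the theoretical probability of each face
with the frequencies observed in a simulation.

```python
import random
from collections import Counter
from fractions import Fraction

FACES = [1, 2, 3, 4, 5, 6]

# Theoretical probability of any one face (including a six) on a fair die.
p_six = Fraction(1, len(FACES))

# Simulate many rolls with a fixed seed so the run is repeatable.
rng = random.Random(0)
rolls = [rng.choice(FACES) for _ in range(60_000)]
counts = Counter(rolls)

# Observed relative frequency of each face.
observed = {face: counts[face] / len(rolls) for face in FACES}

print("theoretical:", float(p_six))
for face in FACES:
    print(face, round(observed[face], 3))
```

As more rolls are collected, every face settles near the same
frequency, which is the flat shape of a uniform distribution.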
2. NORMAL DATA
When data have a normal frequency distribution, it means that
they are not the result of random variations, and certain values
are more likely than others. This suggests that there is some
reason (or cause) for the observed values (effects).
Example: probability of eating a meal during the day.
Note that these two probability distributions have different shapes.
D. EXAMPLE: RECURRENCE INTERVALS
1. DEFINITION
A recurrence interval is the average time period, usually in
years, between "events" of the same magnitude (such as a
magnitude 7 earthquake or a 100-year flood). But remember
that average (or mean) values are meaningful only for data
distributions that have a central tendency.
2. RANDOM EVENTS
In this context, random events are those that occur without
any pattern to their sequence through time. Therefore, the
probability (or likelihood) of a truly random event occurring
does not in any way depend on what occurred previously.
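This independence can be checked with a small simulation. The sketch
below assumes a hypothetical 10-year recurrence interval and treats
each year as an independent trial. The seed and the number of
simulated years are arbitrary choices.

```python
import random

RECURRENCE_INTERVAL = 10           # hypothetical value, in years
P_EVENT = 1 / RECURRENCE_INTERVAL  # chance of the event in any one year

# Simulate a long sequence of years; True means the event occurred.
rng = random.Random(42)
years = [rng.random() < P_EVENT for _ in range(200_000)]

# Split each year by what happened in the year before it.
after_event = [curr for prev, curr in zip(years, years[1:]) if prev]
after_quiet = [curr for prev, curr in zip(years, years[1:]) if not prev]

# Fraction of years with an event, given the previous year's outcome.
p_after_event = sum(after_event) / len(after_event)
p_after_quiet = sum(after_quiet) / len(after_quiet)

print(round(p_after_event, 3), round(p_after_quiet, 3))
```

Both fractions stay close to 1 in 10, so an event in one year makes an
event in the next year neither more nor less likely.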
a. One-Year Probability
The probability that a random event will occur during any
given year depends on the recurrence interval of that event
(in years):
Risk or Probability = 1 / (Recurrence Interval)
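A minimal sketch of this formula in Python follows. The function name
is hypothetical, and the check that the recurrence interval is at
least one year is an added assumption so that the result stays between
0 and 1.

```python
def one_year_probability(recurrence_interval_years: float) -> float:
    """Probability that a random event occurs in any single year."""
    # Assumption: intervals shorter than one year are rejected, because
    # 1 / R.I. would then exceed 1 and no longer be a probability.
    if recurrence_interval_years < 1:
        raise ValueError("recurrence interval must be at least 1 year")
    return 1.0 / recurrence_interval_years


# A 100-year flood has a 1-in-100 chance of occurring in any given year.
p_flood = one_year_probability(100)
print(p_flood)
```

For the 100-year flood mentioned above, this gives a probability of
0.01, or 1 percent, in any given year.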
b. Cumulative Probability
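The probability that a random event will occur at least once
during any given time period depends on the time period of
interest and the recurrence interval of that event:
Risk or Probability = 1 - [1 - (1 / R.I.)]^n
where R.I. is the recurrence interval (in years) and n is the
length of the time period (in years).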
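The cumulative formula can be sketched in Python, reading the trailing
n as an exponent equal to the number of years in the period of
interest. The function name and the example values are hypothetical.

```python
def cumulative_probability(recurrence_interval_years: float,
                           n_years: int) -> float:
    """Probability that a random event occurs at least once in n_years."""
    # Chance of the event in any single year.
    p_one_year = 1.0 / recurrence_interval_years
    # Chance of no event in n independent years, then take the complement.
    p_no_event = (1.0 - p_one_year) ** n_years
    return 1.0 - p_no_event


# Hypothetical example: a 100-year flood over 30 years and over 100 years.
p_30 = cumulative_probability(100, 30)
p_100 = cumulative_probability(100, 100)
print(round(p_30, 3), round(p_100, 3))
```

Over 30 years the risk of at least one 100-year flood is about 26
percent, and over 100 years it is about 63 percent, not 100 percent.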
3. NON-RANDOM EVENTS
The probability of non-random events occurring during a
given year or time period does depend on what happened
previously (example: normal data distribution).