Central Tendency & Variability
I. Measures of Central Tendency (or Averages)
Here, we are interested in the typical, most representative score. There are three measures of central tendency that you should be familiar with. Note that when reporting these values, one additional decimal of accuracy is given compared to what is available in the raw data (even if the additional decimal is a zero, e.g., 43.0).
Computation - Example:
| X |
|---|
| 2 |
| 3 |
| 5 |
| 10 |
| ΣX = 20 |
| N = 4 |
$$\bar{X} = \frac{\sum X}{N} = \frac{20}{4} = 5.0$$

Since means are typically reported with one more digit of accuracy than is present in the data, I reported the mean as 5.0 rather than just 5.
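As a quick check of the arithmetic, the mean of the four scores can be computed in a few lines of Python. This is only a sketch: the variable names are illustrative, and the one-extra-decimal reporting rule is applied with ordinary string formatting.

```python
# Scores from the example table above.
scores = [2, 3, 5, 10]

total = sum(scores)   # sum of X
n = len(scores)       # N
mean = total / n      # sum of X divided by N

# Report one more decimal than the whole-number raw data carry.
print(f"Sum = {total}, N = {n}, mean = {mean:.1f}")  # Sum = 20, N = 4, mean = 5.0
```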
When working with grouped frequency distributions, we can use an approximation:
$$\bar{X} \approx \frac{\sum (\text{Mid} \cdot f)}{N}$$
For example:
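| Interval | Midpoint | f | Mid*f |
|---|---|---|---|
| 95-99 | 97 | 1 | 97 |
| 90-94 | 92 | 3 | 276 |
| 85-89 | 87 | 5 | 435 |
| 80-84 | 82 | 6 | 492 |
| 75-79 | 77 | 4 | 308 |
| 70-74 | 72 | 3 | 216 |
| 65-69 | 67 | 1 | 67 |
| 60-64 | 62 | 2 | 124 |
| | | Σf = 25 = N | ΣMid*f = 2015 |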
When computed on the raw data, we get:
Thus the formula for computing the mean with grouped data gives us a good approximation of the actual mean. In fact, when we report the mean with one decimal more accuracy than what is in the data, the two techniques give the same result.
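The grouped-data approximation can be sketched in Python using the midpoints and frequencies from the table above. The data structure (a list of midpoint/frequency pairs) and the variable names are illustrative choices, not part of the original notes.

```python
# (midpoint, frequency) pairs from the grouped frequency distribution above.
grouped = [(97, 1), (92, 3), (87, 5), (82, 6), (77, 4), (72, 3), (67, 1), (62, 2)]

n = sum(f for _, f in grouped)                      # sum of f, which equals N
weighted_sum = sum(mid * f for mid, f in grouped)   # sum of Mid * f
approx_mean = weighted_sum / n                      # approximate mean

print(n, weighted_sum, round(approx_mean, 1))  # 25 2015 80.6
```

Each midpoint stands in for every score in its interval, which is why the result is an approximation rather than the exact mean of the raw scores.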
Properties
| Xs | $\bar{X}$ |
|---|---|
| 1, 2, 3 | 2 |
| 1, 2, 30 | 11 |
| 1, 2, 300 | 101 |
| X | $\bar{X}$ | c = X − $\bar{X}$ |
|---|---|---|
| 2 | 5 | -3 |
| 3 | 5 | -2 |
| 5 | 5 | 0 |
| 10 | 5 | 5 |
| | | Σc = 0 |
| X | c | c² | X − 4 | (X − 4)² |
|---|---|---|---|---|
| 2 | -3 | 9 | -2 | 4 |
| 3 | -2 | 4 | -1 | 1 |
| 5 | 0 | 0 | 1 | 1 |
| 10 | 5 | 25 | 6 | 36 |
| | | Σc² = 38 | | Σ(X − 4)² = 42 |
So, 38 is less than 42. This relationship would hold with any "other value."
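Both properties illustrated by the tables above (deviations from the mean sum to zero, and the sum of squared deviations is smallest when taken about the mean) can be checked numerically. The following is a minimal sketch; the helper names and the handful of "other values" tried are hypothetical choices for illustration.

```python
scores = [2, 3, 5, 10]
mean = sum(scores) / len(scores)  # 5.0

def sum_of_deviations(values, center):
    """Sum of (X - center) over all scores."""
    return sum(x - center for x in values)

def sum_of_squared_deviations(values, center):
    """Sum of (X - center)^2 over all scores."""
    return sum((x - center) ** 2 for x in values)

print(sum_of_deviations(scores, mean))          # 0.0
print(sum_of_squared_deviations(scores, mean))  # 38.0
print(sum_of_squared_deviations(scores, 4))     # 42

# A few other candidate centers: each gives a larger sum of squares than the mean.
for other in (0, 3, 4.9, 5.1, 7):
    assert sum_of_squared_deviations(scores, other) > sum_of_squared_deviations(scores, mean)
```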
Computation - There are several situations possible:

Fortunately, there is a formula to take care of the more complicated situations, including computing the median for grouped frequency distributions.

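$$Md = L + \left(\frac{\frac{N}{2} - n_b}{n_w}\right) i$$

| Symbol | Meaning |
|---|---|
| $L$ | Lower exact limit of the interval containing Md. |
| $n_b$ | Number of scores below $L$. |
| $n_w$ | Number of scores within the interval containing Md. |
| $i$ | Width of the interval (for ungrouped data, $i = 1$). |
| $N$ | Number of scores. |

Using our last example:

| Symbol | Value |
|---|---|
| $L$ | 4.5 |
| $n_b$ | 1 |
| $n_w$ | 3 |
| $i$ | 1 |
| $N$ | 6 |

$$Md = 4.5 + \left(\frac{\frac{6}{2} - 1}{3}\right)(1) \approx 4.5 + 0.67 = 5.17$$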
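The interpolation can be sketched as a small function. The parameter names are illustrative labels for the five quantities defined above, and the formula is assumed to be the usual interpolated-median form: the lower exact limit, plus the fraction of the interval needed to reach the N/2-th score, times the interval width.

```python
def interpolated_median(lower_limit, n_below, n_within, width, n_total):
    """Interpolated median: L + ((N/2 - n_below) / n_within) * i."""
    if n_within <= 0:
        raise ValueError("the interval containing the median must hold at least one score")
    return lower_limit + ((n_total / 2 - n_below) / n_within) * width

# Values from the worked example: L = 4.5, 1 score below L, 3 within, i = 1, N = 6.
md = interpolated_median(lower_limit=4.5, n_below=1, n_within=3, width=1, n_total=6)
print(round(md, 2))  # 5.17
```

The same function applies to grouped frequency distributions by passing the interval width (for example, 5) as `width`.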
Properties
| Xs | $\bar{X}$ | Md |
|---|---|---|
| 1, 2, 3 | 2 | 2 |
| 1, 2, 30 | 11 | 2 |
| 1, 2, 300 | 101 | 2 |
| 1, 2, 3000 | 1001 | 2 |
Note that the presence and direction of skew in the distribution can be determined from the mean and median. The key to understanding this is to be aware that the mean is sensitive to all scores, while the median is not. There are three rules:
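The comparison of mean and median can be sketched in Python. The sketch assumes the conventional reading of the rules: a mean above the median suggests positive skew, a mean below the median suggests negative skew, and equal values suggest no skew. The function name and return labels are hypothetical.

```python
from statistics import mean, median

def skew_direction(values):
    """Compare mean and median to suggest the direction of skew."""
    m, md = mean(values), median(values)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "none indicated"

# The rows of the table above: the extreme score pulls the mean but not the median.
for xs in ([1, 2, 3], [1, 2, 30], [1, 2, 300], [1, 2, 3000]):
    print(xs, mean(xs), median(xs), skew_direction(xs))
```

Equal mean and median are reported as "none indicated" rather than "symmetric", since the two values can coincide in a distribution that is not perfectly symmetric.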
II. Measures of Variability
Variability refers to the extent to which the scores in a distribution differ from each other. An equivalent definition (that is easier to work with mathematically) says that variability refers to the extent to which the scores in a distribution differ from their mean. If a distribution is lacking in variability, we may say that it is homogeneous (note the opposite would be heterogeneous).
We will discuss four measures of variability for now: the range, mean or average deviation, variance and standard deviation.

Distribution A has a larger range (and more variability) than Distribution B.
Because only the two extreme scores are used in computing the range, however, it is a crude measure. For example:

The problem with the MD is that due to the use of the absolute value, it is a terminal procedure. In other words, it cannot be used in further calculations (which is something that we would like to be able to do).
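The range and the mean deviation can both be sketched in a few lines. Two assumptions are made here: the range is the inclusive form (highest score minus lowest score plus 1), which matches the calculation shown for Distributions A and B later in this section, and the mean deviation is the usual average of absolute deviations from the mean. The scores are those given for Distributions A and B later in the section.

```python
def inclusive_range(values):
    """Highest score minus lowest score, plus 1."""
    return max(values) - min(values) + 1

def mean_deviation(values):
    """Average absolute deviation from the mean: sum(|X - mean|) / N."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

dist_a = [150, 145, 100, 100, 55, 50]
dist_b = [150, 110, 100, 100, 90, 50]

for name, data in (("A", dist_a), ("B", dist_b)):
    print(name, inclusive_range(data), round(mean_deviation(data), 2))
# A 101 31.67
# B 101 20.0
```

The two distributions share the same range because only the extreme scores enter that calculation, while the mean deviation uses every score and separates them.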
Since the standard deviation can be very small, it is usually reported with 2-3 more decimals of accuracy than what is available in the original data.
Properties of the Variance & Standard Deviation:
Estimation is the goal of inferential statistics. We use sample values to estimate population values. The symbols are as follows:
| Measure | Sample | Population |
|---|---|---|
| Mean | $\bar{X}$ | $\mu$ |
| Variance | $s^2$ | $\sigma^2$ |
| Standard Deviation | $s$ | $\sigma$ |
It is important that the sample values (estimators) be unbiased. An unbiased estimator of a parameter is one whose average over all possible random samples of a given size equals the value of the parameter.
While $\bar{X}$ is an unbiased estimator of $\mu$, $s^2$ (computed with $N$ in the denominator) is not an unbiased estimator of $\sigma^2$.
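In order to make it an unbiased estimator, we use N − 1 in the denominator of the formula rather than just N. Thus:

$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}, \qquad s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$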
Note that this is a defining formula and, as we will see below, is not the best choice when actually doing the calculations.
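The definition of an unbiased estimator can be checked directly on a tiny case by listing every possible sample and averaging the estimates. The sketch below assumes a hypothetical population of three scores and samples of size 2 drawn with replacement; exact fractions are used so that no rounding obscures the comparison.

```python
from fractions import Fraction
from itertools import product

def variance(values, denominator):
    """Sum of squared deviations from the mean, divided by the given denominator."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values) / denominator

population = [1, 2, 3]                             # hypothetical population
sigma_sq = variance(population, len(population))   # population variance

n = 2
samples = list(product(population, repeat=n))      # all 9 ordered samples of size 2

# Average each estimator over every possible sample.
avg_with_n = sum(variance(s, n) for s in samples) / len(samples)
avg_with_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(sigma_sq, avg_with_n, avg_with_n_minus_1)  # 2/3 1/3 2/3
```

Dividing by N underestimates the population variance on average, while dividing by N − 1 recovers it exactly in this example.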
Let's reconsider an example from above of two distributions (A & B):
Consider a possibility for the scores that go with these distributions:
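| Distribution | A | B |
|---|---|---|
| Data | 150 | 150 |
| | 145 | 110 |
| | 100 | 100 |
| | 100 | 100 |
| | 55 | 90 |
| | 50 | 50 |
| Σ | 600 | 600 |
| N | 6 | 6 |
| $\bar{X}$ | 100 | 100 |
| Range | 150 − 50 + 1 = 101 | 150 − 50 + 1 = 101 |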
Notice that the central tendency and range of the two distributions are the same. That is, the mean, median, and mode all equal 100 for both distributions and the range is 101 for both distributions. However, while Distributions A and B have the same measures of central tendency and the same range, they differ in their variability. Distribution A has more of it. Let us prove this by computing the standard deviation in each case. First, for Distribution A:
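| | A | $\bar{X}$ | c | c² |
|---|---|---|---|---|
| | 150 | 100 | 50 | 2500 |
| | 145 | 100 | 45 | 2025 |
| | 100 | 100 | 0 | 0 |
| | 100 | 100 | 0 | 0 |
| | 55 | 100 | -45 | 2025 |
| | 50 | 100 | -50 | 2500 |
| Σ | 600 | | 0 | 9050 |
| N | 6 | | | |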
Plugging the appropriate values into the defining formula gives:
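| Measure | A |
|---|---|
| Variance | $s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{9050}{6 - 1} = 1810.00$ |
| Standard Deviation | $s = \sqrt{1810.00} \approx 42.544$ |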
Note that calculating the variance and standard deviation in this manner requires computing the mean and subtracting it from each score. Since this is not very efficient and can be less accurate as a result of rounding error, a computational formula is typically used. It is given as follows:

$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}, \qquad s = \sqrt{s^2}$$
Redoing the computations for Distribution A in this manner gives:
| | A | X² |
|---|---|---|
| | 150 | 22500 |
| | 145 | 21025 |
| | 100 | 10000 |
| | 100 | 10000 |
| | 55 | 3025 |
| | 50 | 2500 |
| Σ | 600 | 69050 |
| N | 6 | |
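Then, plugging the appropriate values into the computational formula gives:

$$s^2 = \frac{69050 - \frac{(600)^2}{6}}{6 - 1} = \frac{69050 - 60000}{5} = \frac{9050}{5} = 1810.00, \qquad s = \sqrt{1810.00} \approx 42.544$$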
Note that the defining and computational formulas give the same result, but the computational formula is easier to work with (and potentially more accurate due to less rounding error).
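The agreement between the two formulas can be confirmed for Distribution A with a short Python sketch. The variable names are illustrative; both calculations use N − 1 in the denominator.

```python
from math import sqrt

dist_a = [150, 145, 100, 100, 55, 50]
n = len(dist_a)

# Defining formula: subtract the mean from each score, square, and sum.
mean_a = sum(dist_a) / n
ss_defining = sum((x - mean_a) ** 2 for x in dist_a)
var_defining = ss_defining / (n - 1)

# Computational formula: only the sum of X and the sum of X squared are needed.
sum_x = sum(dist_a)
sum_x_sq = sum(x ** 2 for x in dist_a)
var_computational = (sum_x_sq - sum_x ** 2 / n) / (n - 1)

print(var_defining, var_computational, round(sqrt(var_defining), 3))
# 1810.0 1810.0 42.544
```

The advantage of the computational formula described above applies to hand calculation. In floating-point software, subtracting two large, nearly equal sums can lose precision when the mean is large relative to the spread, so library routines often use the deviation-based approach instead.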
Doing the same calculations for Distribution B yields:
| | B | X² |
|---|---|---|
| | 150 | 22500 |
| | 110 | 12100 |
| | 100 | 10000 |
| | 100 | 10000 |
| | 90 | 8100 |
| | 50 | 2500 |
| Σ | 600 | 65200 |
| N | 6 | |
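Then, plugging the appropriate values into the computational formula gives:

$$s^2 = \frac{65200 - \frac{(600)^2}{6}}{6 - 1} = \frac{65200 - 60000}{5} = \frac{5200}{5} = 1040.00, \qquad s = \sqrt{1040.00} \approx 32.249$$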
Thus, Distribution A clearly has more variability than Distribution B.
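As a cross-check, Python's standard `statistics` module provides `variance` and `stdev`, both of which use N − 1 in the denominator. The sketch below applies them to the two distributions; only the variable names are assumptions.

```python
from statistics import stdev, variance

dist_a = [150, 145, 100, 100, 55, 50]
dist_b = [150, 110, 100, 100, 90, 50]

for name, data in (("A", dist_a), ("B", dist_b)):
    # variance() and stdev() are the sample (N - 1) versions.
    print(name, variance(data), round(stdev(data), 3))
```

For population values (N in the denominator), the same module offers `pvariance` and `pstdev`.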