Hypothesis Testing: Continuous Variables (1 Sample)
I. Sampling Distribution of the Mean
As might be expected, inference with continuous variables is more complicated than with dichotomous variables. Fortunately, however, the general principles are the same. Again, we will use a sampling distribution to index the probability that the observed outcome is due to chance.
To get a better feel for this notion, let’s consider an empirical example (one that could actually be performed). Let us draw 10 samples of size 4 from a population of size 20.
Population distribution (N = 20): 6, 2, 9, 5, 0, 1, 3, 2, 1, 1, 5, 2, 7, 7, 7, 8, 1, 1, 3, 7

| Sample | 10 observed sample distributions ($N_s = 4$) | Empirical sampling distribution of the mean ($\bar{X}$) |
|---|---|---|
| 1 | 1, 5, 9, 0 | 3.75 ($S_X = 4.11$) |
| 2 | 0, 3, 1, 5 | 2.25 |
| 3 | 5, 8, 3, 0 | 4.00 |
| 4 | 1, 5, 0, 7 | 3.25 |
| 5 | 7, 6, 1, 3 | 4.25 |
| 6 | 3, 2, 1, 7 | 3.25 |
| 7 | 2, 0, 3, 5 | 2.50 |
| 8 | 1, 2, 1, 1 | 1.25 |
| 9 | 2, 7, 1, 7 | 4.25 |
| 10 | 9, 7, 6, 2 | 6.00 |
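A minimal Python sketch (standard library only) that rebuilds the empirical sampling distribution from the table: it takes the ten samples as listed, computes each sample mean, and compares the mean of those means with the population mean. The variable names are ours, not the handout's.

```python
from statistics import fmean, pstdev

# Population of 20 values and the ten samples of size 4, copied from the table
population = [6, 2, 9, 5, 0, 1, 3, 2, 1, 1, 5, 2, 7, 7, 7, 8, 1, 1, 3, 7]
samples = [
    [1, 5, 9, 0], [0, 3, 1, 5], [5, 8, 3, 0], [1, 5, 0, 7], [7, 6, 1, 3],
    [3, 2, 1, 7], [2, 0, 3, 5], [1, 2, 1, 1], [2, 7, 1, 7], [9, 7, 6, 2],
]

# The empirical sampling distribution of the mean is simply the list of sample means
sample_means = [fmean(s) for s in samples]

print("sample means:", sample_means)
print("mean of sample means:", fmean(sample_means))  # 3.475
print("population mean:", fmean(population))          # 3.9
# Spread of the ten means (treated as a distribution in their own right)
print("SD of sample means:", round(pstdev(sample_means), 3))
```

With only ten samples, the mean of the sample means (3.475) is only a rough match for the population mean (3.9); the theoretical result concerns the distribution of all possible sample means.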
Notes: It turns out that $\mu_{\bar{X}} = \mu$ (note that $\mu_{\bar{X}}$ and $E(\bar{X})$ are synonyms).
The standard deviation of the distribution of sample means is called the standard error of the mean (or more simply, the standard error). It measures variability in the distribution of sample means or, in other words, sampling error (the amount of error we can expect from using a sample mean to estimate a population mean). One would expect the size of the standard error to be related to the sample size, and it is: the larger the sample, the smaller the standard error.
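To see the standard error shrink as N grows, here is a small simulation sketch using the population from the table. It assumes independent draws (sampling with replacement), because that is what the $\sigma/\sqrt{N}$ formula describes; drawing without replacement from a population of only 20 values would give somewhat smaller standard errors. The function name `simulated_standard_error` and the replication count are our own illustrative choices.

```python
import random
from statistics import fmean, pstdev

population = [6, 2, 9, 5, 0, 1, 3, 2, 1, 1, 5, 2, 7, 7, 7, 8, 1, 1, 3, 7]
sigma = pstdev(population)  # population SD (divides by N)

def simulated_standard_error(n, reps=10_000, seed=1):
    """SD of `reps` sample means, each from n draws WITH replacement (hypothetical helper)."""
    rng = random.Random(seed)
    means = [fmean(rng.choices(population, k=n)) for _ in range(reps)]
    return pstdev(means)

for n in (4, 16, 64):
    print(f"N={n:3d}  simulated SE={simulated_standard_error(n):.3f}  "
          f"sigma/sqrt(N)={sigma / n ** 0.5:.3f}")
```

Quadrupling the sample size roughly halves the standard error.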
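When population values are known:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$

When population values are estimated from sample values:

$$S_{\bar{X}} = \frac{S_X}{\sqrt{N}}, \qquad S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{N-1}}$$

Computational formula for the standard error estimated from sample values:

$$S_{\bar{X}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N(N-1)}}$$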
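Example (using the population distribution from the empirical sampling distribution above): the population standard deviation is $\sigma = 2.809$, so

$$\sigma_{\bar{X}} = \frac{2.809}{\sqrt{4}} = 1.40$$

If we didn’t know the population values, we could use $S_X$ from the first sample:

$$S_{\bar{X}} = \frac{4.11}{\sqrt{4}} \approx 2.06$$

As you can see, $S_{\bar{X}}$ only estimates $\sigma_{\bar{X}}$, and it does so poorly in this case (because of the small sample size).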
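A minimal sketch that computes both standard errors for this example: $\sigma_{\bar{X}}$ from the full population (SD dividing by N) and $S_{\bar{X}}$ from the first sample (SD dividing by N-1), once with the definitional formula and once with the computational formula, which should agree.

```python
import math
from statistics import pstdev, stdev

population = [6, 2, 9, 5, 0, 1, 3, 2, 1, 1, 5, 2, 7, 7, 7, 8, 1, 1, 3, 7]
first_sample = [1, 5, 9, 0]
n = len(first_sample)

# Population values known: sigma_xbar = sigma / sqrt(N), with sigma dividing by N
sigma_xbar = pstdev(population) / math.sqrt(n)

# Estimated from the sample: S_xbar = S_X / sqrt(N), with S_X dividing by N - 1
s_xbar = stdev(first_sample) / math.sqrt(n)

# Computational formula: sqrt((sum X^2 - (sum X)^2 / N) / (N (N - 1)))
sum_x = sum(first_sample)
sum_x2 = sum(x * x for x in first_sample)
s_xbar_comp = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n * (n - 1)))

print(f"sigma_xbar = {sigma_xbar:.2f}")                # 1.40
print(f"S_xbar (definitional) = {s_xbar:.2f}")         # 2.06
print(f"S_xbar (computational) = {s_xbar_comp:.2f}")   # 2.06
```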
| | In Symbols | In Words |
|---|---|---|
| $H_O$ | $\mu = 7$ | Physostigmine has no effect on memory. |
| $H_A$ | $\mu \neq 7$ | Physostigmine has an effect on memory. |

These hypotheses should be specific to what is being tested.
Decision Rules
We will use the standard normal curve (Z scores) to obtain the probabilities.
Our alpha level is .05 with a two-tailed test. When we look in a Z table, we see that the critical value of Z is 1.96 ($Z_{crit}$).
Thus, the critical region consists of the two tails beyond $\pm 1.96$, each containing .025 of the area under the curve. If our observed z value falls into this region, we will reject the null hypothesis. More formally:
If $Z_{obs} \le -1.96$ or $Z_{obs} \ge 1.96$, then reject $H_O$.
If $Z_{obs} > -1.96$ and $Z_{obs} < 1.96$, then do not reject $H_O$.
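Instead of reading $Z_{crit}$ from a table, it can be computed from the inverse of the standard normal CDF. This sketch uses Python's `statistics.NormalDist`; the helper name `decide` is hypothetical and simply encodes the two-tailed rule stated above.

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed: alpha/2 in each tail, so the upper cutoff is the 97.5th percentile of Z
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def decide(z_obs, crit=z_crit):
    """Two-tailed decision rule: reject H_O when z_obs is in either tail."""
    if z_obs <= -crit or z_obs >= crit:
        return "reject H_O"
    return "do not reject H_O"

print(round(z_crit, 2))               # 1.96
print(decide(2.40), "|", decide(-0.75))
```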
Observed scores (N = 20): 9, 8, 8, 9, 9, 8, 10, 8, 10, 7, 7, 7, 8, 8, 10, 9, 8, 8, 7, 9
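Remember: $z = \frac{X - \mu}{\sigma}$
More generally: $z = \frac{\text{statistic} - \text{parameter}}{\text{standard error of the statistic}}$
Thus: $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$
And: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$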
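The z statistic for a single mean can be sketched as a one-line function. The population σ for the physostigmine problem is not recoverable from these notes, so the code only computes the sample mean of the listed scores (which we assume belong to that example); you would then call `z_test(x_bar, 7, sigma, len(scores))` with the σ given in the problem. The function name is our own.

```python
import math
from statistics import fmean

def z_test(sample_mean, mu, sigma, n):
    """z = (X-bar - mu) / sigma_xbar, where sigma_xbar = sigma / sqrt(N)."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

# Scores listed after the decision rules (assumed to be the physostigmine data)
scores = [9, 8, 8, 9, 9, 8, 10, 8, 10, 7, 7, 7, 8, 8, 10, 9, 8, 8, 7, 9]
x_bar = fmean(scores)
print(x_bar)  # 8.35
```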
III. Errors & the Power of a Test
As can be seen, hypothesis testing is just educated guessing. Moreover, guesses (educated or not) are sometimes wrong. Consider the possible decisions we can make:
| Decision | Actual situation: $H_O$ is true | Actual situation: $H_O$ is false |
|---|---|---|
| Reject $H_O$ | Type I Error | Correct Decision II |
| Do not reject $H_O$ | Correct Decision I | Type II Error |
Let us now consider each decision in more detail.
A Type I Error is the false rejection of a true null. It has a probability of alpha ($\alpha$). This error is the price of having to separate probable from improbable outcomes: even when the null is true, an outcome in the critical region will still occur $\alpha$ of the time.
Correct Decision I occurs when we fail to reject a true null. It has a probability of $1 - \alpha$. From a scientist's perspective this is a "boring" result.
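A simulation sketch of why the Type I error rate equals α: every sample below is drawn from a normal population whose mean really is μ (so $H_O$ is true by construction), yet the two-tailed z test still rejects about α of the time. The population values (μ = 7, σ = 1, N = 20) are illustrative assumptions, not figures from the handout.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_error_rate(mu=7.0, sigma=1.0, n=20, alpha=0.05, reps=20_000, seed=42):
    """Fraction of true-null samples that the two-tailed z test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / math.sqrt(n)  # sigma_xbar
    rejections = 0
    for _ in range(reps):
        # H_O is true: each sample comes from a population with mean mu
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs((x_bar - mu) / se) >= z_crit:
            rejections += 1
    return rejections / reps

print(type_i_error_rate())            # roughly 0.05
print(type_i_error_rate(alpha=0.01))  # roughly 0.01
```

Lowering α shrinks the Type I error rate, but (not shown here) it also makes a false null harder to reject.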
In the Case I example, both the mean ($\mu$) and standard deviation ($\sigma$) of the population were given. However, these parameters are rarely known. In this section, we will consider how the test is performed when $\sigma$ is unknown.
As we noted earlier, $S_X$ can be used to estimate $\sigma$. One complication of doing this is that the shape of the theoretical sampling distribution of the standardized mean, $(\bar{X} - \mu)/S_{\bar{X}}$, will depend on the sample size. Thus, this sampling distribution is actually a family of distributions and is called Student’s t. To better understand the t distributions, we need to consider a new way of thinking about sample size.
The Degrees of Freedom (df) for a statistic refer to the number of values in its computation that are free to vary. For example, the df for the variance of a sample ($S_X^2$) is N-1:

$$S_X^2 = \frac{\sum (X - \bar{X})^2}{N-1}$$
In other words, since the sum of the deviations equals zero, N-1 of the deviations are free to vary. That is, given N-1 of the deviations, we can easily determine the final deviation because it is not free to vary. In the example below where N=5, the unknown value must be 2.
| $X - \bar{X}$ |
|---|
| -2 |
| -1 |
| 0 |
| 1 |
| ? |
| $\sum (X - \bar{X}) = 0$ |
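A tiny sketch of the degrees-of-freedom argument: once four deviations are known, the fifth is forced, because deviations from the mean sum to zero. The scores 8 to 12 are hypothetical values chosen so that their deviations match the table; the variance divides by the df (N-1), matching Python's `statistics.variance`.

```python
from statistics import variance

# Four of the five deviations (X - X-bar) from the table
known = [-2, -1, 0, 1]
# Deviations from the mean always sum to zero, so the fifth is not free to vary
last = -sum(known)
print(last)  # 2

def sample_variance(values):
    """S_X^2: sum of squared deviations divided by the df, N - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

# Hypothetical scores (mean 10) whose deviations are -2, -1, 0, 1, 2
scores = [8, 9, 10, 11, 12]
print(sample_variance(scores), variance(scores))  # 2.5 2.5
```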
In the Case II situation, the df for t equals the df for $S_X$, which is N-1. Student’s t is a family of distributions differing in their kurtosis (or peakedness): the smaller the df, the lower the peak and the heavier the tails.
Note that as the df approach infinity (i.e., as the sample size becomes very large), the t distribution approaches the z distribution.
As for the formula, remember the z test:

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

The formula for t is similar:

$$t = \frac{\bar{X} - \mu}{S_{\bar{X}}}$$
As with the z test, the critical values of t are obtained from a table. To determine the critical value of t from the table, you will need to know $\alpha$, the df, and whether you are using a one- or two-tailed test.
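As a cross-check on the printed t table, this self-contained sketch computes critical values numerically: it integrates the t density with Simpson's rule and finds the cutoff by bisection. That method is our own choice for a standard-library-only example (with SciPy installed, `scipy.stats.t.ppf` does the same job). The output also shows the critical values shrinking toward $Z_{crit} = 1.96$ as df grow.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus the area from 0 to x (Simpson's rule, even steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_crit(alpha, df, two_tailed=True):
    """Upper critical value of t, found by bisection on the CDF."""
    p = 1 - alpha / 2 if two_tailed else 1 - alpha
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 5, 19, 24, 100, 1000):
    print(f"df={df:5d}  t_crit={t_crit(0.05, df):.3f}")
print(f"z_crit = {NormalDist().inv_cdf(0.975):.3f}")
```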
You are interested in whether the average IQ for a group of "bad kids" (the ones that put a tack on your seat before you sit down) in a school differs from that of the rest of the kids in the school. The average IQ for the school as a whole is 102, and the standard deviation is unavailable.
$H_O$: $\mu = 102$. $H_A$: $\mu \neq 102$.
With N = 20, df = 19; for $\alpha$ = .05, two-tailed, the t table gives $t_{crit} = 2.093$.
If $t_{obs} \le -2.093$ or $t_{obs} \ge 2.093$, then reject $H_O$.
If $t_{obs} > -2.093$ and $t_{obs} < 2.093$, then do not reject $H_O$.
Observed IQ scores of the "bad kids" (N = 20): 106, 111, 120, 88, 109, 120, 123, 127, 91, 130, 118, 88, 97, 110, 92, 124, 116, 118, 114, 108
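A minimal sketch of the Case II one-sample t test on these IQ scores, testing $H_O$: μ = 102 (the school-wide mean) against a two-tailed alternative with the tabled $t_{crit} = 2.093$ for df = 19. Only the Python standard library is used.

```python
import math
from statistics import fmean, stdev

iq = [106, 111, 120, 88, 109, 120, 123, 127, 91, 130,
      118, 88, 97, 110, 92, 124, 116, 118, 114, 108]
mu_0 = 102       # school-wide mean IQ under H_O
t_crit = 2.093   # t table: alpha = .05, two-tailed, df = 19

n = len(iq)
x_bar = fmean(iq)
s_xbar = stdev(iq) / math.sqrt(n)  # S_X divides by N - 1
t_obs = (x_bar - mu_0) / s_xbar

print(f"N={n}, df={n - 1}, mean={x_bar:.2f}, S_xbar={s_xbar:.3f}, t={t_obs:.2f}")
print("reject H_O" if abs(t_obs) >= t_crit else "do not reject H_O")
```

Here $t_{obs} \approx 2.90$, which exceeds 2.093, so the sketch rejects $H_O$.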
Suppose you are a researcher interested in self-destructiveness. You develop a scale to measure this trait. Example questions might include:
Next you obtain a random sample of 25 people and give them the scale. (The difficulty in obtaining a random sample might be noted.) The mean for this sample is 120 and the standard deviation is 10.
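One of the things that we may want to know is the range in which the population mean is likely to lie. Note that this is a statement about the mean, not about individual scores, so it does not by itself tell us the range of individual scores or let us identify a deviant scorer (for example, for a case study).
Let’s say we wanted an interval that captures the population mean with 95% confidence. This is termed the 95% Confidence Interval (CI) and is given by:

$$\bar{X} \pm t_{crit}\, S_{\bar{X}}, \qquad S_{\bar{X}} = \frac{S_X}{\sqrt{N}}$$

with df = N-1.
Thus, the current example has df = N-1 = 24, alpha of .05 (for a 95% CI), and a two-tailed test. From the t table, we determine that $t_{crit}$ is 2.064.
Therefore, the upper limit will be:

$$120 + 2.064\left(\frac{10}{\sqrt{25}}\right) = 120 + 4.128 = 124.13$$

And the lower limit will be:

$$120 - 2.064\left(\frac{10}{\sqrt{25}}\right) = 120 - 4.128 = 115.87$$

We can therefore be 95% confident that the population mean on the scale lies between 115.87 and 124.13.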
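The confidence-interval arithmetic for the self-destructiveness example, as a short sketch: N = 25, mean 120, $S_X$ = 10, and the tabled $t_{crit}$ = 2.064 for df = 24.

```python
import math

n, x_bar, s_x = 25, 120, 10
t_crit = 2.064  # t table: alpha = .05, two-tailed, df = 24

s_xbar = s_x / math.sqrt(n)  # 10 / 5 = 2
lower = x_bar - t_crit * s_xbar
upper = x_bar + t_crit * s_xbar
print(f"95% CI for mu: {lower:.2f} to {upper:.2f}")  # 115.87 to 124.13
```

Because $S_{\bar{X}}$ shrinks with $\sqrt{N}$, a larger sample would give a narrower interval around the same mean.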