Part I: Probability & Hypothesis Testing
Distributions and Hypotheses

 
 
A sampling distribution can be defined as a theoretical probability distribution of all possible values of a sample statistic which would occur if all possible samples of size N were drawn from the population. In other words, we are looking for the sample space--the set of all possible outcomes--and the probabilities of each of these outcomes.

Sometimes we can specify the sampling distribution exactly. For example, if we toss 4 fair coins, we can determine the sample space (Counting Rule #1: S = 16) and the probability of each event class within the sample space [p(0H) = 1/16; p(1H) = 4/16; p(2H) = 6/16; p(3H) = 4/16; p(4H) = 1/16]. This is known as a discrete sampling distribution, and it can be represented graphically as a probability distribution.
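The coin example can be checked by counting. The sketch below is only an illustration: the function name heads_counts is made up here, and it assumes every one of the 2^4 = 16 head/tail sequences is equally likely, which is what "fair coins" implies.

```python
from math import comb

def heads_counts(n_coins):
    """Number of equally likely outcomes with k heads, for k = 0..n_coins."""
    return [comb(n_coins, k) for k in range(n_coins + 1)]

n_coins = 4
sample_space_size = 2 ** n_coins   # 16 equally likely head/tail sequences
counts = heads_counts(n_coins)     # outcomes in each event class (0H to 4H)

# Print the discrete sampling distribution in the same form as the text.
for heads, count in enumerate(counts):
    print(f"p({heads}H) = {count}/{sample_space_size}")
```

The counts for the five event classes add up to the size of the sample space, so the probabilities sum to 1.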

Usually, however, the sample space cannot be specified so precisely because it is theoretically infinite. In this case, we use a continuous curve which approximately represents the true sampling distribution. In many cases, variables tend to be continuous and normally distributed.

In order to describe any sampling distribution which is represented by a continuous curve, we need three pieces of information: the mean, the sample size, and the standard deviation. The standard deviation of the sampling distribution is known as the standard error of the sampling distribution. If a distribution is one of means, then this value is known as the standard error of the mean. The sampling distribution can then be used to make determinations about the probability of an expected outcome, even in cases where the exact sampling distribution is not known.
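As a rough illustration of how the three pieces of information combine, the sketch below assumes the sampling distribution of the mean is normal, with a standard error equal to the population standard deviation divided by the square root of N. The numbers (population mean 100, standard deviation 15, N = 25, observed mean 106) are invented for the example and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(N)."""
    return sigma / sqrt(n)

def prob_mean_at_least(sample_mean, mu, sigma, n):
    """p(sample mean >= sample_mean), assuming a normal sampling distribution."""
    sampling_dist = NormalDist(mu=mu, sigma=standard_error(sigma, n))
    return 1 - sampling_dist.cdf(sample_mean)

# Hypothetical values, chosen only to make the arithmetic easy to follow.
mu, sigma, n = 100, 15, 25
se = standard_error(sigma, n)                # 15 / 5 = 3.0
p = prob_mean_at_least(106, mu, sigma, n)    # 106 lies two standard errors above mu

print(f"standard error = {se:.2f}")
print(f"p(mean >= 106) = {p:.4f}")
```

Because the standard error shrinks as N grows, the same observed mean becomes less probable under larger samples.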

 
 
A hypothesis is a statement regarding the predicted outcome of a particular experiment. Hypothesis testing, however, focuses on the null hypothesis (H0). The null hypothesis is the testable prediction that most often states that no true difference exists between groups or between a sample and the population. In other words, the null hypothesis is the opposite of what you really expect to find. This is contrasted with the alternative hypothesis (H1), the statement of what you truly expect or wish to find.

In general, the test of a hypothesis is a question of probability. More specifically, it is a question of conditional probability. The question is: What is the probability of my results given that the null hypothesis is true? If the probability is very low, we can then question whether the null really is true; that is, we reject the null hypothesis. If the probability is high, we do not question the truth of the null; that is, we fail to reject the null hypothesis.
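To make the conditional probability concrete, here is a small sketch built on a coin-tossing example. The scenario (9 heads in 10 tosses), the function name prob_at_least, and the .05 cutoff for "very low" are all assumptions made for illustration; the null hypothesis is that the coin is fair.

```python
from math import comb

def prob_at_least(k_heads, n_tosses):
    """p(at least k_heads heads in n_tosses | the coin is fair)."""
    favorable = sum(comb(n_tosses, k) for k in range(k_heads, n_tosses + 1))
    return favorable / 2 ** n_tosses

# Hypothetical result: 9 heads in 10 tosses.
p = prob_at_least(9, 10)    # (10 + 1) / 1024

cutoff = 0.05               # assumed threshold for a "very low" probability
decision = "reject H0" if p < cutoff else "fail to reject H0"

print(f"p(9 or more heads | H0) = {p:.4f}")
print(decision)
```

The probability is computed for the observed result or one more extreme, always under the assumption that the null hypothesis is true.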

Two types of testable hypotheses exist:

  1. Goodness of Fit: Here the question of interest is: Does the sample come from the general population? Does what I know about the sample population fit with what I know about the general population? Most often, the null hypothesis will state that the mean of the sample population equals the mean of the general population. This question can be answered by drawing a single sample, computing the appropriate sample statistic, and comparing that to the same parameter in the general population. Note that there is only one variable under consideration (y) and that the population parameter must be known. If we were interested in whether people in a certain town were of above average intelligence, then we could compare our findings on the basis of the known average for intelligence.
  2. Tests of Independence: In this case, the question becomes: Are two variables--the independent variable (x or y1) and the dependent variable (y or y2)--related or are they independent? Does the knowledge of a person's score on one variable give you any information about his or her score on another variable? This question is answered by drawing samples from two populations (e.g. two towns) and comparing them on intelligence. Now we are concerned with two variables and our null hypothesis states that the population means do not differ significantly.
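The two-town comparison might be sketched as follows. This is one possible illustration, not the only way to test independence: it assumes the population standard deviation is known and equal in both towns and that the sampling distribution of the difference between means is normal. All numbers are invented.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, mean2, sigma, n1, n2):
    """z for the difference between two sample means, with known common sigma."""
    se_diff = sigma * sqrt(1 / n1 + 1 / n2)   # standard error of the difference
    return (mean1 - mean2) / se_diff

# Hypothetical intelligence scores for samples from two towns.
z = two_sample_z(mean1=104, mean2=100, sigma=15, n1=50, n2=50)

# Two-tailed probability: a difference this large in either direction, given H0.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-tailed p = {p_two_tailed:.4f}")
```

With these made-up values the probability is well above .05, so the null hypothesis that the town means do not differ would not be rejected.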
 
 
When assessing a given finding for significance, we are concerned with testing a null hypothesis. In general, the test of a null hypothesis is a question of probability. More specifically, it is a question of conditional probability. The question is: What is the probability of my results given that the null hypothesis is true? In order to answer this question, one should follow seven basic steps:
  1. Specify the Independent and Dependent Variables along with Their Respective Levels of Measurement. Characteristics of the independent and dependent variables determine the statistical test and sampling distribution that will be used in order to assess significance.
  2. State the Null and Alternative Hypotheses and Indicate Whether the Test is One-tailed or Two-tailed. Remember that the null hypothesis is the outcome that you would expect simply by chance alone. The alternative (or experimental) hypothesis states the result that you would expect if your independent variable had an effect. A one-tailed alternative hypothesis indicates that you would expect one group to score higher than another, whereas a two-tailed alternative hypothesis does not make a prediction as to which group will score higher. For example, you could state direction by claiming that one group should have higher scores than another.
  3. Set Alpha, Indicate the Number of Individuals in the Study, and the Number of Degrees of Freedom (if relevant). Alpha is the cutoff value you will use to decide if the probability of your results is high or low. In other words, alpha is the probability [p(outcome | H0 is true)] which is low enough for you to stop believing that the null hypothesis is true. The most commonly used alpha is .05. Degrees of freedom are used to help create an unbiased estimate of the population parameters necessary for the sampling distribution.
  4. Calculate the Appropriate Statistical Test. The test chosen provides you with an index of departure. In other words, the calculated statistic tells you how far your sample differs from what is stated or predicted in the null hypothesis.
  5. Determine the Critical Value from Your Sampling Distribution. This value reflects the value of the index of departure that would be necessary to have a probability of alpha. Alternatively, by using a sampling distribution, you can find out the theoretical probability of getting your departure value or one more extreme; remember, this probability is based on the assumption that the null hypothesis is actually true. Either procedure tells you whether the probability of your result is above or below alpha: the first by comparing the calculated value to the critical value, the second by comparing the probability itself to alpha.
  6. Compare the Calculated Value to the Critical Value. For this step, indicate whether the calculated value is greater or less than the critical value. If the calculated value is larger, then it has a probability lower than alpha. In this case, you reject the null hypothesis and doubt whether it is really true. If the calculated value is smaller than the critical value, then the probability of getting the calculated value when the null hypothesis is true is high. In this case, you fail to reject the null hypothesis (note that this is different from accepting the null hypothesis!).
  7. State Your Conclusion in Words. Lastly, you should go beyond the mathematics to answer the question that was posed. For example, if the purpose of the study was to test to see if two groups scored differently, then step seven should state whether or not the two groups differed significantly on the dependent variable.
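The seven steps can be traced through a single worked example. The sketch below uses a one-sample z test for the town-intelligence question; the choice of test, the population values (mean 100, standard deviation 15), the sample size, and the sample mean are all assumptions made for illustration. Degrees of freedom are not needed for a z test, so that part of step 3 is skipped.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: IV = town membership (nominal); DV = intelligence score (interval).
# Step 2: H0: town mean = 100; H1: town mean > 100 (one-tailed).
mu0, sigma = 100, 15        # assumed known population parameters

# Step 3: set alpha and note the number of individuals.
alpha, n = 0.05, 36
sample_mean = 105           # hypothetical sample result

# Step 4: calculate the test statistic (the index of departure).
z = (sample_mean - mu0) / (sigma / sqrt(n))     # 5 / 2.5 = 2.0

# Step 5: critical value for a one-tailed test, plus the exact probability.
z_crit = NormalDist().inv_cdf(1 - alpha)        # about 1.645
p_value = 1 - NormalDist().cdf(z)               # p(z this large or larger | H0)

# Step 6: compare the calculated value to the critical value.
reject = z > z_crit

# Step 7: state the conclusion in words.
if reject:
    conclusion = "The town's mean intelligence is significantly above average."
else:
    conclusion = "The town's mean intelligence does not differ significantly from average."

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, p = {p_value:.4f}")
print(conclusion)
```

Comparing z to the critical value and comparing p_value to alpha always lead to the same decision; they are two views of the same sampling distribution.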