Part II: Statistical Tests for Nominal Data
Binomial and Chi Square Distributions

 
 
The binomial distribution applies when our measure Y is dichotomous (2 outcomes) and the two outcomes are mutually exclusive and exhaustive. The probabilities of the two events therefore sum to 1 [p(event 1) + p(event 2) = 1.00]. The same logic extends to situations in which the dichotomous trial is repeated several times.

For example, suppose we flip three coins. For each coin, the probability of obtaining a head is p = .5, and the probability of obtaining a tail is q = .5 (where q = 1 - p). When all three are flipped, there are four basic outcomes: 3 heads, 2 heads, 1 head, and 0 heads. The probability of 3 heads is simply p(3 heads) = ppp = (.50)(.50)(.50) = .125. Following similar logic, the probability of 3 tails is p(0 heads) = qqq = (.50)(.50)(.50) = .125. The probability of any one particular sequence containing 2 heads and a tail, such as HHT, is ppq = (.50)(.50)(.50) = .125. However, throwing 2 heads and 1 tail can happen three ways: HHT, HTH, and THH; therefore, p(2 heads and a tail) = (3)(ppq) = (3)(.125) = .375. The same logic gives the probability of 1 head and 2 tails: p(1 head and 2 tails) = (3)(pqq) = .375. Overall, the probabilities of the four outcomes sum to 1 (.125 + .375 + .375 + .125 = 1.00).
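The three-coin arithmetic above can be checked by brute force. The following is a minimal sketch, assuming fair coins; the function name head_count_distribution is only illustrative. It lists all eight equally likely sequences and counts how many fall into each of the four basic outcomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def head_count_distribution(n_coins=3):
    """Return {number_of_heads: probability} for n fair coins by enumeration."""
    # Every sequence of H/T is equally likely when p = q = .5.
    outcomes = list(product("HT", repeat=n_coins))
    # Count how many sequences produce each number of heads.
    ways = Counter(seq.count("H") for seq in outcomes)
    return {heads: Fraction(count, len(outcomes)) for heads, count in ways.items()}


dist = head_count_distribution(3)
for heads in sorted(dist, reverse=True):
    print(heads, "head(s):", float(dist[heads]))
print("total:", float(sum(dist.values())))
```

The counts 1, 3, 3, 1 that appear here are the "number of ways" multipliers used in the text.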

We can therefore derive a general form of the binomial: (p + q)^n, where n = the number of trials. If we are interested in asking the question, "What is the probability of r successes given n and p?" we can use counting rule #5 (combinations) in conjunction with the binomial to arrive at [n! / ((n-r)! r!)] p^r q^(n-r). Thus, if we wanted to find the probability of 2 heads and 1 tail, we could do the following:

[3! / ((3-2)! 2!)] (.5^2) (.5^1) = (3)(.25)(.5) = .375
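The same calculation can be written as a small function. This is a sketch rather than a reference implementation; binomial_pmf is a name chosen here for illustration, and math.comb supplies the combinations count n! / ((n-r)! r!).

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n trials with success probability p."""
    q = 1 - p
    return comb(n, r) * p**r * q ** (n - r)


# Two heads and one tail from three fair coins.
print(binomial_pmf(2, 3, 0.5))  # 0.375

# The probabilities over r = 0..n sum to 1, as (p + q)^n = 1 requires.
print(sum(binomial_pmf(r, 3, 0.5) for r in range(4)))
```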

You will note that the binomial distribution is the exact sampling distribution for dichotomous situations.

However, as the number of trials n becomes large, the binomial distribution approaches a normal curve. Thus, when np > 5 and nq > 5, we can use z scores and the normal curve to approximate the binomial. One consideration must be made, however. Since the binomial is a discrete distribution and the normal curve is a continuous distribution, we must correct for continuity by using Yates' Correction. Below is the formula for the normal approximation of the binomial:

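z = (|X - np| - .5) / sqrt(npq), where X is the observed number of successes, np is the mean of the binomial, and sqrt(npq) is its standard deviation.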
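To see how close the approximation is, the exact binomial tail can be compared with the normal curve. The sketch below assumes the usual continuity-corrected form z = (|X - np| - .5) / sqrt(npq); the function names and the values n = 20, p = .5, r = 14 are hypothetical and chosen only so that np and nq both exceed 5.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_upper_tail(r, n, p):
    """Exact binomial probability of r or more successes."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q ** (n - k) for k in range(r, n + 1))


def approx_upper_tail(r, n, p):
    """Normal approximation to P(X >= r) with a continuity correction of .5.

    Assumes r lies above the mean np, so the correction moves r half a
    unit toward the mean before converting to a z score.
    """
    q = 1 - p
    z = (abs(r - n * p) - 0.5) / sqrt(n * p * q)
    return 1 - NormalDist().cdf(z)


n, p, r = 20, 0.5, 14  # hypothetical example: np = nq = 10
print("exact :", round(exact_upper_tail(r, n, p), 4))
print("approx:", round(approx_upper_tail(r, n, p), 4))
```

Dropping the .5 from the numerator in this example should make the approximate tail noticeably too small, which is the reason for the correction.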

 
 
Recall that the binomial distribution can be used for only one variable with mutually exclusive and exhaustive, dichotomous outcomes [Y=1,2]. What happens, however, if the possible outcomes of Y are still categorical but have more than two classes? That is, suppose we have Y=1,2,3,...k. What do we do?

Just as the binomial is the exact sampling distribution for the Y=1,2 situation, the multinomial is the exact sampling distribution for the Y=1,2,3,...k case. And, just as we can use a normal curve to approximate the binomial, the chi square sampling distribution (along with the statistic you can calculate with your data) will approximate the multinomial.
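As an illustration of "the statistic you can calculate with your data," the sketch below computes Pearson's chi square statistic, the sum over categories of (observed - expected)^2 / expected. The notes do not spell out this formula at this point, so treating it as the intended statistic is an assumption, and the die-roll counts are made up for the example.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi square: sum of (O - E)^2 / E over the k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, k = 6 categories.
observed = [8, 12, 10, 9, 11, 10]
expected = [10] * 6  # a fair die predicts 60 / 6 = 10 per face

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # k - 1 for a single categorical variable
print(statistic, degrees_of_freedom)  # 1.0 5
```

The statistic is then referred to the chi square curve with the matching degrees of freedom, in the same way a z score is referred to the normal curve.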

Chi square is a family of distributions that vary with degrees of freedom (a concept that will be discussed at length later on). In general, as the degrees of freedom become large, the chi square distribution approaches a normal curve. Note: the expected value (mean) of each chi square distribution is equal to the number of degrees of freedom for that curve.
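The claim that the mean equals the degrees of freedom can be checked by simulation. This sketch relies on a standard fact not stated in these notes: a chi square variable with df degrees of freedom can be generated as the sum of df squared independent standard normal values. The function name, number of draws, and seed are arbitrary choices.

```python
import random


def simulate_chi_square_mean(df, n_draws=20000, seed=0):
    """Average of n_draws simulated chi square values with df degrees of freedom."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n_draws):
        # One chi square draw: sum of df squared standard normal values.
        total += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return total / n_draws


for df in (1, 4, 10):
    print("df =", df, "simulated mean =", round(simulate_chi_square_mean(df), 2))
```

Each simulated mean should land close to its df, with small differences due to sampling error.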

A small technical distinction: Chi Square is a continuous, theoretical sampling distribution. The sampling distribution of your statistic is discrete. Chi square approximates the discrete statistic just as the normal curve approximates the discrete binomial.