Inferential statistics: Outcome = categorical levels (Chi-square)

For many categorical outcomes, the appropriate statistic is Chi-square (χ²). It is easy to calculate by hand from a contingency table, and all statistical software packages include it. The name Chi-square is not a misprint: it is not Chi-squared, even though the symbol looks that way.

Using a contingency table (see Descriptive statistics, categorical outcomes), the Chi-square statistic is based on the difference between the observed frequencies (the raw data - the counts in the cells) and the frequencies expected by chance (based on the marginal totals).

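Example: The hypothesis is that lower division students are more likely to prefer the quarter system, and upper division students are more likely to prefer semesters. Below is the contingency table of observed frequencies. Do NOT use percentages in the contingency table calculations.

                            Class level (predictor variable)
Preferred term (outcome)    Lower division    Upper division    Total
  Quarter                   11                4                 15
  Semester                  3                 8                 11
  Total                     14                12                26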

Rationale for calculating expected frequencies (to give you a sense of the underlying statistics, you do not need to memorize it):
1. There are 14 lower division students.
2. There are 26 participants in the survey.
3. Of the 26 participants, 15 students prefer the quarter system.
4. Based on the marginal totals, 14/26 (the proportion of lower division students in the sample) of 15 (the total number preferring the quarter system) = the number of lower division students who would be expected to prefer the quarter system.
5. From this reasoning, the Expected frequency (E) = (14/26)*15 = 8.08.
6. The Observed frequency (O) = 11
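The same reasoning applies to every cell: the expected frequency is the cell's row total times its column total, divided by the grand total. The following is a minimal Python sketch of that rule, not part of the original tutorial. The helper name `expected_frequencies` is hypothetical, and the cell counts 4, 3, and 8 are assumptions obtained by subtraction from the totals stated above (only the 11 and the totals 14, 15, and 26 are given directly).

```python
def expected_frequencies(observed):
    """Return expected counts for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count = (row total * column total) / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Quarter, Semester. Columns: Lower division, Upper division.
# Counts other than 11 are derived from the marginal totals (assumption).
observed = [[11, 4],
            [3, 8]]

expected = expected_frequencies(observed)
# Lower division, quarter: (14/26)*15
print(round(expected[0][0], 2))  # 8.08
```

Note that the expected counts always add up to the same grand total as the observed counts.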
Chi-square formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O = Observed frequency
E = Expected frequency
Σ (upper case sigma) = Sum of the above across all cells
See hand calculation.
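As a rough illustration of the formula, the sketch below sums (O - E)²/E over all four cells. It is a sketch under stated assumptions rather than the tutorial's own hand calculation: the function name `chi_square` is hypothetical, and the counts 4, 3, and 8 are inferred by subtraction from the totals given in the rationale.

```python
def chi_square(observed):
    """Sum (O - E)^2 / E over every cell of a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            total += (o - e) ** 2 / e
    return total


# Rows: Quarter, Semester. Columns: Lower division, Upper division.
# Counts other than 11 are derived from the marginal totals (assumption).
observed = [[11, 4],
            [3, 8]]

statistic = chi_square(observed)
print(f"chi-square = {statistic:.2f}")
```

Computed without intermediate rounding, the result agrees with the 5.41 reported for the survey up to rounding in the second decimal place.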

Interpreting the Chi-square statistic

For the survey results, χ² = 5.41. This Chi-square value is our inferential statistic, but it is not our final goal. It is a step along the way. What we are after is its p value, which is a probability estimate. The p value is what will determine whether or not we accept our hypothesis that the two groups of students differ in term preference (review the Introduction for an explanation of p).

[Figure: labeled statistical output for chi-square]

Here is the way the statistical result is displayed in a report (without the colorful, informative labels).

See hand calculation for instructions about df and finding the p value on a table.
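For a table with 2 rows and 2 columns, df = (rows - 1) x (columns - 1) = 1. In that special case the p value can be approximated without a printed table, because a Chi-square variable with 1 df is the square of a standard normal variable. The sketch below relies on that fact and uses only Python's standard library; the function name `p_value_df1` is hypothetical, and the shortcut does not apply to larger tables.

```python
import math


def p_value_df1(chi_square_value):
    """Upper-tail probability of a chi-square statistic, valid only for df = 1."""
    # P(Z^2 > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi_square_value / 2.0))


p = p_value_df1(5.41)  # statistic reported for the survey
print(f"p = {p:.3f}")
```

For tables with more than 1 df, a statistics package is the safer route. In SciPy, for example, `scipy.stats.chi2_contingency` returns the statistic, p value, df, and expected frequencies; passing `correction=False` turns off the continuity correction it otherwise applies to 2 x 2 tables, which matches the uncorrected formula used here.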


Accepting or rejecting the null hypothesis

As noted previously, the general rule in the behavioral sciences is to reject the null hypothesis if the probability associated with the appropriate statistic is less than .05. In this example, with χ² = 5.41 and 1 degree of freedom, p is approximately .02 (which is less than .05). Therefore, we will reject the null hypothesis and accept our research hypothesis that there is a difference in term preference between the two groups.

From the descriptive statistics, we see that most lower division students prefer the quarter (78.6%), while most upper division students prefer the semester (66.7%). The inferential statistic tells us the difference is greater than expected by chance.
                            Class level (predictor variable)
Preferred term (outcome)    Lower division    Upper division
  Quarter                   78.6%             33.3%
  Semester                  21.4%             66.7%

Review: Steps in hypothesis-testing

  1. Calculate descriptive statistics (Percentages, based on contingency table)
  2. Calculate an inferential statistic (Chi-square)
  3. Find its probability (p value)
  4. Based on p value, accept or reject H0
  5. Draw conclusion
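The five steps can be strung together in a few lines of Python. This is only an illustrative sketch for a 2 x 2 table, not the tutorial's own procedure: the function name `analyze_2x2` and the constant `ALPHA` are hypothetical, the counts 4, 3, and 8 are inferred from the marginal totals, and the p value shortcut is valid only when df = 1.

```python
import math

ALPHA = 0.05  # cutoff named in the text for the behavioral sciences


def analyze_2x2(observed):
    """Run the review steps on a 2 x 2 table of counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Step 1: descriptive statistics (percentages within each column).
    percentages = [[100.0 * cell / col_totals[j] for j, cell in enumerate(row)]
                   for row in observed]

    # Step 2: inferential statistic, summing (O - E)^2 / E over all cells.
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi_square += (o - e) ** 2 / e

    # Step 3: p value; this shortcut is valid only for df = 1.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))

    # Step 4: decision about the null hypothesis.
    reject_null = p_value < ALPHA

    return {"percentages": percentages, "chi_square": chi_square,
            "p_value": p_value, "reject_null": reject_null}


# Rows: Quarter, Semester. Columns: Lower division, Upper division.
# Counts other than 11 are derived from the marginal totals (assumption).
result = analyze_2x2([[11, 4], [3, 8]])

# Step 5: draw the conclusion from the decision.
if result["reject_null"]:
    print("The groups differ in term preference more than expected by chance.")
else:
    print("No difference beyond what chance would produce was detected.")
```

The percentages are computed within each class level (column), so each column sums to 100%, as in the percentage table shown earlier.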

Self-test #5: Calculating Chi-square

Next module: Selecting statistics (short)