For many categorical outcomes, the appropriate statistic to use is Chi-square ($\chi^2$).
It is easy to calculate by hand from a contingency table. All statistical software
packages include Chi-square (this is not a misprint -- it is not Chi-squared,
even though it looks that way).
Using a contingency table (see Descriptive statistics, categorical outcomes), the Chi-square statistic is based on the difference between the observed frequencies (the raw data - the counts in the cells) and the frequencies expected by chance (based on the marginal totals).
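Example: The hypothesis is that lower division students are more likely to prefer the quarter system, and upper division students are more likely to prefer semesters. Below is the contingency table of counts. Do NOT use percentages in the contingency table calculations.

| Preferred term (outcome) | Lower division | Upper division | Total |
| --- | --- | --- | --- |
| Quarter | 11 | 4 | 15 |
| Semester | 3 | 8 | 11 |
| Total | 14 | 12 | 26 |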
Rationale for calculating expected frequencies (to give you a sense of the underlying statistics; you do not need to memorize it):
1. There are 14 lower division students.
2. There are 26 participants in the survey.
3. Of the 26 participants, 15 students prefer the quarter system.
4. Based on the marginal totals, 14/26 (the proportion of lower division students in the sample) of 15 (the total number preferring the quarter system) is the number of lower division students who would be expected to prefer the quarter system.
5. From this reasoning, the Expected frequency (E) = (14/26)*15 = 8.08.
6. The Observed frequency (O) = 11.
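The same reasoning can be applied to every cell at once. The sketch below is only an illustration: the function name `expected_frequencies` is hypothetical, and the four cell counts are an assumption derived from the totals quoted above (14 lower division students, 26 participants, 15 preferring the quarter system, 11 observed in the lower division/quarter cell).

```python
def expected_frequencies(observed):
    """Expected count per cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: quarter, semester. Columns: lower division, upper division.
# Counts are assumed from the marginal totals given in the text.
observed = [[11, 4], [3, 8]]
expected = expected_frequencies(observed)

# Lower division / quarter cell: (15 * 14) / 26
print(round(expected[0][0], 2))  # 8.08
```

The expected counts always add up to the same grand total as the observed counts, which is a quick way to check a hand calculation.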
Chi-square formula:

$\chi^2 = \sum \frac{(O - E)^2}{E}$

O = Observed frequency
E = Expected frequency

See hand calculation.
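As a rough check on a hand calculation, the formula can be computed in a few lines. This is a minimal sketch, not a replacement for a statistics package: the function name `chi_square` is hypothetical, and the cell counts are an assumption derived from the marginal totals in the example (11 and 3 lower division students, 4 and 8 upper division students).

```python
def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency from the marginal totals
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic


# Rows: quarter, semester. Columns: lower division, upper division.
chi2 = chi_square([[11, 4], [3, 8]])
print(f"{chi2:.4f}")  # 5.4176
```

Note that some software applies a continuity correction to 2 x 2 tables by default, so a package may report a smaller value than this uncorrected calculation.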
Interpreting the Chi-square statistic
For the survey results, $\chi^2$ = 5.41.
This Chi-square value is our inferential statistic, but it is not our final
goal. It is a step along the way. What we are after is its p value - p is a probability estimate. The p value
is what will determine whether or not we accept our hypothesis that the two
groups of students differ in term preference (review the
Introduction for an explanation of p).
Here is the way the statistical result is displayed in a report (without the colorful, informative labels). See hand calculation for instructions about df and finding the p value on a table.
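Instead of looking up p on a printed table, it can be approximated directly. The sketch below assumes a 2 x 2 table, so df = (2 - 1) * (2 - 1) = 1; in that special case the upper-tail probability of Chi-square equals `erfc(sqrt(chi2 / 2))`. The function name `p_value_df1` is hypothetical, and the shortcut does not apply to larger tables.

```python
import math


def p_value_df1(chi2):
    """Upper-tail probability of a Chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


p = p_value_df1(5.41)
print(f"{p:.3f}")  # 0.020
print(p < 0.05)    # True
```

For tables with more than one degree of freedom, a statistics package or a printed Chi-square table is the safer route.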
Accepting or rejecting the null hypothesis
As noted previously, the general rule in the behavioral sciences is to reject the null hypothesis if the probability associated with the appropriate statistic is less than .05. In this example, with $\chi^2$ = 5.41 and df = 1, p is approximately .02, which is less than .05. Therefore, we will reject the null hypothesis and accept our research hypothesis that there is a difference in term preference between the two groups.
From the descriptive statistics, we see that most lower division students prefer the quarter (78.6%), while most upper division students prefer the semester (66.7%). The inferential statistic tells us the difference is greater than expected by chance.
Class level (predictor variable) in columns, preferred term (outcome) in rows:

| Preferred term | Lower division | Upper division |
| --- | --- | --- |
| Quarter | 78.6% | 33.3% |
| Semester | 21.4% | 66.7% |
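The percentages in this table are column percentages: each count is divided by the total for its class level. A minimal sketch follows, assuming the cell counts implied by the marginal totals in the example; the function name `column_percentages` is hypothetical.

```python
def column_percentages(counts_by_group):
    """Convert counts to percentages within each group (class level)."""
    result = {}
    for group, counts in counts_by_group.items():
        total = sum(counts.values())
        result[group] = {
            term: round(100 * n / total, 1) for term, n in counts.items()
        }
    return result


# Counts assumed from the marginal totals given in the example.
counts = {
    "Lower division": {"Quarter": 11, "Semester": 3},
    "Upper division": {"Quarter": 4, "Semester": 8},
}
percentages = column_percentages(counts)
print(percentages["Lower division"])  # {'Quarter': 78.6, 'Semester': 21.4}
print(percentages["Upper division"])  # {'Quarter': 33.3, 'Semester': 66.7}
```

Percentages like these belong in the descriptive summary only; the Chi-square calculation itself uses the raw counts.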
Review: Steps in hypothesis-testing
Next module: Selecting statistics (short)