Correlation for ranked scores: Spearman Rank-Order Correlation Coefficient (rs)

If one or both of the variables are ranked, you must use Spearman rs. If one of the variables is continuous but not ranked (e.g., scores on an exam), you will need to transform it to ranks before calculating rs.

Example: Physical Education teachers are interested in the relationship between body weight and fitness in 10-year-olds. They have BMI information (Body Mass Index, a measure of weight relative to height: weight in kilograms divided by the square of height in meters) for 10 girls. They have the girls run a race and record the finish order (first, second, etc.). Here are the results:

Subject     BMI    Finish order
Juanita     17.2    3
Susan       17.5    1
Chin        17.8    2
Kimberly    18.0    4
Cynthia     19.2    6
Celeste     19.3    5
Audrey      20.0   10
Mee         21.0    8
Fatima      21.4    7
Nicole      25.1    9

The BMI data are normally distributed. The race outcome, however, is not: the results are ranked data. We also can't make any assumptions about the time gaps between finish places (e.g., between 1st and 2nd, or between 2nd and 3rd). In this situation we use the Spearman (rs) rather than the Pearson (r) formula.

Before calculating rs, the BMI data must be converted to ranks. In this example we rank the heaviest child as 1. Our prediction is that the heavier the child, the slower she runs. Because the fastest runner is also ranked 1, this prediction means low BMI ranks should pair with high finish numbers: an inverse, or negative, correlation. Looking at the raw data gives you some idea of the outcome. Nicole is the heaviest, and she came in ninth in the race. The lightest runner, Juanita, finished third.

Subject     BMI rank   Finish order
Juanita        10           3
Susan           9           1
Chin            8           2
Kimberly        7           4
Cynthia         6           6
Celeste         5           5
Audrey          4          10
Mee             3           8
Fatima          2           7
Nicole          1           9
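The ranking step can be sketched in a few lines of Python. This is an illustrative sketch, not the course's own tool: the helper name `rank_descending` is hypothetical, and it assumes there are no tied BMI values (true for this example), so each girl simply gets her position in a heaviest-first sort.

```python
# BMI values from the example, in the same subject order as the table.
bmi = {
    "Juanita": 17.2, "Susan": 17.5, "Chin": 17.8, "Kimberly": 18.0,
    "Cynthia": 19.2, "Celeste": 19.3, "Audrey": 20.0, "Mee": 21.0,
    "Fatima": 21.4, "Nicole": 25.1,
}

def rank_descending(scores):
    """Hypothetical helper: rank so the largest value gets rank 1 (assumes no ties)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

bmi_rank = rank_descending(bmi)
for name in bmi:
    print(name, bmi_rank[name])
```

Running it reproduces the BMI rank column above, from Juanita (10) down to Nicole (1).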

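Here is the formula in case you need it (no need to memorize it): rs = 1 - (6 Σd²) / (n(n² - 1)), where d is the difference between the two ranks for each subject and n is the number of paired scores.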
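To check the formula against the example, here is a minimal Python sketch. The function name `spearman_rs` is hypothetical; it applies the standard shortcut formula, which gives the exact rs only when there are no tied ranks.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman rs via rs = 1 - 6*sum(d^2) / (n(n^2 - 1)).

    The shortcut formula is exact only when there are no tied ranks.
    """
    if len(ranks_x) != len(ranks_y):
        raise ValueError("need the same number of ranks in each list")
    n = len(ranks_x)
    # d is the difference between each subject's two ranks.
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks from the example, subjects in table order (Juanita ... Nicole).
bmi_rank = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
finish_order = [3, 1, 2, 4, 6, 5, 10, 8, 7, 9]

rs = spearman_rs(bmi_rank, finish_order)
print(round(rs, 3))  # -0.867
```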

Results from the above data:

For correlation, the null hypothesis is that the population correlation is 0, that is, that there is no relationship between the variables. The correlation coefficient is -.867. The negative (-) sign indicates that the relationship is an inverse one. Because the p value is less than .05, we can reject the null hypothesis. Conclusion: The heavier the child, the less fit she is (at least with regard to running).
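The text does not say how the p value was obtained. One common approach, assumed here, converts rs to a t statistic, t = rs·√((n − 2)/(1 − rs²)), with n − 2 degrees of freedom; for small samples, tables of exact critical rs values are often preferred instead. The sketch below uses a hypothetical helper `spearman_t` and the critical value 2.306 from a standard t table (two-tailed, α = .05, df = 8).

```python
import math

def spearman_t(rs, n):
    """Hypothetical helper: t statistic for H0 (population correlation = 0), t approximation."""
    return rs * math.sqrt((n - 2) / (1 - rs ** 2))

rs = -0.867         # coefficient reported for the example
n = 10              # number of paired scores
t = spearman_t(rs, n)
t_critical = 2.306  # two-tailed, alpha = .05, df = n - 2 = 8 (standard t table)

print(f"t = {t:.2f}, df = {n - 2}")  # t = -4.92, df = 8
print("reject H0" if abs(t) > t_critical else "fail to reject H0")
```

Under this approximation |t| is well beyond the critical value, which agrees with the conclusion that p < .05.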

Hand calculation example
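A worked hand calculation for the example might look like this, taking d = BMI rank − finish order for each girl and applying the shortcut formula for rs (no tied ranks):

```latex
\begin{array}{lrrrr}
\text{Subject}  & \text{BMI rank} & \text{Finish order} & d  & d^2 \\ \hline
\text{Juanita}  & 10 & 3  & 7  & 49 \\
\text{Susan}    & 9  & 1  & 8  & 64 \\
\text{Chin}     & 8  & 2  & 6  & 36 \\
\text{Kimberly} & 7  & 4  & 3  & 9  \\
\text{Cynthia}  & 6  & 6  & 0  & 0  \\
\text{Celeste}  & 5  & 5  & 0  & 0  \\
\text{Audrey}   & 4  & 10 & -6 & 36 \\
\text{Mee}      & 3  & 8  & -5 & 25 \\
\text{Fatima}   & 2  & 7  & -5 & 25 \\
\text{Nicole}   & 1  & 9  & -8 & 64 \\ \hline
                &    &    & \textstyle\sum d = 0 & \textstyle\sum d^2 = 308
\end{array}

r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
    = 1 - \frac{6(308)}{10(10^2 - 1)}
    = 1 - \frac{1848}{990}
    \approx -0.867
```

The d column always sums to 0 when both variables are ranked 1 to n, which is a handy arithmetic check.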

Tied ranks

If there are ties (i.e., two or more cases with the same score), add up the ranks the tied cases would occupy and give each of them the mean of those ranks. For example, 2 cases tied for 3rd place would take up ranks 3 and 4, so each is assigned a rank of 3.5 = (3+4)/2. The next case is ranked 5, because ranks 3 and 4 have already been used. If 3 cases are tied for 5th place, they cover ranks 5, 6, and 7, so each is assigned a rank of 6 (the mean of 5, 6, and 7). The next rank will be 8. Check that the number of ranks you assign equals the number of paired scores (the last rank may not be a whole number if cases are tied for it).
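The tie rule above can be sketched in Python as follows. The helpers `average_ranks` and `pearson_r` are hypothetical names, and the scores in the usage lines are made up to reproduce the text's example (2 cases tied for 3rd, 3 tied for 5th). Because the 6Σd² shortcut is exact only without ties, a common alternative when ties are present is to compute Pearson r on the averaged ranks; without ties, the two approaches give the same value.

```python
import math

def average_ranks(scores, descending=False):
    """Rank scores, giving tied scores the mean of the ranks they occupy.

    Rank 1 goes to the smallest score, or to the largest when descending=True.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=descending)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of positions that share the same score.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; each tied case gets their mean.
        mean_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[order[k]] = mean_rank
        i = j
    return ranks

def pearson_r(x, y):
    """Pearson r; applied to ranks (with ties averaged) it gives Spearman rs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up scores: 2 cases tied for 3rd place, 3 cases tied for 5th.
print(average_ranks([10, 20, 30, 30, 40, 40, 40, 50]))
# [1.0, 2.0, 3.5, 3.5, 6.0, 6.0, 6.0, 8.0]

# Without ties, Pearson r on the example's ranks matches the shortcut formula.
print(round(pearson_r([10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
                      [3, 1, 2, 4, 6, 5, 10, 8, 7, 9]), 3))  # -0.867
```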

When you have understood this module along with Pearson r, take the self-tests.

Self-test #1: Scatterplots
Self-test #2: Estimating direction and strength of correlation (this is challenging)

Next section: Effect size