Technical Affairs
Mike Aamodt, Associate Editor
This month's column contains an answer to an ACN reader's question about standardizing scores and two pieces of HR humor.
Question: Our police department has an interview panel consisting of three members. Each member rates applicants on a 1-7 scale. We then add the scores from the three interviewers, and the applicant getting the highest score is the one we hire. I think they should standardize each rater's scores and then add them, but the police department says it is too much trouble and won't make any difference anyway. Would it matter?
Answer: Theoretically, it does matter. Although each rater is using the same seven-point scale, each rater potentially has very different mean ratings and standard deviations. From a practical standpoint, it only matters if there are actually differences in how each interviewer uses the rating scale. To illustrate this, let me provide you with an anecdotal answer and then use some hypothetical data.
When I was a test administrator for a civil service commission in a small town (so small that the flavor of the month at the local Baskin-Robbins was always vanilla), I noticed that although there was good reliability between most of the interviewers on a police and fire selection panel, there were tremendous differences in the way the interviewers rated applicants. That is, several of the Commission members rated everyone low, several were very lenient, and one person used the entire five-point range. When I asked this particular person about his ratings, he responded that by using all five points, he had more power than the Commission members using only two or three points on the scale.
A year later, I took a graduate class and understood the psychometrics behind this member's intuition. Though all members were using the same five-point scale, each member had a different mean rating and a different standard deviation. Members with higher standard deviations had more influence than those with lower standard deviations because the difference between a high and low rating for them could be four points, whereas it might be only one or two points for their less variable counterparts.
Interestingly, I saw this same pattern of "rating influence" when I watched Star Search and Dance Fever on TV. On Star Search (you know, with Ed McMahon), each rater could give an act between one and four stars. Most judges wanted to be nice so they gave a poor performance three stars and a good performance four stars. However, there were judges who gave the first act two stars and then had the option of giving a worse act one star or a better act three or four stars. Again, these "variable judges" had more power than their lenient counterparts.
Why? Suppose that one Star Search judge thought Happy Shoes Waldron was the best dancing act and gave her four stars, while giving Twinkle Toes Masden three stars. The other judge disagreed and thought Masden was the better dancer, so he gave Masden four stars and Waldron two stars. If this were a vote, we would have a tie and the audience would have to decide. Because the stars are summed, however, Masden would win with a total of seven stars to Waldron's six. As long as our second judge uses more of the rating scale than the first judge, his opinion will always count more. By the way, this same type of rating behavior occurred on Dance Fever (with host Deney Terrio - the guy who taught John Travolta his moves in Saturday Night Fever), with the only difference being that the rating scale was between 0 and 100.
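The arithmetic in the Star Search example can be checked with a few lines of Python. This is only an illustrative sketch: the star values come from the paragraph above, while the variable names and the "spread" measure (highest rating minus lowest rating) are assumptions introduced here for the illustration.

```python
# Stars awarded by each judge, taken from the example above.
judge_1 = {"Waldron": 4, "Masden": 3}  # lenient judge
judge_2 = {"Waldron": 2, "Masden": 4}  # variable judge

# Summing the stars, as the show did, decides the winner.
totals = {act: judge_1[act] + judge_2[act] for act in judge_1}
winner = max(totals, key=totals.get)

# Spread = gap between a judge's highest and lowest rating
# (a hypothetical measure used here to show who has more pull).
spread_1 = max(judge_1.values()) - min(judge_1.values())
spread_2 = max(judge_2.values()) - min(judge_2.values())

print(totals)              # {'Waldron': 6, 'Masden': 7}
print(winner)              # Masden
print(spread_1, spread_2)  # 1 2
```

Each judge prefers a different act, yet the judge with the two-star spread outweighs the judge with the one-star spread once the ratings are added.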
Let's use the data below to demonstrate statistically what the magic of Hollywood could only demonstrate anecdotally. Suppose that you have three members on an interview panel: Tinkers, Evers, and Chance. As you can see from the table below, Tinkers is a lenient rater, Evers is a strict rater, and Chance is a variable rater. If we simply sum their raw ratings, we would hire Livingston. If we converted their ratings to standard scores based on the mean and standard deviation for each rater, we would hire Patterson. [As an example of this standardization process, the standard score for Tinkers' rating of Daly would be the difference between the raw rating and Tinkers' mean rating, divided by Tinkers' standard deviation. In this example, it would be (5 - 5.88) / .83 = -1.06.]
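Written as a formula, the standardization described in the bracketed example looks like the following. The symbols are notation introduced here rather than in the original column: x is a raw rating, and the bar and s are the mean and standard deviation of the ratings given by that same rater.

```latex
z_{\text{rater},\,\text{applicant}}
  = \frac{x_{\text{rater},\,\text{applicant}} - \bar{x}_{\text{rater}}}{s_{\text{rater}}},
\qquad
z_{\text{Tinkers},\,\text{Daly}} = \frac{5 - 5.88}{.83} \approx -1.06
```

An applicant's standardized total is then the sum of his or her z values across the three raters.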
In this hypothetical example, standardizing scores would make a difference and would reduce the power of Chance (our variable rater). To go back to the question of whether one should standardize ratings, the answer is probably yes. If our goal is to ensure that each rater has equal weight, we must standardize. However, if for some reason we want to give more weight to raters who use the entire rating scale than to raters with restricted ranges, using raw scores would accomplish our goal.
By the way, we found that a good way to demonstrate the importance of standardizing scores is to create a table, similar to the one below, depicting the differences in the outcomes of using raw scores versus using standardized scores.
Raw Score Listing (ratings by interviewer)
| Applicant | Tinkers | Evers | Chance | Raw Score Total |
| --- | --- | --- | --- | --- |
| Daly | 5 | 2 | 1 | 8 |
| Johnson | 7 | 2 | 5 | 14 |
| Livingston | 6 | 3 | 7 | 16 |
| Patterson | 7 | 4 | 4 | 15 |
| Reichert | 6 | 4 | 4 | 14 |
| Smith | 5 | 2 | 3 | 10 |
| Soileau | 6 | 3 | 3 | 12 |
| Tozer | 5 | 2 | 2 | 9 |
| Mean | 5.88 | 2.75 | 3.63 | |
| SD | .83 | .89 | 1.85 | |
Standard Score Listing (standard scores by interviewer)
| Applicant | Tinkers | Evers | Chance | Standard Score Total |
| --- | --- | --- | --- | --- |
| Daly | -1.06 | -.85 | -1.42 | -3.33 |
| Johnson | 1.35 | -.85 | .74 | 1.24 |
| Livingston | .14 | .28 | 1.82 | 2.24 |
| Patterson | 1.35 | 1.41 | .20 | 2.96 |
| Reichert | .14 | 1.41 | .20 | 1.75 |
| Smith | -1.06 | -.85 | -.34 | -2.25 |
| Soileau | .14 | .28 | -.34 | .08 |
| Tozer | -1.06 | -.85 | -.88 | -2.79 |
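For readers who would rather compute than build the tables by hand, the comparison can be sketched in Python. The ratings are copied from the Raw Score Listing; the variable names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, chosen because it reproduces the .83 and 1.85 reported for Tinkers and Chance. Because the script works from unrounded means and standard deviations, individual standard scores may differ from the printed table in the second decimal place.

```python
from statistics import mean, stdev

# Raw ratings per applicant, in the order (Tinkers, Evers, Chance).
ratings = {
    "Daly": (5, 2, 1),
    "Johnson": (7, 2, 5),
    "Livingston": (6, 3, 7),
    "Patterson": (7, 4, 4),
    "Reichert": (6, 4, 4),
    "Smith": (5, 2, 3),
    "Soileau": (6, 3, 3),
    "Tozer": (5, 2, 2),
}

# Turn the rows into one column of ratings per rater.
columns = list(zip(*ratings.values()))
means = [mean(col) for col in columns]
sds = [stdev(col) for col in columns]  # sample SD (assumption, see lead-in)

# Raw total: simple sum of the three ratings.
raw_totals = {name: sum(row) for name, row in ratings.items()}

# Standardized total: sum of (rating - rater mean) / rater SD.
z_totals = {
    name: sum((x - m) / s for x, m, s in zip(row, means, sds))
    for name, row in ratings.items()
}

raw_winner = max(raw_totals, key=raw_totals.get)
z_winner = max(z_totals, key=z_totals.get)

print("Hire on raw scores:     ", raw_winner)  # Livingston
print("Hire on standard scores:", z_winner)    # Patterson
```

The two methods pick different applicants because the raw sum lets Chance's wide spread of ratings dominate, while the standard scores give each rater equal weight.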
HR Humor
Below are ten Murphy's Laws of the Recruitment Jungle that were developed by CORS - an organization specializing in recruitment.
- When an applicant's education and experience match your job description, their salary will be out of line
- A person is never as good as their resume
- Your second choice will look better as you realize your first choice won't accept your offer
- All hires were needed yesterday but some were actually needed the day before
- If more than one person is involved in the hiring decision, no one will be at fault if the person doesn't work out
- Most poor performers surface the day after their probation period is over
- Filling a position takes three times longer than you expected and six times longer than you have
- The more words in a resume, the less quality in the person
- Nothing is impossible for the person who doesn't have to do it
- Self-starters - will not
The following piece of HR humor was forwarded to me by an ACN reader who found it on the internet.
Reaching the end of a job interview, the Human Resource Director asked the young Engineer fresh out of MIT, "And what starting salary were you looking for?" The Engineer replied, "In the neighborhood of $75,000 a year, depending on the benefit package."
The HR Director said, "Well, what would you say to a package of 5-weeks vacation, 14 paid holidays, full medical and dental, company matching retirement fund to 50% of salary, and a company car leased every two years - say, a red Corvette?"
The Engineer sat up straight and said, "Wow!!! Are you kidding?" To which the HR Director replied, "Certainly, ... but you started it."
Mike Aamodt is a Professor at Radford University. In addition to answering technical questions, he has taken over as the unofficial humor editor. If you have a technical question you want answered or a humor submission, please send it to Mike by email (maamodt@runet.edu), phone [(540) 831-5513], or fax [(540) 831-6113].
© Copyright 1997 by the IPMA Assessment Council. All rights reserved.
