Criterion Complexity: For What Attributes Do We Test?
Chuck Schultz
To determine whether a selection test is any good or not, you need some criteria to judge it by. Test developers carefully craft tests to measure attributes defined in comprehensive job analyses. (Well, when we have the time and resources we do.) Crafting a test to measure specific criteria is arguably the best way to ensure its validity. For years, the profession preferred correlating test scores with scores on a criterion measure. Correlation with any old criterion was preferred to content validation, one form of which is crafting a test to measure specific criteria.
That some criteria are questionable is referred to as "the criterion problem." One way to solve the criterion problem is to craft the criterion measure to assess the attributes defined in a comprehensive job analysis. Criterion measures are rarely crafted with the same diligence as the tests for which they serve as the standards. Therefore, the revered criterion-related validity procedure often adds a shaky step between the test and the criteria against which it is judged.
Without the impetus of a job analysis, self-evident indicators of proficiency or success often serve as criteria for test validation. Would you have second thoughts about evaluating tests for the following roles against the suggested criteria?
| Role | Suggested criterion |
| --- | --- |
| Toll collector | Accuracy of making change |
| Tax collector | Amount of delinquent taxes collected |
| Sales agent | Amount of total sales |
| College student | Grade point average (GPA) |
I am liberal enough to agree that these might be used without a comprehensive job analysis. But I suggest that you consider the situation carefully to ensure that these criteria truly represent the performance in which you are interested.
Twenty-five years ago, the Port Authority of NY and NJ used accuracy of making change as a selection factor for toll collectors and had traffic backed up for miles. Speed of making change was important; the few small errors in making change were unnoticed or easily corrected.
I got a lousy validity coefficient using amount of delinquent taxes collected as a criterion. After the fact, those who suggested the criterion measure pointed out that the really good revenue officers had good rapport with the taxpayers and so had few delinquencies to collect.
Factors other than a sales agent's performance may be important to total sales. Do sales depend largely on the territory, the product, or the history of the buyers? Might achieving top sales have negative consequences, such as excessive pressure on clients, stress on the delivery or production mechanism, or lack of teamwork with other sales agents?
What criterion has been used in more validity studies than GPA? And while GPA has considerable relevance as a measure of college success, it doesn't tell the whole story, and for some purposes it misses the mark entirely.
George had a 4.0 in high school, spent his college years drinking and goofing off, still got a 3.5 in college, and then went into the family firm. Herman had a 2.5 in high school, worked diligently to get a 2.5 in college, was stimulated to new insights, and emerged from college with new aspirations and skills he had never contemplated. Who was more successful in college, the person with the 3.5 or the one with the 2.5? The answer depends on your purpose in defining success. Some colleges want top achievers; some concentrate on helping people improve.
Another ingredient in the GPA potpourri is the student's approach to influencing grades independently of achievement. Some students concentrate on learning without regard to grades. Others concentrate on the instructor's grading style and on past tests. And then there was the student who told me, after taking the mid-term, that she would have to drop my course to protect her 4-point.
Supervisors' ratings need their rotten reputation; it helps the user be wary. But often supervisors' ratings are the best criterion you can get and sometimes they are excellent.
Ratings based on a job analysis covering specific instances and using trained raters occasionally work really well. And occasionally with one supervisor, who is intimately familiar with a large group of workers, you can just ask who's good and who isn't and get a meaningful ranking.
Knowing which workers are the most proficient or the most productive may not be the criterion you need. What if you get an excellent worker who stays six months and moves on? If turnover is not too costly and the excellent worker does twice as much work as one who will stay twenty years, take the good ones for six months a shot. If recruiting and training costs are high, those affect the utility of the selection. Other tangible factors such as attendance, rapport with co-workers, and adaptability will, for some jobs, be important criteria.
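The tradeoff between productivity and tenure can be roughed out with a little arithmetic. The sketch below is only an illustration under stated assumptions: the function name, the cost figures, and the choice to measure everything in "average worker-years of output" are hypothetical, not taken from any study. It spreads a one-time recruiting and training cost over the worker's tenure and compares average net output per year.

```python
def net_output_per_year(output_per_year, tenure_years, hiring_cost):
    """Average yearly output after spreading a one-time hiring cost over tenure.

    All quantities are in the same hypothetical unit: one average
    worker-year of output. This is an illustrative simplification.
    """
    return output_per_year - hiring_cost / tenure_years


# Hypothetical workers: the "star" does twice the work but stays six months;
# the "stayer" does average work and stays twenty years.
STAR = {"output_per_year": 2.0, "tenure_years": 0.5}
STAYER = {"output_per_year": 1.0, "tenure_years": 20.0}

# Two assumed recruiting-and-training costs (in worker-years of output).
for label, cost in [("low cost", 0.25), ("high cost", 1.0)]:
    star = net_output_per_year(hiring_cost=cost, **STAR)
    stayer = net_output_per_year(hiring_cost=cost, **STAYER)
    better = "star" if star > stayer else "stayer"
    print(f"{label}: star={star:.4f}, stayer={stayer:.4f}, prefer {better}")
```

With the low assumed cost, the six-month star comes out ahead; with the high assumed cost, the twenty-year stayer does. The numbers are made up, but they show why the cost of turnover belongs in the criterion discussion.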
The trick is to look at the whole system of hiring and utilizing the worker. Look at all the aspects of the hiring process and see how they relate to producing a worker who will be most effective as used in the setting. Perhaps using multiple criteria and hiring workers high on different criteria will provide optimal diversity.
Chuck Schultz may be reached at (360) 923-5340, 2941 B Firwood Loop SE, Olympia, WA 98501-4844.
© Copyright 1998 by the IPMA Assessment Council. All rights reserved.
