Statistics: Tips and Enigmas
Chuck Schultz
Statistics Tip:
The Elusive Delineation of Concepts
One needs discernment to make sense out of research in our field, although we don't always take enough time to reflect. Drawing conclusions from human behavior requires sensitivity to nuances and receptivity to unexplored possibilities. How to quantify observations so that they realistically represent critical conditions, processes, and outcomes is the great challenge in psychological science. At best, we can recognize the shortcomings of our measures as we draw inferences from our data.
In the effort to assess job performance, investigators come back to supervisors' ratings, although the weaknesses in ratings are well known. A direct measure of performance ought to be better. I used amount of money collected as a measure of performance for revenue officers and for support enforcement officers. However, that measure had problems, too. The best performers take time out to assist their clients and colleagues, and the most prolific collectors sometimes use unacceptable means. Although we know the flaws in supervisors' judgments, they nevertheless alert us to deficiencies in our other criteria.
While measuring only a few aspects of success, longevity seems to be an objective criterion measure. A worker who is still in the job five years after having been hired must be doing something right. In addition, that person has not added to turnover costs. However, perhaps five more competent people, working a year each, would have added more value to the organization. And what of the people who are no longer on the job because they have been promoted to expert or supervisor?
Researchers continually measure college success by grade point average (GPA). However, we know that factors other than what is learned in college affect grades. Courses like math, biology, and physics give the lowest grades while requiring the greatest application. A person well versed in a subject can get an A in a class without attending, while a person who knows nothing about the subject needs to learn a lot to get a C. Some students owe high grades to paying close attention to an instructor's grading quirks. Cramming for tests yields grades, but little lasting effect. In short, some 2.00 students gain more from college than some who make the dean's list. While a positive correlation no doubt exists between GPA and the amount one profits from college, one may want to round out the picture with subjective judgments.
What do we do when we have an important psychological phenomenon and a less-than-perfect measure of it? Rather than deploring the findings because the measure of the concept is imprecise, we need to use our judgments to figure out what to make of them.
Statistics Enigma:
Why Weight?
In multiple-choice test questions, some alternatives are more wrong than others. Well, at least they are more likely to be chosen by people with the least knowledge of the subject. For example, you can rank the choices to the question, "What is the capital of Kentucky: Lexington, Frankfort, Springfield, or San Francisco?" On a public relations item, all the alternatives may be OK things to do, but some will produce better results.
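To make the idea of ranked alternatives concrete, here is a minimal sketch that scores the Kentucky item two ways: conventionally (right or wrong) and with option weights. The weights and the function names are hypothetical, chosen only for illustration; in practice weights would come from item analysis or subject-matter judgment.

```python
# Hypothetical option weights for a single item; the values are
# illustrative assumptions, not weights from any actual test.
OPTION_WEIGHTS = {
    "Frankfort": 1.0,       # keyed (correct) answer
    "Lexington": 0.5,       # plausible wrong answer
    "Springfield": 0.25,
    "San Francisco": 0.0,   # least defensible answer
}


def conventional_score(choice, key="Frankfort"):
    """Right/wrong scoring: full credit for the key, nothing otherwise."""
    return 1.0 if choice == key else 0.0


def weighted_score(choice, weights=OPTION_WEIGHTS):
    """Differential scoring: each alternative earns its own weight."""
    return weights[choice]


for choice in OPTION_WEIGHTS:
    print(f"{choice:15s} conventional={conventional_score(choice):.2f} "
          f"weighted={weighted_score(choice):.2f}")
```

Under conventional scoring the three wrong answers are indistinguishable; under weighted scoring they carry different amounts of information about the examinee.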
Throughout most of this century, psychometricians have contemplated the efficacy of differentially weighting alternatives to multiple-choice test questions. Numerous articles have been published on this topic and on the related topic of differentially weighting questions. Weighting options can raise reliability by a statistically significant amount, but it raises test validity by only a point or two. With an n less than 25,000, it is difficult to show any significant change in validity. The conventional wisdom is that weighting alternatives is not worth the trouble. And who of us would challenge conventional wisdom?
Why would anyone care about improving a validity coefficient by .01? How about because it pays dividends? If you are selecting from the top ten percent of a test score range, an increase of .01 in the test validity coefficient will increase productivity by 0.0175 standard deviations. Big deal? Well, what if the standard deviation of annual productivity is $10,000 (which is not at all uncommon)? You would gain $175 per candidate per year. If you are going to hire one candidate with your test, it is probably not worth the effort to come up with weights for alternatives. If you hire a thousand candidates, you might gain $175,000 a year.
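The arithmetic behind these figures can be reproduced with a short sketch. It assumes normally distributed test scores, strict top-down selection of the top ten percent, and a linear relation between test score and productivity; under those assumptions the average standardized score of those selected is the normal density at the cutoff divided by the selection ratio. The function names are illustrative, not from any particular library.

```python
from statistics import NormalDist


def mean_z_of_selected(selection_ratio):
    """Average standard score of those selected top-down.

    Assumes normally distributed scores and that exactly the top
    `selection_ratio` fraction of candidates is hired.
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - selection_ratio)
    return std_normal.pdf(cutoff) / selection_ratio


def gain_per_hire(delta_validity, selection_ratio, sd_dollars):
    """Expected yearly dollar gain per hire from a change in validity."""
    return delta_validity * mean_z_of_selected(selection_ratio) * sd_dollars


mean_z = mean_z_of_selected(0.10)          # about 1.755
gain = gain_per_hire(0.01, 0.10, 10_000)   # about 175.50 dollars

print(f"Mean z of selected:     {mean_z:.3f}")
print(f"Gain in SD units:       {0.01 * mean_z:.4f}")
print(f"Gain per hire per year: ${gain:,.2f}")
print(f"Gain for 1,000 hires:   ${gain * 1000:,.0f}")
```

The result is roughly 0.0175 standard deviations, or about $175 per hire per year, matching the figures above after rounding. A less extreme selection ratio gives a smaller mean standard score and therefore a smaller gain.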
In most of the reported research, the differences observed have been so small that the investigators wondered if they were random error. Theoretically, there is a modicum of value in those weights. Weighting appears to add reliable variance. If that variance is job related, and if it can be efficiently captured, consider capturing it. When there is more than one right way to do something, subject-matter specialists yearn to give more credit for a halfway decent response than for a destructive response. The weights may make you an extra few bucks per candidate, which you will never be able to demonstrate.
Chuck Schultz may be reached at (360) 923-5340, 2941 B Firwood Loop SE, Olympia, WA 98501-4844.
© Copyright 1997 by the IPMA Assessment Council. All rights reserved.
