
Statistics: Tips and Enigmas

by Chuck Schultz


Statistics Tip: Substituting one test for another

Do you ever want to use one test in place of another? Do you need a test that takes less time, is less familiar to the candidates, or has less adverse impact? (The Guidelines mention a test with equal validity and less adverse impact.) How do we know that a test will make an adequate substitute?

We may have two tests that correlate highly with one another and assume they measure the same thing. However, high correlation by itself provides weak evidence for substitutability. Two highly correlated independent measures may have widely different correlations with our criterion. For example, two tests can correlate .90 and potentially have validities as different as .44 and .00. However, the validities can be that different only if both tests are perfectly reliable. When two tests correlate .90, the unshared variance can often be attributed to random error.

Test scores may be heavily influenced by a variable that is not pertinent to your purpose. Tests using a common method show higher intercorrelations than do different kinds of tests. Method variance always exists, even when it hasn’t been demonstrated with a multi-trait multi-method matrix. Researchers commonly cite general intelligence, g, as a cause of high correlation between tests. For a criterion unrelated to g, that irrelevant variable could produce a misleading correlation.

Let’s permute the set of correlations given in paragraph two, above. A test with .00 correlation with another test may be a good substitute. If the original test is correlated .44 with the criterion, the uncorrelated substitute could correlate as high as .90. That substitute wouldn’t fill the same niche in a test battery, but it might be a useful alternate selection device. Again, that extreme difference in validities could only occur if both tests are perfectly reliable.
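The .44/.00 and .44/.90 figures above can be checked with a short sketch. It assumes the standard requirement that three variables must have a consistent correlation matrix, which limits how far the second test's validity can stray once the other two correlations are fixed. The function name and arguments are hypothetical, chosen only for this illustration.

```python
import math

def validity_bounds(r_12, r_1c):
    """Lowest and highest possible correlation of test 2 with the criterion,
    given r_12 (test 1 with test 2) and r_1c (test 1 with the criterion).
    Follows from requiring a consistent 3 x 3 correlation matrix."""
    center = r_12 * r_1c
    half_width = math.sqrt((1 - r_12 ** 2) * (1 - r_1c ** 2))
    return center - half_width, center + half_width

# Two tests correlate .90; test 1 has zero validity.
low, high = validity_bounds(0.90, 0.00)
print(f"{low:.2f} to {high:.2f}")  # -0.44 to 0.44

# Two tests correlate .00; test 1 has validity .44.
low, high = validity_bounds(0.00, 0.44)
print(f"{low:.2f} to {high:.2f}")  # -0.90 to 0.90
```

These are limits, not expectations. As noted above, real tests with less than perfect reliability will not reach them.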

You sometimes want to substitute non-independent measures, as when you substitute a shortened test for the original version. If you pick 20 items from an 80-item test, you will find a strong correlation between the short and the long tests. In this case, you are not dealing with an ordinary correlation, but with a part-whole correlation. This yields a spurious value, since one measure makes up a good portion of the other.
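A small simulation shows how much correlation the overlap alone can produce. This is a sketch under an extreme assumption made up for illustration: every item is a coin flip, so the items measure nothing and are unrelated to one another. With k such items drawn from n, the part-whole correlation should come out near the square root of k/n. The function names, sample size, and seed are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def part_whole_demo(n_people=2000, n_items=80, n_short=20, seed=1):
    """Correlate a short form with the full test it was cut from,
    when every item is pure noise (an assumption for illustration)."""
    rng = random.Random(seed)
    short_scores, full_scores = [], []
    for _ in range(n_people):
        items = [rng.randint(0, 1) for _ in range(n_items)]
        short_scores.append(sum(items[:n_short]))  # first 20 items
        full_scores.append(sum(items))             # all 80 items
    return pearson(short_scores, full_scores)

r = part_whole_demo()
print(f"simulated part-whole r = {r:.2f}")
print(f"value expected from overlap alone = {math.sqrt(20 / 80):.2f}")
```

Even with items that measure nothing at all, the short form correlates about .50 with the long form simply because it is a quarter of it.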

Even random error will have a high correlation with a total score it is part of. For example, in a test with a reliability of .75, errors of measurement correlate .50 with the observed score. With reliability as low as .50, errors of measurement correlate .71 with the observed score. The correlation between the part and the whole gives no assurance that the shortened test maintains the validity of the original. However, item analysis data may be of some use for that purpose.
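The .50 and .71 figures follow from the classical model in which an observed score is a true score plus an uncorrelated error. Under that assumption, the error's correlation with the observed score is the square root of one minus the reliability. A minimal sketch, with a hypothetical function name:

```python
import math

def error_observed_correlation(reliability):
    """Correlation between errors of measurement and observed scores,
    assuming observed = true + error with error uncorrelated with true."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return math.sqrt(1.0 - reliability)

for rxx in (0.75, 0.50):
    r = error_observed_correlation(rxx)
    print(f"reliability {rxx:.2f}: r(error, observed) = {r:.2f}")
# reliability 0.75: r(error, observed) = 0.50
# reliability 0.50: r(error, observed) = 0.71
```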

So correlation between two tests does not show that one test will be an adequate substitute for another. Correlation with the criterion, on the other hand, helps substantiate substitutability. But you still need to show the alternate to be an equally good measure of those aspects of the criterion you need to measure. Showing that the two tests measure equally well the same constructs or content domain assures you that you have a good substitute.


Statistics Enigma: The improvement was regression toward the mean

A group of children took a reading test. Those who scored more than one standard deviation below the mean were put into a special reading program. Students in the reading program took the reading test again at the end of the semester and, on the average, the scores were higher. A matched t-test showed a significant difference between the means.

The reading teacher, George Bernard Phonics, claimed that this showed that the program did some good. Is that true? You know, I have to say: “No, the procedure described does not show the program did any good whatsoever.” The significance test was not based on a random sample. Comparing the scores on the second test to predicted true scores from the first test would show whether the group had made gains.
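One common way to get a predicted true score is to pull each observed score toward the mean of the whole group in proportion to the test's reliability. The sketch below assumes that approach. The mean, reliability, and scores are invented for illustration and do not come from George's class.

```python
def predicted_true_score(observed, group_mean, reliability):
    """Estimated true score: the observed score regressed toward the mean
    of the full group tested (not just the selected low scorers)."""
    return group_mean + reliability * (observed - group_mean)

# Hypothetical numbers: full-group mean 100, reliability .80.
group_mean, reliability = 100.0, 0.80
first_score = 80.0   # a child selected for the program
second_score = 85.0  # the same child at the end of the semester

baseline = predicted_true_score(first_score, group_mean, reliability)
print(f"predicted true score from first test: {baseline:.1f}")  # 84.0
print(f"gain beyond regression: {second_score - baseline:.1f}")  # 1.0
```

In this made-up case, four of the child's five points of apparent improvement are what regression alone would predict.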

George “learned” about regression toward the mean in his first statistics course, but forgot about it right after the final exam. His professor was a bit flaky and talked about all kinds of irrelevant stuff. Among that stuff was that if you take people from one end of a distribution and see where they are on another distribution, the second average will always be closer to the mean than the first.

My friend, Julie, got a good score on a promotional test, but decided it wouldn’t hurt to take the test again. All she could do is raise her score, right? To her surprise her score went down one point.

It was still a high score, but the one point drop was worth complaining about. Rather than offering solace, I blurted, “Oh yeah. The regression effect.” Scores don’t mean what they say; they mean what they say, plus or minus some inaccuracy.

You can predict that a score for any given person on a retest will be closer to the mean than the original score - that is, if the test-taker hasn’t changed with respect to the attribute measured. Does that mean that if a sample of people retake a test the standard deviation will be smaller? No. People just trade errors of measurement.

People who had scores above the mean on the original test more likely had positive errors of measurement. Of those who had positive errors, some will have a larger positive error on another test, some a smaller positive error, but some people’s errors will change from positive to negative. On the retest, their average error of measurement will be zero. Since their average error is lower than it was the first time, their average observed score is lower.

Analogously, those who originally had negative errors will on the average have less negative errors. For everybody taken together, the average error will be zero. These new errors added to the same distribution of true scores yield the same overall distribution.
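The trading of errors can be watched in a simulation. This is a sketch under assumed conditions: normally distributed true scores and errors, a mean of 100, a standard deviation of 15, a reliability of .80, and nobody changing between the two testings. The function name, sample size, and seed are hypothetical.

```python
import random
import statistics

def retest_demo(n=20000, reliability=0.80, mean=100.0, sd=15.0, seed=7):
    """Give the same simulated people two tests that share true scores
    but have fresh, independent errors of measurement."""
    rng = random.Random(seed)
    sd_true = sd * reliability ** 0.5
    sd_error = sd * (1 - reliability) ** 0.5
    true = [rng.gauss(mean, sd_true) for _ in range(n)]
    test1 = [t + rng.gauss(0, sd_error) for t in true]
    test2 = [t + rng.gauss(0, sd_error) for t in true]
    # People more than one standard deviation above the mean on test 1.
    high = [i for i in range(n) if test1[i] > mean + sd]
    return {
        "high_mean_test1": statistics.mean([test1[i] for i in high]),
        "high_mean_test2": statistics.mean([test2[i] for i in high]),
        "sd_test1": statistics.pstdev(test1),
        "sd_test2": statistics.pstdev(test2),
    }

for name, value in retest_demo().items():
    print(f"{name}: {value:.1f}")
```

The high scorers' average drops on the retest while the two overall standard deviations stay about the same, which is the pattern described above.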

Editor's Note: You can reach Chuck at (360) 923-5340, 29413 Firwood Loop, Olympia, WA 98501. For those of you who think that is a misprint of one of Chuck's old addresses, it's not. It is his new address; he has just moved back to the exact same spot. Chuck says this address should be good for a year, maybe even two. . .


© Copyright 1996 by the IPMA Assessment Council. All rights reserved.