Statistics: Tips and Enigmas

Chuck Schultz


Statistics Tip:
Psychologists Concerned about Statistical Inference

"Psychologists will be a much better science when we change the way we analyze data" forms the title of May's lead article in Current Directions in Psychological Science written by Geoffrey Loftus. This is a journal of the American Psychological Society (APS). In it Loftus provides reasons why the null hypothesis significance test (NHST) "is barren as a means of transiting from data to conclusions." He cites many articles published over the past 60 years from such authors as Paul Meehl and Frank Schmidt, to show that repeated caveats and suggestions for improvement have gone largely ignored.

March issues of two news publications of the American Psychological Association (APA), the Monitor and the Science Agenda, carry articles about an APA Task Force on Statistical Inference. The task force has been formed to deal with the NHST and related issues. In addition, the American Psychologist reports that the APA has awarded the Distinguished Scientific Award for the Applications of Psychology to Ward Edwards, who, in addition to his work on decision making, has for over thirty years championed Bayesian statistical inference as an alternative to the NHST.

The statistics courses most of us have been required to take base statistical inference on the NHST, and we have learned to live with it. A major criticism of the NHST is that it can mislead us. Let's look at an example in which we evaluate a validity coefficient. With a sample of 50 incumbents, an r of .20 is not significant at the .05 level (using a one-tailed test). Therefore, we are supposed to "fail to reject the null hypothesis." Some people say we accept it, but we don't; we only fail to reject it.
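The arithmetic behind this example can be checked with a short sketch. It computes the usual t statistic for a correlation and, because the Python standard library has no t distribution, an approximate one-tailed p-value from the Fisher z transformation. The function names are illustrative, and the normal approximation is an assumption made here for convenience, not necessarily the exact test a textbook would prescribe.

```python
import math
from statistics import NormalDist


def t_statistic(r, n):
    """t statistic for testing a correlation r from n cases against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def one_tailed_p(r, n):
    """Approximate one-tailed p-value via the Fisher z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 1 - NormalDist().cdf(z)


t = t_statistic(0.20, 50)   # about 1.41, with 48 degrees of freedom
p = one_tailed_p(0.20, 50)  # about .08, above the .05 cutoff
print(f"t = {t:.2f}, approximate one-tailed p = {p:.3f}")
```

The one-tailed .05 critical value of t with 48 degrees of freedom is roughly 1.68, so a t of about 1.41 falls short, which agrees with the statement that an r of .20 is not significant with 50 cases.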

How do these statements differ? We don't conclude that our validity is .00, but on the other hand, we don't conclude that it isn't, either. Our best estimate of the validity, in fact the only estimate we have, is .20. If we had ten estimates of the validity and they averaged .20, we would probably have rejected the null hypothesis three or four times out of ten, and we would have conflicting results. On the other hand, if we do a meta-analysis, our combined n will be large enough to allow us to reject the null hypothesis and conclude that the correlation is indeed greater than .00.
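Both claims in this paragraph can be approximated numerically. The sketch below assumes that the true correlation is .20, that each of the ten studies has 50 cases, and, as a simplification, that every study observes exactly .20. The pooling shown is a fixed-effect average of Fisher z values, one common approach chosen here for illustration; it is not necessarily the meta-analytic method the column has in mind. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_tailed(rho, n, alpha=0.05):
    """Approximate chance that one study of size n rejects rho = 0."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    expected_z = math.atanh(rho) * math.sqrt(n - 3)
    return 1 - STD_NORMAL.cdf(z_crit - expected_z)


def pooled_z(rs, ns):
    """Fixed-effect pooled test: average Fisher z values weighted by n - 3."""
    weights = [n - 3 for n in ns]
    mean_z = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return mean_z * math.sqrt(sum(weights))


power = power_one_tailed(0.20, 50)     # chance that a single study rejects
z = pooled_z([0.20] * 10, [50] * 10)   # test statistic for the ten studies combined
print(f"power per study = {power:.2f}, pooled z = {z:.2f}")
```

Under these assumptions each study has roughly a 40 percent chance of reaching significance, which is the "three or four times out of ten" above, while the pooled statistic lies far beyond the one-tailed .05 cutoff of 1.645.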

The task force does not reject NHST, but agrees that it should be used more judiciously. The task force, as well as the various psychologists mentioned above, provides suggestions for improving data analysis. One repeated observation is that journal editors and others evaluating research have encouraged poor statistical practices. The task force tends to prefer the simple analysis to the complex, and to prefer judicious reporting of computer output. Among Loftus's several suggestions is to plot the data. That was, you may recall, one of B.F. Skinner's procedures. He believed that if you need statistics to show an effect, it's not much of an effect.

Statistics Enigma:
Mama, Where Do Selection Specialists Come From?

How many people do you know who long ago said, "When I grow up I'm going to become a Testing Specialist"? You are more likely to know someone who said, "What kind of job can I get when my family moves to Springfield?" There is wide diversity, of course, within our field. I, for example, took one of my prelims in Test Construction. Therefore, many Human Resource Specialists think I'm weird; they don't know the meaning of the word diversity. They think it has something to do with quotas.

Nevertheless, people working as testing specialists have reached this pinnacle by many different approaches. As a result, they have a wide range of expertise in and appreciation of test statistics. A typical perspective might be, "Statistics are the essential framework of the selection discipline, although I get along pretty well with an insouciant awareness of their ramifications." Well, perhaps insouciant wouldn't be a word typically chosen, but I didn't want to come right out and say superficial or nonchalant.

In my view, such a statement both over-values and under-values the place of statistics in testing. Far from being the essential framework for testing, statistics are just a tool: not as valuable a tool as job analysis, but nevertheless quite valuable. On the other hand, the insouciant approach to statistics nullifies their usefulness.

Let's ponder the effect of over- and under-valuing the tool. A test writer looks at the results from an item analysis program. The reliability coefficient seems to be in the right range, so that is taken to mean the test is OK. The insouciant statistician doesn't ask what conditions this method of computing reliability assumes and whether the assumptions are met. (For one thing, most methods consider certain sources of error and ignore others. Does this method ignore the right ones?) Over-valuing the procedure, the test writer relies on it. Under-valuing the procedure, the test writer doesn't bother to notice what it means.
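To make "what this method assumes" concrete, here is a minimal sketch of one widely used internal-consistency estimate, coefficient alpha. Choosing alpha is an assumption made for illustration, since the column does not say which coefficient the item analysis program reports, and the response data are invented. Alpha is computed entirely from how items relate to one another in a single administration, so it treats inconsistency among items as error and is silent about other sources, such as changes in examinees from one occasion to the next.

```python
from statistics import pvariance


def coefficient_alpha(responses):
    """Coefficient alpha from rows of item scores, one row per examinee."""
    k = len(responses[0])
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 responses: every item orders the examinees identically.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
# Hypothetical 0/1 responses: the two items are unrelated to each other.
unrelated = [[1, 1], [1, 0], [0, 1], [0, 0]]

alpha_consistent = coefficient_alpha(consistent)  # items agree perfectly
alpha_unrelated = coefficient_alpha(unrelated)    # items share nothing
print(f"consistent: {alpha_consistent:.2f}, unrelated: {alpha_unrelated:.2f}")
```

The two toy data sets land at the extremes (1 and 0) purely because of how the items agree with each other. Nothing in the calculation refers to the job, which is one reason a coefficient "in the right range" does not by itself show that the test is OK.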

To carry the scenario a bit further, perhaps the reliability (whatever it means) is only .55. Reliabilities that size are customarily frowned upon. Therefore, the test writer looks at the discrimination indices (rpbs perhaps, whatever they are). Usually rpbs less than .20 are considered awfully small. So the test writer throws out the items with small rpbs and, as a result, gets rid of the most valid test items. That's possible and not altogether unlikely.
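A small sketch shows how this can happen. It takes rpb to be the point-biserial correlation between an item and the total test score, which is an assumption about what the item analysis program reports. The eight examinees, their test totals, and their job ratings are all invented for illustration.

```python
import math
from statistics import mean, pstdev


def point_biserial(item, scores):
    """Point-biserial r between a 0/1 item and a numeric score.

    Raises an error if everyone passes or everyone fails the item.
    """
    passed = [s for i, s in zip(item, scores) if i == 1]
    failed = [s for i, s in zip(item, scores) if i == 0]
    p = len(passed) / len(item)
    return (mean(passed) - mean(failed)) / pstdev(scores) * math.sqrt(p * (1 - p))


# Hypothetical data for eight examinees.
item = [1, 1, 1, 1, 0, 0, 0, 0]         # right (1) or wrong (0) on one item
test_totals = [5, 3, 4, 3, 4, 2, 5, 3]  # total test scores
job_ratings = [9, 8, 7, 8, 4, 5, 3, 4]  # an external job performance measure

r_total = point_biserial(item, test_totals)      # what the rpb screen sees
r_criterion = point_biserial(item, job_ratings)  # what validity is about
print(f"item-total r = {r_total:.2f}, item-criterion r = {r_criterion:.2f}")
```

In this invented case the item falls below the .20 rule of thumb against the total score yet tracks the job ratings closely, so screening on rpb alone would discard it. The index describes agreement with the rest of the test, not with performance on the job.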

So what do I prescribe? Don't use a statistic unless you know what it means. To find out what it means, ask someone who will respect you and your need to know. Corroborate any decisions you make based on statistics with other information you have about the situation. If there is a discrepancy, use the information you understand, which may not be the statistics.

Taking a basic statistics course may or may not help. Most people who misinterpret statistics have had a basic statistics course. You may get further by seeing how the statistics on your test relate to other facts you have about the test. And maybe you can get a video on statistics that Ted Darany made for MAPAC.

Chuck Schultz may be reached at (360) 923-5340, 2941 B Firwood Loop, Olympia, WA 98501-4844.


© Copyright 1997 by the IPMA Assessment Council. All rights reserved.