Statistics: Tips and Enigmas

by Chuck Schultz


Statistics Tip:
Setting the Standard

Standard scores, or z scores, give us a handy way of picturing a score on a test in reference to the normal curve. Converting a fairly normal distribution to z scores allows you to immediately picture where a given score falls. Convert any score simply by subtracting the mean and dividing by the standard deviation:

z = (X - M) / SD

where X is the raw score, M is the mean of the distribution, and SD is its standard deviation.
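As a minimal sketch of that conversion, the Python below subtracts the mean from each score and divides by the standard deviation. The function name `z_scores` and the sample data are hypothetical, and the choice of the population standard deviation (`pstdev`) is an assumption; swap in `stdev` if you prefer the sample formula.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Convert raw scores to z scores: (score - mean) / standard deviation."""
    m = mean(scores)
    sd = pstdev(scores)  # assumption: population SD; use stdev() for the sample SD
    if sd == 0:
        raise ValueError("all scores are identical; z scores are undefined")
    return [(x - m) / sd for x in scores]

raw = [70, 80, 90, 100, 110]  # hypothetical test scores
z = z_scores(raw)
for score, value in zip(raw, z):
    print(f"raw {score:>3} -> z {value:+.2f}")
```

After conversion the scores have a mean of 0 and a standard deviation of 1, so a z score tells you directly how many standard deviations a score sits above or below the mean.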

In addition, z scores make it easy to compare scores on two distributions when those distributions allow comparison. Examples include two parallel forms of a test given to two random samples of the same population, or tests of two different traits given to the same group of people, for whom you can assume that both traits are similarly distributed. What does similarly distributed mean? Well, that's a judgement call, or as we say in the trade, an assumption. I would suggest: language aptitude and vocabulary.
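To illustrate the comparison, here is a small Python sketch for two parallel forms of a test. The means, standard deviations, and raw scores are made-up numbers, and the sketch simply assumes the two distributions are comparable in the sense described above.

```python
def z_score(score, mean, sd):
    """Standard score: distance from the mean in standard deviation units."""
    return (score - mean) / sd

# Hypothetical summary statistics for two parallel forms of a test
form_a = {"mean": 50.0, "sd": 10.0}
form_b = {"mean": 30.0, "sd": 4.0}

z_a = z_score(65, **form_a)  # candidate who took Form A
z_b = z_score(36, **form_b)  # candidate who took Form B

print(f"Form A: z = {z_a:+.2f}")
print(f"Form B: z = {z_b:+.2f}")
```

The raw scores of 65 and 36 look nothing alike, but both fall one and a half standard deviations above their own means. Whether that makes them equivalent still rests on the assumption that the two distributions allow comparison.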

If a random sample of Americans took an English vocabulary test and a French vocabulary test, you couldn't expect the results to be comparable. Many of us would know savoir faire, de rigueur, and demi tasse, although we may not be able to pronounce them. The average American may know a few dozen French words, while a few Americans would know several thousand. It would take a strange test to yield a symmetrical distribution.

If you had either SAT or GRE scores, but not both, for a number of people, you might want to convert each of them to z scores and see where people fall within either group. Notice that the two tests were not constructed to measure precisely the same thing. You would be more concerned about the difference between a sample taking a college entrance exam and one taking a graduate school entrance exam. Differences between these groups would preclude a definite comparison.

You probably wouldn't want to compare proficiency z scores from a pretest and a posttest. An intervening treatment, such as a training session, would probably affect the mean and shape of the score distribution. A ten-point gain near the middle of the scale may have a different meaning than a ten-point gain near the upper limit.

What if applicants take four tests measuring public relations, technical knowledge, written expression, and supervision? For which tests can you compare z scores? Will z scores equate test contribution to a composite score?
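One way to start exploring the composite question is to look at the spread of each test before and after conversion. The Python sketch below uses two hypothetical tests (not the four named above) with invented scores for five applicants. It shows only that z scores equalize the standard deviations; it does not settle how much each test contributes, since that also depends on how the tests correlate with one another.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Convert raw scores to z scores using the population SD (an assumption)."""
    m, sd = mean(scores), pstdev(scores)
    return [(x - m) / sd for x in scores]

# Hypothetical raw scores for five applicants on two tests
technical = [40, 55, 70, 85, 100]  # wide spread
written = [10, 12, 8, 11, 9]       # narrow spread

raw_spread = (pstdev(technical), pstdev(written))
z_spread = (pstdev(z_scores(technical)), pstdev(z_scores(written)))

# Composites formed by simple addition
raw_composite = [t + w for t, w in zip(technical, written)]
z_composite = [t + w for t, w in zip(z_scores(technical), z_scores(written))]

print("raw SDs:", [round(s, 2) for s in raw_spread])
print("z SDs:  ", [round(s, 2) for s in z_spread])
print("raw composite:", raw_composite)
print("z composite:  ", [round(c, 2) for c in z_composite])
```

In the raw composite the ranking of applicants simply follows the technical test, because its scores vary so much more. In the z composite both tests enter with the same spread, which is a necessary step toward equal weighting but not a guarantee of it.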

Statistics Enigma:
Normal Curves Are Not Normal

When we say our sample of scores follows a normal curve, we really mean it approximates a normal curve. Our sample curve has some irregularities, and it is made up of discrete numbers, while a normal curve is continuous. The only real normal curves are the hypothetical ones we use to compare our data to. Oops. . . chalk that one up to poetic license. Regardless, the normal curve is a handy concept to have around.

Many things in nature seem to follow normal distributions. Random error does. A lot of physical traits do. What about psychological processes? There's the Enigma: there is no way to tell.

What about all of the distributions you have seen, of everything from aptitudes to personality variables, which approximate normal curves? Don't these show that psychological traits are normally distributed?

No! It merely shows that the measures were designed so that they would produce bell-shaped curves. We do this by picking items that are of medium difficulty. You know that you can produce a skewed test distribution by using all easy items or all hard items. Well, you get a symmetrical distribution by purposely picking items that will provide a range of scores -- not crowded by the upper or lower limit of the distribution.
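A rough simulation makes the point about item difficulty concrete. The Python sketch below rests on a deliberately simple assumption: every examinee has the same chance of answering each item correctly, so the only thing that changes between runs is how easy the items are. The function names, the 20-item test length, and the probabilities are all hypothetical choices for illustration.

```python
import random
from statistics import mean, pstdev

def simulate_scores(p_correct, n_items=20, n_people=2000, seed=0):
    """Total scores when each item is answered correctly with probability p_correct."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < p_correct for _ in range(n_items))
        for _ in range(n_people)
    ]

def skewness(xs):
    """Average cubed z score: near 0 when symmetric, negative when the tail runs low."""
    m, sd = mean(xs), pstdev(xs)
    return mean(((x - m) / sd) ** 3 for x in xs)

skew_easy = skewness(simulate_scores(0.9))    # all easy items
skew_medium = skewness(simulate_scores(0.5))  # medium-difficulty items
skew_hard = skewness(simulate_scores(0.1))    # all hard items

print(f"easy items:   skewness {skew_easy:+.2f}")
print(f"medium items: skewness {skew_medium:+.2f}")
print(f"hard items:   skewness {skew_hard:+.2f}")
```

With easy items the scores pile up against the upper limit and the tail runs toward low scores; with hard items the picture reverses. Only the medium-difficulty test gives a roughly symmetrical curve, and nothing about the simulated examinees changed between runs.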

When we get used to seeing data in bell curves, we begin to think that that's the way the universe is. But it's not. That's the way our measures are. When we have a trait we know to be rare in our population, we measure something else that will give us a bell curve.

For instance, it would be hard to get enough data in our sample if one of our variables was fatal accidents, so we substitute minor accidents, or accident proneness. If we tried to measure ability to spell common words, there would be too many perfect scores. So we test whether people know how to spell commonly misspelled words. When Research Analyst candidates couldn't handle the statistics problems described in the job description, we gave them easier ones. By adjusting the difficulty level of our tests, we produce bell-shaped curves. Then it looks as if the traits are normally distributed. But for most of the elements we measure, it is our arbitrary decisions that make them appear so.

Chuck Schultz may be reached at (360) 923-5340, 2941 B Firwood Loop SE, Olympia, WA 98501-4844.


© Copyright 1997 by the IPMA Assessment Council. All rights reserved.