
Statistics Tips and Enigmas


Statistics Tip: What, You Don't Have Random Samples?

The assumption of random sampling is fundamental to significance testing. Significance tests are based on precisely constructed hypothetical situations. They compare an observed event with the distribution of such events expected under the hypothetical situation. That expected distribution is the probability distribution of events selected at random from a specified population.

Samples may come from clearly different populations, as when you wish to compare some characteristic of men and women. In this case, the hypothesis being tested assumes that the population parameters of the two distributions are identical. Still, each sample must be drawn so that it is representative of its own population. Unless the samples mirror their populations, the significance test doesn't apply. To test a hypothesis, you must be able to conclude that the observed difference is related to the hypothesized variable and to no other.

Suppose a college professor conducting an experiment in class asks the students, "Who wants to be in the experimental group and who wants to be in the control group?" Would the results reflect the experimental variable or the kind of people who chose each option? Clearly, when individuals select themselves into a sample, the significance testing procedure is flawed and no conclusions about the variable can be drawn.
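The professor's problem can be made concrete with a small simulation. This is a minimal Python sketch under hypothetical assumptions that are not in the article: each simulated student has a latent "motivation" score that raises the outcome, the treatment itself has no effect at all, and under self-selection more motivated students are more likely to volunteer for the experimental group. The sketch counts how often a two-sample test (Welch's t, using the large-sample cutoff of 1.96) declares a "significant" difference.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def one_study(rng, n=200, self_selected=True):
    """Simulate one class experiment in which the treatment has NO effect.

    Hypothetical model: a latent motivation score raises each student's outcome.
    Under self-selection, motivated students tend to volunteer for the
    experimental group; under random assignment, a coin flip decides.
    """
    experimental, control = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)
        outcome = motivation + rng.gauss(0, 1)  # the treatment adds nothing
        if self_selected:
            joins_experimental = motivation + rng.gauss(0, 1) > 0
        else:
            joins_experimental = rng.random() < 0.5
        (experimental if joins_experimental else control).append(outcome)
    return welch_t(experimental, control)


def false_positive_rate(self_selected, reps=200, seed=1):
    """Share of simulated studies whose |t| exceeds 1.96."""
    rng = random.Random(seed)
    hits = sum(abs(one_study(rng, self_selected=self_selected)) > 1.96 for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    print("self-selected groups:", false_positive_rate(True))
    print("randomly assigned groups:", false_positive_rate(False))
```

Under random assignment, roughly 5 percent of the simulated studies come out "significant," as the test intends. Under self-selection, nearly all of them do, even though the treatment does nothing: the test is detecting who volunteered, not what the treatment did.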

Suppose you wanted to know whether men or women were more conscientious, and the people available to you were applicants for a promotional position. The recruiting announcement said, in effect: "We have a job here that pays 20 percent more than the average salary of women and 20 percent less than the average salary of men." Is there any hope that the women candidates were drawn from among women employees in the same way that the men candidates were drawn from among men employees?

Conclusions would not generalize to the women and men employed in your organization. Any conclusions would apply to the applicants only. You certainly couldn't generalize to women and men in general.

Applicant groups are always self-selected. Moreover, the populations from which they select themselves can hardly be compared to any more general population. Therefore, conclusions do not generalize beyond the specific individuals. Statistics observed in applicant groups cannot be generalized to other groups. Not to women. Not to Hispanics. Not to Blacks. Not to disabled people. Not to anyone.

You need to draw conclusions about important variables, but don't delude yourself that you can justify your conclusions with significance tests based on samples that are not comparable.


Statistics Enigma: I Partialed It Out, But It's Still Here

We know that partial correlations allow us to see how strongly two variables would be correlated if it were not for the contaminating influence of a third variable. Let's see how this worked when a school PTA ran a poetry competition. When the judges turned in their results, the PTA committee was apprehensive. They questioned whether the judges had truly rated creativity or had instead rated the poets' vocabularies. They asked Phyllis al Khwarizmi, the math teacher, to investigate.

Phyllis found the correlation between Poetic Creativity and Vocabulary Test Score to be .50. This suggested the judges were strongly influenced by vocabulary. However, she noticed that the students with high Creativity Ratings and high Vocabulary Test Scores tended to be older. She decided to see whether age was responsible for the correlation between poetry ratings and vocabulary. Age and creativity also correlated .50, while age and vocabulary correlated .80. She suspected that if she removed the effect of age, the correlation between the muse and academe would disappear.

She computed a partial correlation coefficient of vocabulary and creativity, holding age constant. Her computation follows:

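$$r_{VC \cdot A} = \frac{r_{VC} - r_{VA}\, r_{CA}}{\sqrt{(1 - r_{VA}^{2})(1 - r_{CA}^{2})}}$$

where V is the Vocabulary Test Score, C is the Poetic Creativity rating, and A is age. Substituting her three correlations:

$$r_{VC \cdot A} = \frac{.50 - (.80)(.50)}{\sqrt{(1 - .80^{2})(1 - .50^{2})}} = \frac{.10}{\sqrt{(.36)(.75)}} = \frac{.10}{\sqrt{.27}} \approx .19$$

The correlation did not disappear, so Phyllis concluded that the judges were still somewhat influenced by vocabulary, even with age held constant. She discussed the results with the PTA committee, and they decided to redo the ratings with different judges and clearer rating criteria.

The following month, Phyllis gave her math class a standardized achievement test. She was concerned that the class hadn't taken it seriously, so she did some public relations for the testing process and then gave them an equivalent form of the same test. The two forms had the same means and standard deviations, each correlated .50 with course grades, and the correlation between the two forms was .80.

She then noticed that these three correlations were exactly the same as the ones she had encountered in the PTA's poetry competition. She decided to compute a partial correlation: the correlation between course grades and one form of the math achievement test, with the other form held constant. It seemed a silly thing to do; since both forms measured the same math achievement, she knew the answer had to be zero. But the computation yielded a partial correlation of .19, the same value she had obtained for the poetry ratings.

If you partial one equivalent form out of the correlation between the other equivalent form and a third variable, how can you get .19? She told her former statistics professor what had happened, and he said, "Oh, sure. That's because of unreliability. If you had corrected for unreliability, the partial correlation would have been zero." Because each form contains measurement error, holding one form constant holds true math achievement only partly constant, and the true-score variance left over in the other form still correlates with grades.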
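Both of Phyllis's partial correlations can be checked with the standard first-order partial correlation formula. This is a minimal Python sketch; the function name `partial_r` and the variable labels are illustrative, not from the article. Because the three input correlations are identical in the two stories, so are the results.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Poetry contest: x = vocabulary, y = creativity, z = age
poetry = partial_r(r_xy=0.50, r_xz=0.80, r_yz=0.50)

# Achievement tests: x = form 1, y = course grades, z = form 2
tests = partial_r(r_xy=0.50, r_xz=0.80, r_yz=0.50)

print(round(poetry, 2), round(tests, 2))  # prints 0.19 0.19
```

Nothing in the formula knows that the second form is an equivalent measure of the first; it treats that form as just another variable, measurement error and all.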

Phyllis then realized that if she had corrected for unreliability, the partial correlation between creativity and vocabulary with age held constant would likely have been zero, too. Perhaps there had been a problem with judging the poetry of children of different ages against the same criteria. But the problem probably had nothing to do with confounding of vocabulary and creativity. The judges had been fired for the wrong reason.

Phyllis vowed, "Next time I compute a statistic, I'm going to find out what it means before I advise people on how to improve their measurement."


© Copyright 1996 by the IPMA Assessment Council. All rights reserved.