Statistics: Tips and Enigmas
by Chuck Schultz
Statistics Tip: What’s the significance?
Continually, investigators report significance levels negligently. Continually, journal editors publish these careless statements. When should multiple significance levels be stated? “Never!” beats “often,” as an answer.
Significance tests arbitrate chance differences among observations. A typical significance test finds where a difference falls in the normal distribution curve representing chance differences. The point where a difference falls on the curve indicates the probability (P) that a difference this large would occur by chance. If the difference would rarely occur by chance, the investigator can justify concluding that the variable in question had an effect. The investigator names a significance level to show how big a risk he or she is willing to take in proclaiming a variable effective.
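As a rough illustration of that logic, the following Python sketch converts a standardized difference into a two-tailed probability under the normal curve and compares it with a level named in advance. The observed value of 2.10 and the names ALPHA and two_tailed_p are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Significance level named BEFORE looking at the results (assumed value).
ALPHA = 0.05

def two_tailed_p(z: float) -> float:
    """Chance probability of a standardized difference at least this
    large, in either direction, under the standard normal curve."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

z_observed = 2.10  # hypothetical standardized difference
p = two_tailed_p(z_observed)
significant = p < ALPHA

print(f"P = {p:.4f}; significant at the {ALPHA} level: {significant}")
```

The point of the sketch is the order of operations: ALPHA is fixed first, and the result is then reported against that one level only.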
Research protocol requires an investigator to state, before seeing the results, what probability level will be called rare. Conventionally, the investigator calls either one occurrence in twenty (P=.05) or one occurrence in one hundred (P=.01) rare. Critics look askance at an investigator who chooses to call one occurrence in twenty-five rare, because they question the unconventional choice (P=.04).
More pertinently, critics look askance at investigators who are irresolute about choosing a significance level. However, many investigators refuse to choose, and report both the .05 and .01 levels. What does it show when some observations reach the .01 level while others reach only the .05 level? It shows only that the investigator will not state an opinion on what is rare. He or she refuses to observe the research protocol and, perhaps, doesn't understand the reason for it.
The investigator may want you to believe that differences at the .01 level represent a stronger relationship than those at the .05 level. For example, one validity coefficient, .35*, is significant at the .05 level, while a second validity coefficient, .36**, is significant at the .01 level. However, observations that lead to different significance levels are rarely significantly different from one another.
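To make the last point concrete, here is a minimal Python sketch using the two coefficients from the example. It relies on assumptions the article does not state: each coefficient is taken to come from its own independent sample of 52 cases (a hypothetical size), and probabilities come from Fisher's z transformation with a normal approximation, which is one common way to test correlations. The function names are illustrative only.

```python
import math
from statistics import NormalDist

N = 52  # hypothetical sample size for each coefficient (not given in the article)

def p_for_r(r: float, n: int) -> float:
    """Two-tailed probability that a correlation this large arises by
    chance, using Fisher's z transformation (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def p_for_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    """Two-tailed probability for the difference between two correlations
    from independent samples (assumed), again via Fisher's z."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

p_35 = p_for_r(0.35, N)    # falls between .01 and .05: one asterisk
p_36 = p_for_r(0.36, N)    # falls below .01: two asterisks
p_diff = p_for_difference(0.35, N, 0.36, N)  # nowhere near either level

print(f"r = .35: P = {p_35:.4f}")
print(f"r = .36: P = {p_36:.4f}")
print(f".35 versus .36: P = {p_diff:.2f}")
```

Under these assumptions the two coefficients land on opposite sides of the .01 line, yet the test of the difference between them gives a probability close to 1.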
Investigators frequently compound the problem by reporting different significance levels for different correlations in a matrix. Widely available significance tables for correlations are typically based on tests for independent estimates. These are inappropriate for most correlation matrices for two reasons. First, various correlations based on a single set of entities (or correlations from data collected on the same occasion) lack independence; errors are correlated. Second, multiple observations provide more chances for observations to fall at the extremes. (That is why analysis of variance procedures require a significant overall F before computing probabilities of individual effects.)
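The second reason can be sketched with simple arithmetic. The Python example below counts the correlations in a matrix and estimates how often at least one would reach the .05 level by chance alone. It treats the tests as independent purely to keep the arithmetic simple; as noted above, correlations in a matrix are not independent, so the figures are only a rough guide. The ten-variable matrix and the function names are hypothetical.

```python
def n_correlations(n_variables: int) -> int:
    """Number of distinct correlations among n_variables measures."""
    return n_variables * (n_variables - 1) // 2

def chance_of_any_false_alarm(n_tests: int, alpha: float) -> float:
    """Probability that at least one of n_tests reaches the alpha level
    by chance, ASSUMING the tests are independent (a simplification)."""
    return 1.0 - (1.0 - alpha) ** n_tests

ALPHA = 0.05
n_tests = n_correlations(10)              # hypothetical 10-variable matrix
expected_by_chance = n_tests * ALPHA      # "significant" results expected by chance
p_any = chance_of_any_false_alarm(n_tests, ALPHA)

print(f"{n_tests} correlations; about {expected_by_chance:.2f} expected "
      f"to reach the {ALPHA} level by chance")
print(f"Probability of at least one chance result: {p_any:.2f}")
```

A table of single-test probabilities takes no account of this, which is why asterisks scattered through a matrix say less than they appear to.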
Statistics Enigma: Why are there no people in the top quartile?
The French government does not own the language. Words mean what the speaker intends them to mean, that’s all. Problems occur when you intend a word to mean one thing and I think you mean something else. That’s why people concern themselves with standard meanings for words.
But dictionaries nowadays don't claim to authenticate the meaning of a word but, rather, to report how people use it. There is no “using a word wrong,” there is just using it differently. So, if I use a word to mean one thing, and you use it to mean the exact opposite, neither of us is using it correctly or incorrectly; we're both just using it. However, a third person, upon hearing the word, may not know whether to infer your meaning or mine.
By generating a popular non-standard use of a word, a sample of people can spoil its use for the original purpose. Upon hearing a non-standard use, a language purist may cringe or be amused by the “ignorance” of the user. Upon hearing a non-standard use gain popularity, a language purist will despair.
Ironically, language purists wouldn’t know if they were using a word impurely, because each of us knows only what sounds right to us. Additionally, many standard uses formerly meant something very different. Therefore, we language purists take a chance at looking ridiculous when we make pronouncements about proper use of words - casting the first stone, as it were.
Confused usage results in some words, such as biweekly, being listed in the dictionary with opposite meanings. My older dictionaries give only one definition of stakeholder, while my newer one also gives the "corrupted," opposite meaning, which has recently gained popularity.
I used to be confident that criteria meant more than one criterion; now I’m not sure. My statistics professor knew the difference between data and information, while my classmates didn't distinguish between stats and stat.
When I hear someone talk about the people in the top quartile, I want to say, “There are no people in the top quartile. There are no people in the top quintile or the top decile either.” The quartile, like any other centile, is a point on the curve, not an area. A point has no dimension. It is not part of the distribution, just a point on it. Therefore, there are no people in the top quartile.
Well, then, what do you do when you want to talk about those high-scoring people? Isn't it easier to say “in the top quartile” than to say “in the top twenty-five percent”? Yes, and it’s also easy to say “in the top quarter.” And it’s less confusing. What are you saving the word “quarter” for?
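The distinction between the point and the area can be shown in a few lines of Python. This is only a sketch: the twenty scores are hypothetical, and the cut points come from the default method of the standard library's statistics.quantiles function, which is one of several conventions for locating quartiles.

```python
from statistics import quantiles

scores = list(range(1, 21))  # 20 hypothetical test scores: 1, 2, ..., 20

# Quartiles are cut POINTS: three values that divide the scores into four parts.
cut_points = quantiles(scores, n=4)
third_quartile = cut_points[-1]

# The people are in the top QUARTER: the scores above the third quartile.
top_quarter = [s for s in scores if s > third_quartile]

print(f"Quartiles (points on the scale): {cut_points}")
print(f"Scores in the top quarter: {top_quarter}")
```

Here the third quartile is a single value on the score scale that nobody actually obtained, while the top quarter holds five of the twenty people.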
Shall we maintain “pure” meanings of statistical terms or shall we welcome flexibility? Is the language degenerating or developing? Whatever you say...
Chuck Schultz may be reached at (306) 923-5340, 2941B Firwood Loop, Olympia, WA 98501.
© Copyright 1996 by the IPMA Assessment Council. All rights reserved.
