Technical Affairs

Mike Aamodt, Associate Editor


Understanding Correlation

Correlation is a statistical procedure that allows a researcher to determine the relationship between two variables: for example, the relationships found between an employment test and future employee performance, or job satisfaction and job attendance, or performance ratings made by workers and supervisors. It is important to understand that correlational analysis does not necessarily say anything about causality.

Why does a correlation coefficient not indicate a cause and effect relationship? Because a third variable, an intervening variable, often accounts for the relationship between two variables. Take the example often used by psychologist David Schroeder. Suppose there is a correlation of +.80 between the number of ice cream cones sold in New York during August and the number of babies that die during August in India. Does eating ice cream kill babies in another nation? No, that would not make sense. Instead, we look for that third variable that would explain our high correlation. In this case, the answer is clearly the summer heat.

Another interesting example was provided by psychologist Wayman Mullins in a conference presentation about the incorrect interpretation of correlation coefficients. Mullins pointed out that data show a strong negative correlation between the number of cows per square mile and the crime rate. With his tongue firmly planted in his cheek, Mullins suggested that New York City could rid itself of crime by importing millions of head of cattle. Of course, the real interpretation for the negative correlation is that crime is greater in urban areas than in rural areas.

As demonstrated above, a good researcher should always be cautious about variables that seem related. A few years ago, People magazine reported on a minister who conducted a "study" of 500 pregnant teenaged girls and found that rock music was being played when 450 of them became pregnant. The minister concluded that because the two are related (that is, they occurred at the same time), rock music must cause pregnancy. His solution? Outlaw rock music and teenage pregnancy would disappear. In my own "study," however, I found that in all 500 cases of teenage pregnancy, a pillow also was present. To use the same logic as that used by the minister, the real solution would be to outlaw pillows, not rock music. Although both "solutions" are certainly strange, the point should be clear: Just because two events occur at the same time or seem to be related does not mean that one event or variable causes another.

Interpreting a Correlation Coefficient

Magnitude and Direction

The result of correlational analysis is a number called a correlation coefficient. The values of this coefficient range from -1 to +1. The further the coefficient is from zero, the stronger the relationship between two variables. That is, a correlation of .40 shows a stronger relationship between two variables than a correlation of .20. Likewise, a correlation of -.39 shows a stronger relationship than a correlation of +.30. The + and - signs indicate the direction of the correlation. A positive (+) correlation means that as the values of one variable increase, so do the values of a second variable. For example, we might find a positive correlation between intelligence and scores on a classroom exam. This would mean that the more intelligent the student, the higher her score on the exam.

A negative (-) correlation means that as the values of one variable increase, the values of a second variable decrease. For example, we would probably find a negative correlation between the number of beers that you drink the night before a test and your score on that test. In human resources, we find negative correlations between job satisfaction and absenteeism, age and reaction time, and nervousness and interview success.
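For readers who want to see magnitude and direction in action, here is a minimal Python sketch that computes the usual Pearson correlation coefficient by hand. The function name pearson_r and all of the data are invented for illustration; they do not come from any real study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of the deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never varies")
    return sxy / sqrt(sxx * syy)

# Hypothetical data for six students (invented for illustration)
beers_night_before = [0, 1, 2, 3, 4, 5]
test_scores = [92, 88, 85, 80, 71, 65]
ability_scores = [95, 100, 105, 110, 120, 130]
exam_scores = [61, 70, 68, 77, 85, 90]

r_negative = pearson_r(beers_night_before, test_scores)  # more beers, lower scores
r_positive = pearson_r(ability_scores, exam_scores)      # higher ability, higher scores
print(f"beers vs. test score:   r = {r_negative:+.2f}")
print(f"ability vs. exam score: r = {r_positive:+.2f}")
```

The sign of each result gives the direction of the relationship, and the distance from zero gives its strength.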

Factors Limiting The Magnitude of a Correlation

Suppose that we know the "real" relationship between cognitive ability and performance in the police academy is .80, yet in our own study, we obtained a correlation of .40. What would explain this discrepancy? Probably three factors: test unreliability, criterion unreliability, and range restriction. The size of a correlation coefficient is limited by the reliability of the two variables being correlated. So, if our two measures, in this case scores on a cognitive ability test and grades in the academy, have low reliability, our validity coefficient will be lower than expected. Thus, the moral of this story is to use the most reliable measures possible (no test or measurement is perfectly reliable).
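The effect of unreliability can be sketched with the classical attenuation formula, in which the expected observed correlation equals the true correlation times the square root of the product of the two reliabilities. The sketch below assumes, purely for illustration, that unreliability is the only factor at work; the function name and both reliability values are hypothetical.

```python
from math import sqrt

def attenuated_r(true_r, reliability_x, reliability_y):
    """Expected observed correlation given the reliability of each measure."""
    for reliability in (reliability_x, reliability_y):
        if not 0 < reliability <= 1:
            raise ValueError("reliabilities must be greater than 0 and at most 1")
    return true_r * sqrt(reliability_x * reliability_y)

# True relationship of .80 from the example; both reliabilities are hypothetical.
observed = attenuated_r(0.80, reliability_x=0.50, reliability_y=0.50)
print(f"expected observed r = {observed:.2f}")  # prints 0.40
```

With perfectly reliable measures (both reliabilities equal to 1), the function returns the true correlation unchanged.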

A correlation coefficient is also limited by the range of test scores and performance measures that are included in the study. The wider the range of scores, the higher the validity coefficient. Unfortunately, in the typical validity study in which we correlate a test with some measure of performance, we usually encounter something called range restriction. That is, we don't have a full range of test scores or performance ratings. For example, in a given employment situation, few employees are at the extremes of a performance scale. Employees who would be at the bottom were either never hired or have since been terminated. Employees at the upper end of the performance scale either got promoted or went to an organization that paid more money.
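Range restriction is easy to see in a small simulation. The sketch below invents a large applicant pool in which the test and performance correlate about .60, then recomputes the correlation using only the top 20% of test scorers, as if only they had been hired. Every number in it (the .60, the 5,000 applicants, the 20% cutoff, the random seed) is an assumption chosen for illustration.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

random.seed(1999)  # fixed seed so the run is repeatable
TRUE_R = 0.60      # hypothetical validity in the full applicant pool
N = 5000           # hypothetical number of applicants

test_scores, performance = [], []
for _ in range(N):
    score = random.gauss(0, 1)
    noise = random.gauss(0, 1)
    test_scores.append(score)
    # Built so that the population correlation with the test score is TRUE_R
    performance.append(TRUE_R * score + sqrt(1 - TRUE_R ** 2) * noise)

r_full = pearson_r(test_scores, performance)

# Range restriction: keep only the top 20% of test scorers (those "hired")
cutoff = sorted(test_scores)[int(0.80 * N)]
hired = [(t, p) for t, p in zip(test_scores, performance) if t >= cutoff]
r_restricted = pearson_r([t for t, _ in hired], [p for _, p in hired])

print(f"all applicants:  r = {r_full:.2f}")
print(f"hired only:      r = {r_restricted:.2f}")
```

The correlation among the "hired" group comes out noticeably lower than in the full pool, even though the underlying relationship has not changed.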

Another problem that can lower the size of a correlation coefficient is curvilinearity. One of the assumptions behind correlation is that the two variables being correlated are linearly related: a high score on a test is related to a high performance rating, and the higher the test score, the higher the performance rating. However, many things in life are not linearly related. For example, bright people perform better than less bright people in the police academy. But is there a point at which increases in intelligence don't help? That is, we could probably agree that a person with an IQ of 110 would do better in the academy than a person with an IQ of 90. However, would a person with an IQ of 150 outperform a person with an IQ of 130? Probably not, because the material learned in the academy is only so difficult and at some point, being super smart may not provide any advantage over being smart.

Likewise, it may be that too much of a variable could actually result in decreased performance - something called the "inverted U." The relationship between arousal and performance provides the perfect example. A person who has low levels of arousal is probably not motivated enough to do well on a task. A person with very high levels of arousal will become nervous and perform poorly. However, a person with a moderate level of arousal has enough energy to be motivated but not so much that performance will decrease (too nervous and tight). So, even though there is a relationship between arousal and performance, the relationship is not linear. Thus, a simple correlation between arousal levels and performance would probably not result in a significant correlation and we would incorrectly conclude that there is no correlation between the two variables. Fortunately, there are some statistical adjustments that we can make to test for this possibility (converting our measures to z-scores and then squaring them). Just as fortunately, such adjustments are beyond the scope of this column and probably your interest as well!
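For the curious, here is a rough sketch of the inverted U using invented data in which performance peaks at a moderate level of arousal. It assumes one simple version of the adjustment mentioned above: convert arousal to z-scores, square them, and correlate the squared values with performance. The data and variable names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical inverted-U data: performance is best at moderate arousal (5)
arousal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
performance = [20 - (a - 5) ** 2 for a in arousal]

# A simple (linear) correlation misses the relationship entirely
r_linear = pearson_r(arousal, performance)

# Convert arousal to z-scores, then square them so that both extremes
# (very low and very high arousal) receive large values
mean_a = sum(arousal) / len(arousal)
sd_a = sqrt(sum((a - mean_a) ** 2 for a in arousal) / len(arousal))
z_squared = [((a - mean_a) / sd_a) ** 2 for a in arousal]
r_curved = pearson_r(z_squared, performance)

print(f"simple correlation:            r = {r_linear:+.2f}")
print(f"using squared z-scores:        r = {r_curved:+.2f}")
```

In these made-up data the simple correlation is essentially zero, while the squared z-scores correlate strongly (and negatively) with performance, because being far from the middle in either direction hurts.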

Statistical Significance

To determine if we are even allowed to interpret a correlation coefficient, we compute something called a significance level (see the previous issue of the ACN for a discussion of significance levels). The significance level tells us the probability that our correlation coefficient occurred by chance alone. That is, if we obtain a correlation of .30 between a test score and supervisor ratings of on-the-job performance, is there really a relationship between the two variables or is our correlation a chance finding?

The significance level for a correlation coefficient is a function of two factors, the size of the correlation coefficient and the sample size used in the study. The greater the sample size, the smaller the correlation needed for statistical significance. For example, as shown in the table below, a correlation of .20 would be significant if we had 100 employees in our study but not if we had only 50 employees.

Sample Size    Smallest Significant Correlation (p < .05)
10             .63
20             .44
30             .36
40             .31
50             .27
64             .25
70             .23
80             .22
90             .21
100            .19
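The link between sample size and significance can be checked with a short Python sketch. It uses the Fisher z approximation, which is an assumption of this sketch and not necessarily the method used to build the table above, so values right at the borderline may differ slightly. The function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_p_value(r, n):
    """Approximate two-tailed p-value for a correlation r based on n cases.

    Uses the Fisher z approximation (an assumption of this sketch).
    """
    if n < 4 or not -1 < r < 1:
        raise ValueError("need n >= 4 and -1 < r < 1")
    # Fisher's transformation of r is roughly normal with SD 1 / sqrt(n - 3)
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (50, 100):
    p = correlation_p_value(0.20, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"r = .20, n = {n}: p = {p:.3f} ({verdict})")
```

Under this approximation, a correlation of .20 falls short of significance with 50 employees and reaches it with 100.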

If a correlation coefficient is not statistically significant, we cannot even try to interpret it as being high/low or useful/not useful. We essentially pretend that it doesn't exist. If, however, the correlation is statistically significant, we then must address the issue of practical significance.

Practical Significance: Is Our Correlation Any Good?

In determining the validity of a test, we often conduct what is called a criterion validity study in which we correlate the scores from a selection test (e.g., civil service exam, structured interview, assessment center) with a measure of work-related behavior (e.g., supervisor ratings, commendations, disciplinary problems, attendance record). The result of that correlation is called a validity coefficient and is identical to the correlation coefficient discussed earlier. In interpreting a validity coefficient - that is, determining if it is "any good" - there are two main approaches that are used: comparison to norms and utility analysis.

Comparison to Norms

Though we would like to have a perfect correlation between test scores and on-the-job performance, such will never be the case. As a rule of thumb in personnel selection, validity coefficients above .20 will probably be useful, those above .30 are high, and those above .40 are outstanding. Validity coefficients greater than .50 probably indicate one of two things: either the correlation coefficient is suspect (e.g., calculation errors, chance due to small sample size, cheating) or the personnel analyst deserves the Nobel Prize for science!

If we are using a particular type of test and want to compare it to similar tests, the table below will be helpful. For example, if we correlated our assessment center scores with supervisor ratings of performance and obtained a correlation of .15, we can see from the table that our validity of .15 is well below the typical validity of .25 for assessment centers. Note to meta-analysis fans: the correlations in the table below are uncorrected; see Aamodt (1999, page 248) or Schmidt and Hunter (1998) for tables showing corrected or "true" validities.

Selection Technique       Meta-analysis               Average Validity Coefficient
Biodata                   Beall (1991)                .36
Structured interview      Huffcutt & Arthur (1994)    .34
Assessment centers        Gaugler et al. (1987)       .25
Experience                Quinones et al. (1995)      .22
Integrity tests           Ones et al. (1993)          .21
Grades                    Roth et al. (1996)          .16
Personality               Tett et al. (1994)          .12
Unstructured interview    Huffcutt & Arthur (1994)    .11

Utility Analysis

Another way to determine if a validity coefficient is "any good" would be to translate the correlation into terms that most people can understand. Though there are several different methods used to establish the utility of a test (e.g., Taylor-Russell tables, expectancy charts), we will concentrate on the use of the Brogden-Cronbach-Gleser Utility Formula. This formula computes the amount of money that an organization would save if it used a test to select employees. To use this formula, five items of information must be known.

1. Number of employees hired per year (n). This number is easy to determine: It is simply the number of employees who are hired for a given position in a year.

2. Average tenure (t). This is the average amount of time that employees in the position tend to stay with the company. The number is computed by using information from company records to identify the time that each employee in that position stayed with the company. The number of years of tenure for each employee is then summed and divided by the total number of employees.

3. Test validity (r). This figure is the criterion validity coefficient that was obtained through either a validity study, the technical manual that accompanies a commercially available test, or validity generalization.

4. Standard deviation of performance in dollars (SD$). For many years, this number was difficult to compute. Research has shown, however, that for jobs in which performance is normally distributed, a good estimate of the difference in performance between an average and a good worker (one standard deviation away in performance) is 40% of the employee's annual salary. To obtain this, the total salaries of current employees in the position in question should be averaged.

5. Mean standardized predictor score of selected applicants (m). This number is obtained in one of two ways. The first method is to obtain the average score on the selection test for both the applicants who are hired and the applicants who are not hired. The average test score of the nonhired applicants is subtracted from the average test score of the hired applicants. This difference is divided by the standard deviation of all the test scores. For example, we administer a test of cognitive ability to a group of 50 applicants and hire the 5 with the highest scores. The average score of the 5 hired applicants was 35.2, the average test score of the other 45 applicants was 28.2, and the standard deviation of all test scores was 8.5. The desired figure would be:

(35.2 - 28.2) / 8.5 = 7.0 / 8.5 = .824

The second way to find m is to compute the proportion of applicants who are hired and then use a conversion table to convert the proportion into a standard score. This second method is used when an organization plans to use a test, knows the probable selection ratio based on previous hirings, but does not know the average test scores because the organization has never used the test. Using the above example, the proportion of applicants hired would be:

openings / applicants = 5 / 50 = .10

From the table below, we see that the standard score associated with a selection ratio of .10 is 1.76.

Conversion Table

Selection Ratio    Standard Score (m)
.05                2.08
.10                1.76
.20                1.40
.30                1.17
.40                0.97
.50                0.80
.60                0.64
.70                0.50
.80                0.35
.90                0.20
1.00               0.00
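Both ways of finding m can be sketched in a few lines of Python. The function names are hypothetical, and the lookup simply reproduces the conversion table above, so it only handles the selection ratios listed there.

```python
# Conversion table from this column: selection ratio -> standard score (m)
CONVERSION_TABLE = {
    0.05: 2.08, 0.10: 1.76, 0.20: 1.40, 0.30: 1.17, 0.40: 0.97, 0.50: 0.80,
    0.60: 0.64, 0.70: 0.50, 0.80: 0.35, 0.90: 0.20, 1.00: 0.00,
}

def m_from_scores(mean_hired, mean_not_hired, sd_all):
    """First method: hired mean minus nonhired mean, divided by the SD of all scores."""
    if sd_all <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_hired - mean_not_hired) / sd_all

def m_from_selection_ratio(openings, applicants):
    """Second method: look up m for the selection ratio in the conversion table."""
    ratio = round(openings / applicants, 2)
    if ratio not in CONVERSION_TABLE:
        raise ValueError(f"selection ratio {ratio:.2f} is not in the table")
    return CONVERSION_TABLE[ratio]

# Figures from the example: 50 applicants, 5 hired
m_scores = m_from_scores(mean_hired=35.2, mean_not_hired=28.2, sd_all=8.5)
m_ratio = m_from_selection_ratio(openings=5, applicants=50)
print(f"m from test scores:     {m_scores:.3f}")
print(f"m from selection ratio: {m_ratio:.2f}")
```

The two methods will rarely agree exactly, because the first uses the scores actually observed while the second relies on a conversion table.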

To determine the savings to the company, we use the following formula:

savings = (n)(t)(r)(SD$)(m) - cost of testing

Cost of testing = # of applicants × cost per applicant

As an example, suppose we will hire 10 auditors per year, the average person in this position stays two years, the validity coefficient is .40, the average annual salary for the position is $30,000, we have 50 applicants for 10 openings, and testing costs $10 per applicant. Thus,

n = 10

t = 2

r = .40

SD$ = $30,000 x .40 = $12,000

m = 1.40 (the selection ratio is 10 / 50 = .20, which is converted to 1.40 by using the conversion table)

cost of testing = 50 applicants x $10 = $500

Using the utility formula, we would have

(10)(2)(.40)(12,000)(1.40) - (50)(10) = $133,900

This means that after accounting for the cost of testing, using this particular test instead of selecting employees by chance will save a company $133,900 over the two years that auditors typically stay with the organization. Because a company seldom selects employees by chance, the same formula should be used with the validity of the test (interview, psychological test, references, and so on) that the company currently uses. The result of this computation should then be subtracted from the first.
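The utility computation above can be written as a short Python sketch. The function name and parameter names are hypothetical. The second calculation illustrates the comparison described in the previous paragraph and assumes, purely for illustration, a current selection method with a validity of .20 and no per-applicant cost.

```python
def utility_savings(n, t, r, sd_dollars, m, applicants, cost_per_applicant):
    """Brogden-Cronbach-Gleser savings: (n)(t)(r)(SD$)(m) - cost of testing."""
    cost_of_testing = applicants * cost_per_applicant
    return n * t * r * sd_dollars * m - cost_of_testing

# Auditor example from the column
new_test = utility_savings(
    n=10,                      # employees hired per year
    t=2,                       # average tenure in years
    r=0.40,                    # validity of the new test
    sd_dollars=0.40 * 30_000,  # 40% of the $30,000 average salary
    m=1.40,                    # from the conversion table (selection ratio .20)
    applicants=50,
    cost_per_applicant=10,
)
print(f"savings versus chance selection: ${new_test:,.0f}")  # $133,900

# Hypothetical current method: validity of .20 and no per-applicant cost
current_method = utility_savings(
    n=10, t=2, r=0.20, sd_dollars=0.40 * 30_000, m=1.40,
    applicants=50, cost_per_applicant=0,
)
print(f"savings versus the current method: ${new_test - current_method:,.0f}")
```

Subtracting the second result from the first gives the gain from switching to the new test, which is the figure most organizations will actually care about.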


HR Humor

Here is a piece of HR humor from the Internet that was sent in by an ACN reader.

You know you've been brainwashed by Total Quality Management (TQM) when you:

  • decide to organize your family into a team-based organization

  • refer to dating as test marketing

  • insist the kids call you a facilitator instead of a father

  • write executive summaries on your love letters

  • celebrate your wedding anniversary by conducting a performance review

  • believe you have no problems in life, just issues and improvement opportunities

  • explain to your bank manager you prefer to think of yourself as highly leveraged as opposed to in debt

  • refuse to go on a vacation without an action plan

  • talk to the waiter about process flow when your meal arrives late

  • think your cat needs intensive motivational training

  • want your spouse to draft a statement of commitment

  • give constructive feedback to your dog

  • start to feel sorry for Dilbert's boss

 


Mike Aamodt, a Professor of Psychology at Radford University, serves as our Associate Editor for the Technical Affairs column and as our unofficial humor editor. If you have a technical question you want answered or discussed, wish to comment on this month's article, or want to share a humor item, please contact Mike. He may be reached by email (maamodt@runet.edu), phone (540) 831-5513, or fax (540) 831-6113.


© Copyright 1999 by the IPMA Assessment Council. All rights reserved.