Comments on the 1997 SIOP Symposium on the Nassau County Police Test
Frank Schmidt
University of Iowa
April 1997
This symposium may have been the most popular event on the program at the Society for Industrial and Organizational Psychology (SIOP). More than 300 people attended it. The idea for this symposium came from Linda Gottfredson, and it was her intent that I be on the panel, to ensure some semblance of balance. But panel members who were involved in developing the Nassau test, and who were defending it, threatened to withdraw if I was included. So the panel wound up with only one critic of the test (Linda) and many defenders--a very unbalanced panel.
In compensation for my being blackballed, Chair John Hollenbeck agreed that at the end of the symposium the first person he would call on would be me, and I would be allowed to at least say a few words. However, for some reason, he did not call on me for quite some time. But finally he did (after we had officially run out of time), and I did get to say a few words.
However, there are a number of additional things that I think should be addressed.
The first is Neal Schmitt's use of the 1986 Personnel Psychology meta-analysis by Hannah Rothstein, Lois Northrop, and myself to argue that general mental ability (GMA) has little if any validity for police work.
First, Neal did not present the validity estimates for the criterion of learning in the Police Academy. Table 8 of our meta-analysis shows that GMA (as assessed by Verbal + Quantitative; the only GMA measure in the Table) predicts amount of job knowledge learned in the Police Academy with validity in the .71 - .75 range. Published I/O research using causal modeling indicates that the major determinant of job performance is job knowledge (research studies by Wally Borman, Malcolm Ree, and by Jack Hunter and myself; this research is summarized in Current Directions in Psychological Science, 1992, 1, 89 - 92). And GMA predicts the acquisition of job knowledge in police work with a validity of .71 or higher. These research facts indicate that GMA does predict police performance on the job.
Second, in looking at the criterion of ratings of performance, Neal cited results not for GMA but for specific aptitudes (such as memory, quantitative ability, and reasoning ability). The question at issue concerns GMA, not specific aptitudes; specific aptitudes have lower validity because each is only one of many indicators of GMA. (In fact, one of the deficiencies of the Nassau study is that it never combined its various indicators of GMA into a GMA measure, thereby masking the validity of GMA.)
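The point about combining indicators can be illustrated with the standard formula for the validity of a unit-weighted composite of standardized predictors. The following is a minimal sketch, not a reanalysis of the Nassau data: the number of aptitudes, their individual validity, and their average intercorrelation are all hypothetical values chosen for illustration, and the function name is illustrative as well.

```python
import math

def composite_validity(k, mean_validity, mean_intercorrelation):
    """Validity of an equally weighted sum of k standardized predictors.

    Assumes every predictor has the same validity and the same
    correlation with every other predictor (a simplifying assumption).
    """
    numerator = k * mean_validity
    denominator = math.sqrt(k + k * (k - 1) * mean_intercorrelation)
    return numerator / denominator

# Hypothetical inputs, not taken from the Nassau study or the 1986 article.
single = 0.15  # validity of each specific aptitude used alone
composite = composite_validity(k=4, mean_validity=single, mean_intercorrelation=0.50)

print(f"single aptitude: {single:.2f}, composite of four: {composite:.2f}")
```

Under these assumed values the composite validity exceeds that of any single aptitude, which is the sense in which reporting only specific aptitudes understates the validity of GMA.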
Third, Neal cited biased estimates of validities; he cited observed validities, which are downwardly biased estimates. He did not present the true validity estimates, which are unbiased.
In Table 11 of our article, the relevant table, one can see that GMA (assessed by V + R + M + Sp/M; the only measure of GMA in that table) has validity for performance ratings in the .22 - .27 range. Although not large (see below), this is much larger than the erroneous figures that Neal presented (which were mostly under .10).
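The gap between observed and true validity comes from corrections of the kind routinely applied in validity generalization research. As a rough sketch, the two textbook corrections (criterion unreliability, and direct range restriction via Thorndike's Case II formula) can be written as below. The input values are hypothetical and are not the figures from Table 11; whether the 1986 article applied exactly these steps in this order is an assumption here.

```python
import math

def correct_for_criterion_unreliability(r_observed, criterion_reliability):
    """Disattenuate a validity coefficient for error in the criterion measure."""
    return r_observed / math.sqrt(criterion_reliability)

def correct_for_range_restriction(r_restricted, u):
    """Thorndike Case II correction for direct range restriction.

    u is the ratio of the applicant (unrestricted) predictor SD
    to the incumbent (restricted) predictor SD.
    """
    return (u * r_restricted) / math.sqrt(1 + (u ** 2 - 1) * r_restricted ** 2)

# Hypothetical values, chosen only to show the direction of the bias.
r_observed = 0.10             # observed validity against supervisory ratings
criterion_reliability = 0.52  # assumed reliability of the ratings
u = 1.5                       # assumed applicant SD / incumbent SD

step1 = correct_for_criterion_unreliability(r_observed, criterion_reliability)
corrected = correct_for_range_restriction(step1, u)

print(f"observed: {r_observed:.2f}, corrected: {corrected:.2f}")
```

With these assumed inputs the corrected estimate is roughly double the observed one, which illustrates why citing observed validities alone understates the relationship.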
In the attached Q & A prepared for Law and Order magazine, I provide further evidence supporting the relevance and validity of GMA for police performance. However, it is apparent just from what I have presented here that police performance does depend on GMA, contrary to the impression given by Neal's presentation at SIOP.
Why is this important? Remember that Neal attempted to justify the virtual elimination of GMA from the Nassau exam by arguing that research indicates that GMA is barely--if at all--relevant to police work. This is clearly not the case.
However, the validity estimates for the criteria of ratings of job performance are lower than is expected for a job at this (medium) level of complexity. In his presentation, Neal stated that we had hypothesized that this could indicate that personality traits were more important in determining police job performance than is the case in most other jobs. Indeed we did. However, we also advanced another hypothesis, which we considered more credible--and Neal did not mention that hypothesis at all. We hypothesized that supervisory ratings of police job performance lack construct validity, due to lack of opportunity of supervisors to actually observe performance. (The officers are out in their patrol cars and the sergeants are in the police stations behind desks.) If so, police validity estimates based on ratings are downwardly biased. We believe this to be the case.
In my opinion, both discussants erred. Paul Sackett essentially maintained that the role of the I/O psychologist is to give the client what it wants--no questions asked. So if what a client really wants is a police selection test with minimal group differences, then this is just what you give the client. You--the I/O psychologist--have no social responsibility to ask whether police performance decrements stemming from a selection process gutted of GMA will endanger public safety. You have no social responsibility to ask, "Down the road, could people be injured or lose their lives because of resulting poor police performance?" I think this is wrong. I believe, along with Frank Landy, who pointed this out from the floor during the session, that we do have social responsibilities, and that we cannot be merely "servants of power".
Kevin Murphy argued that the issue is whether I/O psychologists are under an obligation to maximize selection validity. If the answer is yes, then there can be no justification for virtually eliminating GMA from the Nassau test. If the answer is no, then there is nothing wrong with what was done in constructing that test. Kevin's argument is a red herring. The issue is not whether I/O psychologists have an obligation to maximize validity. The issue is the social consequences of stripping GMA virtually entirely out of the Nassau test.
What are those social consequences? As I noted at the symposium, we have a real-world social experiment that informs us about those consequences: The District of Columbia police force. During the 1980s, the District of Columbia took control of its police exam from the U.S. Office of Personnel Management and eliminated its GMA component--as well as essentially eliminating its background investigation. The subsequent collapse of what had been one of the best police forces in the nation has been documented by Tucker Carlson (Carlson, 1993a; 1993b; these articles appeared in The Wall Street Journal, November 3, 1993; and Policy Review, Winter, 1993, 26 - 33).
The first consequence is that the Police Academy had to be dumbed down drastically because flunk-out rates soared after the new exam was introduced. As an example of the performance decrements on the job, many murder charges have had to be dropped because, due to low literacy levels, the police reports filed were unintelligible--something that had not happened before.
These and many other problems are traceable to the elimination of the GMA component of the exam. Other problems--essentially the surge in crimes committed by the police--are traceable mostly to the elimination of the background investigation. (It is not true, as Jim Outtz stated at the symposium, that all the subsequent problems were due to the elimination of the background investigation.)
I urge everyone to read these two chilling articles. The bottom line conclusion is this: Because of the gutting of a police selection test, some people were injured and some people lost their lives. A lot of other bad things happened, as detailed by Carlson, but these are the worst things. And they did happen. Gutting the D.C. police selection test was a serious act of social irresponsibility. There is no reason to expect different results in Nassau County.
Kevin Murphy argued that use of GMA tests "will guarantee a segregated, all white police force". However, the District of Columbia police force was mostly Black before the police selection test was gutted. That is, it was mostly Black back when its performance was outstanding.
Kevin Murphy and some others on the panel warned against "demonizing the Justice Department" (DOJ). They said DOJ was "just doing its job". That is what DOJ would like you to believe, but it is not true. DOJ would like you to believe that doing its job--enforcing the law--requires it to pressure police agencies throughout the U.S. to gut the validity of their selection exams. In actuality, there is no such requirement in any law; we have a Justice Department that has moved outside the law.
This is essentially what Federal judge Mariana Pfaelzer said in her opinion last September upholding the police hiring test used by the city of Torrance, California, against DOJ charges that it was discriminatory [U.S.A. v. City of Torrance, Central District of California, Case no. CV 93-4142 MRP (RMCx)].
This case simultaneously illustrates DOJ's bully boy modus operandi and threatens that modus operandi. Over the last five years, DOJ has followed the same procedure in city after city. DOJ goes into a city and "reviews" their selection methods for police officers and firefighters. They then say something like this to the city: "Our review finds your tests to be discriminatory and illegal, because pass rates (or hiring rates) are lower for Blacks or Hispanics. We can take you to court, but tell you what--we won't, if you'll just sign this consent decree and meet these hiring numbers each year. It's a lot cheaper than a court case." Few cities have had the financial resources to resist this threat.
Four other cities in California alone capitulated to this DOJ shakedown racket. But the small city of Torrance was different: it knew its hiring methods were fair and nondiscriminatory and it refused to cave in. So DOJ brought all its considerable Federal power and resources to bear against the city--and lost in court. This was not a close loss, either. DOJ lost ignominiously, with the judge castigating DOJ quite severely for its tactics and methods. DOJ has now appealed this case; if it loses on appeal, which many think likely, this case will become law in an entire circuit, posing a real threat to the bully boy blackmail tactics of DOJ.
The idea expressed at this symposium that DOJ is "just doing its job" and "just enforcing the law" is false. DOJ has gone outside the law and has abused its power. These things are indeed bad but they are not the worst thing. The worst thing is that, in jurisdictions across the country, DOJ has endangered public safety--and continues to do so.
Questions and Answers on the NCPD Exam
for Law and Order Magazine
Frank Schmidt
Jan. 24, 1997
1. QUESTION. What do you think is the fundamental cause of the controversy over the NCPD exam?
ANSWER. The basic cause is the conflict between the goal of equal hiring rates for minority applicants and other applicants, on the one hand, and the need to hire people who will best learn the job requirements and will perform best on the job, on the other. We would all like to see minorities be hired and advance at the same rate as others. But we know from years of research that if you hire on the job related qualifications that indicate high job performance, the percent of minority applicants hired will generally be lower than for others. On the other hand, if you lower hiring requirements to reduce this "adverse impact" on minority applicants, you will have lower job performance. So this is a dilemma.
This dilemma is not due to any bias in the tests; it is due to the fact that minority individuals have not acquired the required skills and abilities to the same degree as others. All employers face this dilemma, and so do personnel psychologists who develop tests and other hiring procedures.
The NCPD exam is different from most other cases in two ways. First, the Civil Rights Division of DOJ was involved and exerted strong pressure to reduce minority-majority score differences, even at the expense of reduced performance on the job and in training. Second, poor procedures for hiring police officers can endanger public safety in ways not true for many other jobs. So it is particularly important in police hiring to hire those people with the highest indicated future levels of job performance. In the District of Columbia, where this was not done, and hiring methods have been very bad, the result has been disastrous, as has been well documented.
Some of the most respected and accomplished people in industrial psychology were involved in developing the NCPD exam. I have a great deal of respect for these individuals. But they were under a lot of pressure from DOJ. The history of past litigation over police tests in Nassau County intensified this pressure from DOJ. I understand the position they were in. I have had that same kind of pressure exerted on me. But I think their final report went too far in reducing the mental ability component in the exam.
2. QUESTION. Let's follow up on that. In a letter to the WSJ, you stated that the NCPD exam "comes close to totally disregarding the critical mental skills needed in police work." But the NCPD exam does contain a mental ability test: a test of reading skills. What is wrong with finding out what the minimum ability needed to do a job is and then setting your requirements at this level of ability?
ANSWER. What do you mean by "minimum ability to do the job"? Job performance is not just either OK or not OK. It is not dichotomous. It varies from horrible (and dangerous to public safety) all the way up to outstanding. The higher the ability levels of the people hired, the higher their performance will be. We know this from 85 years of research. If you set ability requirements at the minimum, then you are going to get minimum job performance, too. This is not what we should be doing, especially on a job where public safety is at stake.
Furthermore, there is not even a real minimum ability requirement on the NCPD exam. If you score at or above the bottom 1% of the current officers on the reading skills test, you get points on the exam. (And you get the same number of points whether you barely score above the bottom 1% or you get the highest possible score.) If you score below the bottom 1%, you don't get points. But you are not necessarily eliminated: You can still be hired if your scores on the personality parts of the test are high enough. So you do not even have to score as high as the bottom 1% on this ability test to be hired.
Also, a measure of reading comprehension, used alone, is not an optimal measure of ability, even if you select people top down on those scores. To get a good overall measure of mental ability, other measures of mental ability should also be included--for example, quantitative and reasoning ability measures--and all these mental ability measures should be combined into the final measure of general intelligence. However, in this case all other mental ability measures were eliminated from the test. This should not have been done.
People with higher mental ability learn the job knowledge presented in the Police Academy faster, and they learn and retain more of it. We know from research that job knowledge is a major determinant of performance on the job. The simple fact is that you cannot do the job if you don't know what you are supposed to be doing. This is a major reason why it is important to hire the most intelligent police officers possible.
3. QUESTION. Well, as you say, the cutoff for credit was set at the bottom 1% of current police officers. These officers were on the job and were performing satisfactorily, weren't they?
ANSWER. Well, as I noted above, you can still be hired even if you score below the bottom 1% of current officers.
Also, the bottom 1% of current officers are probably not performing satisfactorily. According to the report on the NCPD exam, 14% of current officers are below the satisfactory level of job performance. This unsatisfactory group likely includes those in the bottom 1% on the reading comprehension test of mental ability. The last thing you want to do is hire more officers who will likely be in this low performance category. And this would be true even if their low performance were "minimally satisfactory".
4. QUESTION. What about the requirement that in order to be hired, you have to have 32 college credits? Doesn't that requirement ensure that anyone hired will have a pretty good level of mental ability?
ANSWER. Well, there is some question about whether the 32 college credits are being required. Supposedly, DOJ will not allow use of this requirement if it has adverse impact against minority applicants--which it likely does. So Nassau County may not be using this requirement.
But the answer to your question is no: even if this requirement were used, it would not ensure the needed minimum levels of mental ability. And it certainly would not be a good way to identify high levels of mental ability.
It is true that the average mental ability level of people who have completed 32 hours of college credit is probably somewhat higher than for those who have no college credits. But given what we know about admissions standards and grading standards in most community colleges and some other colleges, this difference in averages is probably small.
And another fact is critical here: there is a lot of variability in mental ability in the group with 32 credits. Those on the low end in this group are likely to be quite low in mental ability. The 32 credit requirement will not detect these individuals and screen them out. So this requirement, even if it were used, could not be a substitute for a strong mental ability component in the exam.
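A small calculation shows why a modest difference in averages leaves many low scorers in the group. The sketch below assumes mental ability is normally distributed with the same spread in both groups, that the 32-credit group averages 0.3 standard deviations higher, and that the level of concern is one standard deviation below the general mean. All three numbers are hypothetical, chosen only to illustrate the argument.

```python
from statistics import NormalDist

# Hypothetical distributions on a standardized ability scale.
general = NormalDist(mu=0.0, sigma=1.0)       # all applicants
with_credits = NormalDist(mu=0.3, sigma=1.0)  # applicants with 32 credits (assumed shift)

cutoff = -1.0  # assumed level below which ability is a concern

share_general = general.cdf(cutoff)            # proportion below cutoff, all applicants
share_with_credits = with_credits.cdf(cutoff)  # proportion below cutoff, 32-credit group

print(f"below cutoff, all applicants:  {share_general:.1%}")
print(f"below cutoff, 32-credit group: {share_with_credits:.1%}")
```

Under these assumptions the share of low scorers shrinks but does not disappear, so the credit requirement alone would not screen them out.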
But even if Nassau County did require the 32 college credits, and if this requirement did work as a measure of mental ability, there would still be a big problem. DOJ has "suggested" to other police departments around the country that they should use the NCPD exam. Many police departments do not have a requirement for college credits. So they will not have even this inadequate safeguard against disaster.
5. QUESTION. In response to some of the criticisms of the NCPD exam, the TDAC group states that past research shows low validity for mental ability tests for the job of police officer. If that is true, then why get excited about the "virtual elimination" of mental ability from the NCPD exam?
ANSWER. We know from research that mental ability tests have a very high level of validity in police work for predicting the learning of job knowledge. That is, research shows that people with higher levels of mental ability learn and retain more job knowledge in the Police Academy. This relationship is very strong, as you would expect from the fact that the material taught in police academies is complex. (Validities are in the .60 to .70 range.)
Across a wide range of different jobs studied, research shows that people with more job knowledge perform better on the job. Job knowledge is probably the most important direct determinant of job performance. You cannot perform the job if you don't know what you are supposed to be doing.
The TDAC group was not talking about the ability of mental ability to predict the learning of job knowledge. They ignored this important fact and discussed only studies in which police job performance was measured by ratings (usually ratings by supervisors). And even there, they gave estimates of validity much lower than the actual values shown by research.
We know that police performance on the job requires considerable mental ability. Every job analysis of police work, including the NCPD exam job analysis, has shown that police work requires considerable judgment, decision making, and other complex information processing. When police job performance is measured by ratings, mental ability measures do predict these ratings, although not as well as for non-police jobs that require similar levels of judgment, decision making, and information processing. (For police work, validities are approximately .25 vs. about .51 for other jobs comparable in complexity.)
So mental ability does predict performance on the job for police officers. However, these results for police work are anomalously low compared to results for other comparable jobs. So they probably underestimate the predictive power of mental ability for police officer performance on the job.
The reason these estimates are low is probably that ratings of police job performance are not as accurate as for most other jobs. This would be expected because police supervisors have little opportunity to observe the actual everyday job performance of their officers. The sergeants are in the station, and the officers are out in patrol cars. Since the supervisors don't observe actual performance, they can't rate it accurately.
What we know about occupations in general from 85 years of research and what we know from job analysis of police work is not consistent with a conclusion that mental ability could be less important in police job performance than for other jobs with similar mental demands. We know that mental ability strongly facilitates the learning of job knowledge in police work, as in other occupations. It is unlikely that job knowledge has less impact on job performance in police work than in other jobs of comparable mental complexity.
And if job knowledge does strongly affect police job performance, as is the case in other occupations, then mental ability must have a strong effect on job performance--because mental ability is the major determinant of amount of job knowledge learned. So the conclusion that mental ability has less validity for police performance on the job than for other jobs of comparable complexity is not likely to be correct.
However, I do not want to overemphasize this apparent underestimation of validity. The research estimates of approximately .25 for the validity of mental ability for predicting police job performance that I mentioned above are still substantial, even if they are underestimates. Taken at face value, even these estimates indicate that mental ability is important in police performance on the job.
The District of Columbia police are a real-world test of the importance of mental ability for police job performance. When mental ability requirements for hiring were eliminated in Washington, D.C., performance in the Police Academy literally crashed. But performance on the job also crashed. This indicates that mental ability has an important effect on police performance not just in the Police Academy, but also later on the job.
6. QUESTION. A considerable part of the NCPD exam consists of personality and related tests ("noncognitive" measures). What is wrong with using personality measures in police hiring?
ANSWER. Nothing. Valid personality tests increase the validity of an overall selection exam. They also have the benefit of reducing adverse impact somewhat, because minority and other applicants usually score about the same. Considerable research shows that personality tests can predict both training performance and job performance.
However, personality tests are not a substitute for mental ability tests. The people with the best personality test scores are not necessarily those with high intelligence. The problem with the NCPD exam is that it virtually eliminates mental ability requirements, not that it uses personality tests.
7. QUESTION. What do you think the national ramifications are for the NCPD exam? What if it is widely used across the U.S. for police hiring?
ANSWER. This test is going to be a disaster wherever it is used. We know from past experience and from research that the rate of failure in the Police Academy will shoot up wherever this exam is used to hire police officers. Unless the content of the Academy is "dumbed down", many candidates will not be able to graduate. If Academy demands are dumbed down, then police performance on the street will deteriorate markedly (as it did in Washington, D.C.), and public safety will be endangered (as it has been in Washington, D.C.).
In the end, the consequences will be so intolerable that public outrage will cause the exam to be discontinued. But that will take quite some time. And in the meantime much harm will be done.
8. QUESTION. As I understand it, validity tells you how well a test works. In the NCPD exam study, there were statistical adjustments made to the study findings on the validity of the NCPD exam. What is your opinion of those adjustments?
ANSWER. Such "adjustments" are corrections intended to eliminate the biases that occur in validity studies. It is appropriate, indeed required, to make corrections of this sort. If you don't, you have biased estimates of exam validity.
So the problem is not that such corrections were made. The problem is that some of these needed corrections are pretty complex, and it is easy to make an error in applying them. In this case technical errors were made in applying at least one of the corrections, resulting in an overestimate of exam validity.
The TDAC group has now acknowledged that errors were made and has computed new validity estimates. However, there are still technical errors in their new calculations, and their new estimates of how well the NCPD exam works are still biased upwards.
But the NCPD exam is not completely invalid; it does have some validity. The validity is just fairly low. By my calculations, and using the criterion of supervisory ratings relied on in the NCPD study, the best estimate of the operational exam validity is .14 (on a scale from zero to 1.00).
Normally, something of this sort might just be considered a technical point. But in this case, it is more important, because the validity of previous Nassau police exams was compared with the validity of the new NCPD exam. The conclusion was that this new exam was more valid than the police exams used in the past. This conclusion may not be valid, because of the upward bias in the validity estimate of the new NCPD exam.
But what about the fact I mentioned earlier: that police job performance ratings are not as accurate as ratings in most other jobs? That fact probably causes a downward bias in the validity estimate for the new exam and for all the older exams, too. That is, validity estimates for all the exams are biased downward by about the same amount. You can therefore still compare them to see which one does the best job--or you could if it were not for the upward bias in the validity estimate for the new NCPD test. You have to eliminate this upward bias if you want an accurate conclusion about which of these exams is most valid for police hiring. So far this upward bias has not been eliminated.
