
Assessment Centers:
Is It Better to Score by Dimension or Exercise?

by Dr. Warren Bobrow


A few months ago (January 25th, to be exact), I posted the following message on the ECN:

I was having a discussion with a client, and we were discussing the merits of scoring assessment centers by dimension (in the classic AT&T sense) vs. scoring them by exercise (focusing on the context rather than dimensions). I’d like to hear some thoughts and see any data that supports your view. We have some AC criterion validation data coming in, and I will share it after it has been analyzed.

I received some responses to the note (all of them very thoughtful), most along the lines of, "Boy, what a great question, let me know when you have the answer." Well, I am not sure whether I have the answer, but I do have some data that I think you will find interesting.

Study Design:

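The data comes from two validation studies. In validation study #1, an assessment center (AC) for customer service supervisors was developed using a content validation strategy. The AC consisted of an in-basket, a coaching session, and a task force meeting (labeled "Team" in the tables below). The following 10 dimensions were measured: Analytical Ability, Business Focus, Decision Making, Interpersonal, Leadership, Oral Communication, Responsibility, Stress Tolerance, Team Focus, and Written Communication.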

One hundred eighty-six (186) people went through the AC for selection. Some participants were internal bidders; others were external applicants. After the ACs were run, existing performance appraisal data was gathered for 86 of the participants.

In validation study #2, the same AC was used to develop current supervisors in a cable TV company. The exercises were slightly modified for this client's needs, and the scoring guides were (hopefully) improved. There were 65 participants in this AC. Existing performance appraisal data was gathered for 47 of them.

In both ACs, two raters were assigned to each exercise. Raters were asked to compare their ratings on each dimension within each exercise, and differences were resolved by consensus. There was no consensus process for dimensions across exercises. I believe the two ACs have enough in common (exercises, scoring guides, and consensus process) that their data can be combined.

What Are the Factors of AC Performance?

This is the age-old question for ACs. Do they measure abilities across different exercises, as argued by Neidig, Martin & Yates (1979)? Or do ACs really measure contextual performance by exercise (Sackett & Dreher, 1982)? More to the point, which assumption leads to more accurate prediction of job performance?

The underlying constructs of this data set were identified using two methods. In the first method, all 27 exercise-by-dimension ratings were factor analyzed (principal axis extraction, oblique rotation, for those of you statistically inclined). This led to the following factor structure:

Exercise Factor 1: In-Basket-Decision Making, In-Basket-Business Focus, In-Basket-Analytical Ability, In-Basket-Responsibility, In-Basket-Leadership, In-Basket-Team Focus
Exercise Factor 2: Coaching-Leadership, Coaching-Decision Making, Coaching-Analytical Ability, Coaching-Oral Comm., Coaching-Responsibility, Coaching-Stress Tolerance, Coaching-Interpersonal
Exercise Factor 3: Team-Oral Comm., Team-Team Focus, Team-Leadership, Team-Interpersonal, Team-Decision Making, Team-Written Comm.
Exercise Factor 4: Team-Business Focus, Team-Analytical Ability
Exercise Factor 5: In-Basket-Interpersonal, In-Basket-Written Comm.

This would seem to confirm Sackett & Dreher's (1982) findings: nearly all of the ratings cluster by exercise rather than by dimension.
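For readers who want to try this kind of analysis on their own data, here is a minimal sketch of iterated principal axis extraction using NumPy. It is only an illustration, not the software used in this study: it stops at the unrotated loadings (the oblique rotation step, such as promax or oblimin, is omitted), and the toy correlation matrix is invented, not taken from these ACs. The function name `principal_axis` and its parameters are my own.

```python
import numpy as np


def principal_axis(corr, n_factors, max_iter=200, tol=1e-6):
    """Iterated principal axis factoring of a correlation matrix.

    Returns an unrotated (n_variables x n_factors) loading matrix.
    No rotation is applied here.
    """
    corr = np.asarray(corr, dtype=float)
    # Starting communalities: squared multiple correlations (SMCs).
    communalities = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the 1s on the diagonal.
        reduced = corr.copy()
        np.fill_diagonal(reduced, communalities)
        eigvals, eigvecs = np.linalg.eigh(reduced)
        # eigh sorts eigenvalues in ascending order; keep the largest n_factors.
        top = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_communalities = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_communalities - communalities)) < tol
        communalities = new_communalities
        if converged:
            break
    return loadings


# Invented toy example: two clusters of three ratings each (think of two exercises),
# correlating .60 within a cluster and .10 across clusters.
within = np.full((3, 3), 0.6)
across = np.full((3, 3), 0.1)
toy_corr = np.block([[within, across], [across, within]])
np.fill_diagonal(toy_corr, 1.0)

loadings = principal_axis(toy_corr, n_factors=2)
print(np.round(loadings, 3))
```

In this unrotated solution the first factor is a general factor and the second separates the two clusters by sign; an oblique rotation would be expected to turn that into two cluster-specific factors, which is the kind of exercise-by-exercise pattern shown above.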

The next step was to look at the factor structure of the dimension scores. To obtain these scores, each dimension's ratings were averaged across exercises. A factor analysis (principal axis extraction, oblique rotation) of the 10 dimension scores yielded the following results:

Dimension Factor 1: Team Focus, Interpersonal, Written Comm., Leadership, Decision Making
Dimension Factor 2: Stress Tolerance, Oral Comm., Responsibility
Dimension Factor 3: Business Focus, Analytical Ability

This result partially supports Neidig et al. (1979) in that it shows that the dimensions are not all measuring a single factor. It also supports Gaugler & Thornton's (1989) contention that assessors can distinguish only three dimensions in ACs.
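The dimension scores used in this second analysis are simple averages of each dimension over the exercises in which it was rated. Here is a minimal sketch of that step, assuming an unweighted mean; the function name `dimension_scores` is hypothetical, and the ratings and the 1-5 scale are invented for illustration (the exercise/dimension pairings loosely follow the tables above).

```python
from collections import defaultdict
from statistics import mean


def dimension_scores(ratings):
    """Average each dimension's ratings across the exercises in which it was rated.

    `ratings` maps (exercise, dimension) -> rating for one participant.
    """
    by_dimension = defaultdict(list)
    for (_exercise, dimension), rating in ratings.items():
        by_dimension[dimension].append(rating)
    return {dimension: mean(values) for dimension, values in by_dimension.items()}


# Hypothetical ratings for one participant on an assumed 1-5 scale.
participant = {
    ("In-Basket", "Decision Making"): 3,
    ("Coaching", "Decision Making"): 4,
    ("Team", "Decision Making"): 2,
    ("In-Basket", "Written Comm."): 4,
    ("Team", "Written Comm."): 3,
    ("Coaching", "Stress Tolerance"): 5,
}
print(dimension_scores(participant))
```

A dimension rated in only one exercise (Stress Tolerance here) simply keeps that single rating, so the dimension score carries less information than one averaged over three exercises.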

Which Scoring System Yields Higher Validity?

An important question practitioners face is which scoring approach best predicts job performance. In this case, do exercise factor scores predict job performance any better (or worse) than dimension factor scores? To test this, a multiple regression equation was built for each model. The equations looked like this:

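Model 1 (exercise factors):

$$\text{Performance} = \beta_0 + \beta_1(\text{Ex. Factor 1}) + \beta_2(\text{Ex. Factor 2}) + \beta_3(\text{Ex. Factor 3}) + \beta_4(\text{Ex. Factor 4}) + \beta_5(\text{Ex. Factor 5}) + \varepsilon$$

Model 2 (dimension factors):

$$\text{Performance} = \beta_0 + \beta_1(\text{Dim. Factor 1}) + \beta_2(\text{Dim. Factor 2}) + \beta_3(\text{Dim. Factor 3}) + \varepsilon$$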
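Either regression model can be fit by ordinary least squares. The sketch below, using NumPy, returns the coefficients and R² for a set of factor scores; it assumes the model includes an intercept, and the data are synthetic (133 random cases), not the study's. The function name `fit_r_squared` is hypothetical. It also illustrates a point worth keeping in mind when comparing models with different numbers of predictors: adding predictors, even pure noise, cannot lower the in-sample R².

```python
import numpy as np


def fit_r_squared(factors, performance):
    """Ordinary least squares of performance on factor scores, with an intercept.

    Returns (coefficients, R^2); coefficients[0] is the intercept.
    """
    factors = np.asarray(factors, dtype=float)
    performance = np.asarray(performance, dtype=float)
    X = np.column_stack([np.ones(len(performance)), factors])
    coefs, *_ = np.linalg.lstsq(X, performance, rcond=None)
    residuals = performance - X @ coefs
    ss_residual = residuals @ residuals
    deviations = performance - performance.mean()
    ss_total = deviations @ deviations
    return coefs, 1.0 - ss_residual / ss_total


# Synthetic data only: 133 cases, three "dimension factor" scores, and a
# performance criterion weakly related to the first factor.
rng = np.random.default_rng(0)
n = 133
dim_factors = rng.normal(size=(n, 3))
performance = 0.3 * dim_factors[:, 0] + rng.normal(size=n)

_, r2_three = fit_r_squared(dim_factors, performance)
# Add two pure-noise predictors: in-sample R^2 cannot go down.
noise = rng.normal(size=(n, 2))
_, r2_five = fit_r_squared(np.column_stack([dim_factors, noise]), performance)
print(f"R^2 with 3 predictors: {r2_three:.3f}; with 5: {r2_five:.3f}")
```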

There is a funny thing about regression equations: in the sample used to build them, R² never goes down (and usually goes up) as more predictors (factors) are added. This is because the regression weights are fit to the sample data's peculiarities. This is great for boosting R², but it tends to overestimate the “true” relationship between your predictors and the criterion. Therefore, a comparison of R² values would be biased toward the Exercise Factor model, because it has more predictors. However, there is a statistic called Adjusted R², which removes some of this large-model bias by taking into account the number of predictors relative to the sample size. The R² and Adjusted R² values for both models are shown in the following table (n = 133):

Model                R²     Adjusted R²    F (overall model)
Exercise factors     .14    .10            3.32 (p < .01)
Dimension factors    .08    .06            3.11 (p < .05)
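As a check on the table, the adjusted values can be recomputed from the reported R² values. This sketch assumes the standard formula, Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where k is the number of predictors; I am assuming this is the form the analysis software used. The function name is hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), for n cases and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Reported (rounded) R^2 values with n = 133; k = number of factors in each model.
exercise_adj = adjusted_r_squared(0.14, n=133, k=5)
dimension_adj = adjusted_r_squared(0.08, n=133, k=3)
print(f"Exercise model:  {exercise_adj:.3f}")
print(f"Dimension model: {dimension_adj:.3f}")
```

Because the published R² values are rounded to two decimals, the exercise model comes out at about .106 rather than .10 (an unrounded R² of roughly .135 to .139 would give .10), and the dimension model at about .059. The five-predictor model is penalized more (about .034) than the three-predictor model (about .021).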

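Although the exercise model accounts for somewhat more variance in job performance (Adjusted R² of .10 vs. .06), a t-test comparing the two R² values did not reveal a significant difference. In other words, scoring this AC by exercise may predict job performance slightly better than scoring by dimension, but the difference could be due to chance.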

Discussion:

If neither model is significantly better related to performance than the other, which one should you use? This question may best be answered in the context of your organization. If your managers are already sold on one of the models, you probably should not change your approach. Also consider the kind of feedback, if any, you give participants. If participants receive feedback reports by exercise, computing scores by exercise may save you from some tricky questions.

Intuitively, people believe that behavior is contextual. However, there is something appealing about the notion that someone who is a good leader will show that behavior regardless of the situation. These data suggest that both approaches are valid and that scoring schemes based on either one are related to job performance. You should use the approach that best fits your organization.

References:

Gaugler, B. B., & Thornton, G. C. (1989). Number of assessment center dimensions as a determinant of assessor accuracy. Journal of Applied Psychology, 74, 611-618.

Neidig, R. D., Martin, J. C., & Yates, R. E. (1979). The contribution of exercise skill ratings to final assessment center evaluations. Journal of Assessment Center Technology, 2, 21-23.

Sackett, P. R., & Dreher, G. F. (1982). Constructs and assessment center dimensions: Some troubling empirical findings. Journal of Applied Psychology, 67, 401-410.

Warren Bobrow, Ph.D., is a Senior Consultant with the Context Group. He may be reached at (617) 630-1020, contextwb@aol.com (e-mail), or http://www.contextgroup.com/ (home page).


© Copyright 1996 by the IPMA Assessment Council. All rights reserved.