Uncovering Test Secrets, Part 2

Validation can get squirrelly fast. Without first conducting a legitimate job analysis and choosing a legitimate hiring test, there is no need to go any further. Everything is worthless without the first two steps. Once that is behind you, establish a strong link between a specific test score and on-the-job performance.

Litigation vs. ROI

The threat of litigation has down-the-road implications for developing sound hiring and promotion processes. Attorneys seldom work for free, and you do not have to lose in court to lose money.

I have never personally met an attorney who was experienced in job analysis, conducting validation studies, or documenting assessment processes. I’m sure there are a few, but it’s generally not their specialty. Attorneys are trained to know the law and to argue persuasively, not to design, develop, and validate selection systems. The corporate attorneys I have worked with fully appreciate the benefit of a well-documented job analysis, validation study, and assessment process. To quote one of them, “I would much rather defend <this process> than the one we use now.”

Remember, in the litigation world, making a persuasive argument is more important than making a good employment decision. However, in the organizational world, if you don’t make a good employment decision, you get to pay twice: once for litigation and forever for low performance.

No Better Than Chance

We said earlier that people tend to think interviews and tests are two different things. Fortunately, for them at least, interviews fly under the validation radar because most folks think of them as conversations. Unfortunately, once an unstructured interview has screened out the blatantly unqualified, it has a long history of chance-level hiring decisions. Think of it this way: after the candidate has passed an unstructured interview, you might just as well ask him or her to pick a marble from a jar filled with 50 red marbles and 50 blue. Blues are high performers. Reds are low. Unstructured interviews are a significant blind spot. We all know that.

Validation Designs

You don’t need to know the details, but you do need to know there are different kinds of validation. They include predictive designs, where everyone takes a test, the scores are ignored, and job performance is later compared with test scores; concurrent designs, where job-holder scores are compared with job performance; individual comparisons; group comparisons; group averages and score distributions; measuring job content; evaluating mental constructs; assessing on-the-job (OTJ) performance criteria; examining the face of the test; and so forth.

I prefer predictive designs using OTJ performance, but organizations seldom have the patience to wait; designs using current employees suffer from technical problems (see range, below); and defining performance is always an issue no matter what design is used.
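As a rough illustration of what "comparing job performance with test scores" means in a predictive design, the sketch below computes a Pearson correlation, which is one common way to express that link. The scores, the ratings, and the function name pearson_r are all hypothetical, and ten people is far too few for a real study; the point is only to show the mechanics.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: test scores recorded at hire but NOT used in the decision
test_scores = [52, 61, 47, 70, 65, 58, 74, 49, 68, 55]
# Hypothetical on-the-job performance ratings collected later for the same people
performance = [3.1, 3.4, 2.8, 4.2, 3.3, 3.6, 4.0, 3.0, 3.9, 2.9]

validity = pearson_r(test_scores, performance)
print(f"Correlation between test score and later performance: {validity:.2f}")
```

A value near zero would put the test in marble-jar territory; a clearly positive value is only the start of the argument, since the sample size, range, and performance-measure caveats in this article all still apply.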

Just remember that there is no single type of validation. It varies with the application, the number of people involved, the potential exposure for litigation, the importance of the job, and so forth. One last point: Taking a broad-scope generic test, giving it to all job holders, and developing a high-group norm and a low-group norm is not validation. Why? Validation requires a causal relationship. Hemlines and the stock market index may move together, but if one does not cause the other, the numbers alone do not make the relationship valid.

Performance Considerations

Next, there is the problem of “what to measure”; that is, the data used to validate, or verify, that the test actually works. This includes hard data like turnover, individual production, new account generation, business expansion, call time, customer satisfaction, and so forth. Hard data is always nice to have, but we have to remember that it often conflicts and is part subjective, part objective. Data taken from performance reviews are usually frustrating (i.e., useless). Everyone tends to look the same on paper.

We may have to make compromises and adjustments along the way, but, if test scores cannot be compared with performance, there is no way to validate the test. You might as well stop testing, buy a jar of red and blue marbles, and save your money. Your employees will be embarrassingly average, but I’m sure there is a sharp litigator somewhere who might be able to make an effective argument for using marbles.

Range Restrictions

Validation is always confounded by the problem of restricted range. Restricted range means differences between high and low performance among job incumbents will always be much less than among job applicants (e.g., think of skill differences between pro golfers and skill differences between people in the gallery). Ideally, we want to compare a broad range of test scores with a broad range of performance ratings.

Restricted range plays havoc with analysis because, instead of having the luxury of big differences between best and worst, the analyst must examine teensy-weensy test score differences and teensy-weensy on-the-job-performance differences.
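The effect is easy to see in a small simulation. The sketch below is illustrative only: it assumes a simulated applicant pool where test score and performance are normally distributed with a true correlation of 0.5, and it assumes the "incumbents" are simply the top 20% of test scorers. None of those numbers come from a real study, and the function names are made up for this example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_range_restriction(n_applicants=5000, true_r=0.5,
                               top_fraction=0.2, seed=1):
    """Return (correlation among all applicants, correlation among incumbents)."""
    rng = random.Random(seed)
    scores, performance = [], []
    for _ in range(n_applicants):
        score = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Performance shares a correlation of true_r with the test score
        perf = true_r * score + math.sqrt(1 - true_r ** 2) * noise
        scores.append(score)
        performance.append(perf)

    r_applicants = pearson_r(scores, performance)

    # Assumption: incumbents are the applicants with the highest test scores
    n_hired = int(n_applicants * top_fraction)
    cutoff = sorted(scores, reverse=True)[n_hired - 1]
    hired = [(s, p) for s, p in zip(scores, performance) if s >= cutoff]
    r_incumbents = pearson_r([s for s, _ in hired], [p for _, p in hired])
    return r_applicants, r_incumbents

r_applicants, r_incumbents = simulate_range_restriction()
print(f"All applicants:  r = {r_applicants:.2f}")
print(f"Incumbents only: r = {r_incumbents:.2f}")
```

The test is exactly as good in both groups; the weaker incumbent correlation comes entirely from looking at a narrow slice of scores.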

Group-Size Prerequisites

I won’t go into statistics except to say that you cannot trust your analysis if the differences are small and there are too few people to evaluate (I prefer 25 people per factor); if you try to measure too many different things with the same test; if the test domain does not actually lead to or affect performance; if performance and test scores are not both normally distributed; or if the groups are unequal in size. For example, comparing scores of five Pandas, 25 Penguins, 12 Puppies, 18 Kittens, and three Bunnies might give you impressive-looking numbers, but they will be junk.
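One way to see the sample-size part of this is to ask how wide the uncertainty is around a correlation measured on each of those groups. The sketch below uses the standard Fisher z approximation for a 95% confidence interval, which itself assumes roughly normal data. The observed correlation of 0.30 is a hypothetical value chosen for illustration, and the function name is made up for this example.

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation r seen in n people.

    Uses the Fisher z transformation; returns None when n <= 3,
    where the approximation is undefined.
    """
    if n <= 3:
        return None
    z = math.atanh(r)
    standard_error = 1.0 / math.sqrt(n - 3)
    return (math.tanh(z - z_crit * standard_error),
            math.tanh(z + z_crit * standard_error))

# Group sizes from the example in the text; the correlation is hypothetical
group_sizes = {"Pandas": 5, "Penguins": 25, "Puppies": 12,
               "Kittens": 18, "Bunnies": 3}
observed_r = 0.30

for name, n in group_sizes.items():
    interval = r_confidence_interval(observed_r, n)
    if interval is None:
        print(f"{name} (n={n}): too few people to estimate anything")
    else:
        low, high = interval
        print(f"{name} (n={n}): plausible range {low:.2f} to {high:.2f}")
```

Under these assumptions, every group with a computable interval has a range that includes zero, which is the statistical way of saying the numbers could be junk.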

Assessment Is Not a Four-Letter Word

I listened to an excellent webinar last week. It addressed accurate hiring and placement techniques. The presenters cited substantial payoff in ROI measured by turnover, productivity, training, performance, sales, and so forth. I’m sure many C-level executives would give up their Mercedes for a week in exchange for the financial benefits presented, but there were fewer than 100 people in attendance! And half of those indicated they were already using assessments. What’s up with that?

I think the HR community considers assessments in the same class as toxic waste: dangerous, threatening, difficult to handle, and expensive. Well, let’s put that to rest. Assessment is just another word for “measurement,” and measurement takes place every time a candidate is screened for a job. Don’t forget that an unstructured interview is an assessment (although a rather poor one).

The webinar folks argued the same point I have been trying to make for years: accurate (i.e., validated) tests and assessments lead to better hires, and that leads to reducing expenses and increasing revenue. Is it expensive? Compared to what? How would you feel about an ROI of at least 100% within the first one or two hires? And, non-stop payback after that?

Organizational Handicapping

Here are some of the symptoms of using low-accuracy assessments, failing to validate the assessments you are now using, or selecting employees based on demographics instead of individual skills:

Not enough people with the right KSAs for promotion (i.e., increasingly complex jobs require increasingly complex KSAs)
Increased potential for litigation and adverse impact (i.e., decisions based on personal opinions are less predictive than legally credible validated data)
Frequent training requests to fix “broken” (i.e., unskilled) employees
Weak bench strength that limits organizational flexibility and reaction time

The Final Product

I want to leave you with these thoughts. If test or interview questions are not validated, then it is not possible to know whether they work or not. There is no such thing as an EEOC or OFCCP pre-approved test. There is no such thing as a generic test validated for your industry (unless, in the unlikely event that it applies, you can show the two jobs require essentially the same KSAs). And managers’ home-brewed tests are something to avoid like the plague. Finally, investing in validated tests, interviews, and assessments can yield the single best ROI of any organizational dollar you could imagine.

Can you imagine what it would be like if management considered your department a revenue generator instead of expensive overhead?