Good Test? Bad Test?

Get used to it: unless your organization hires everyone who applies, you are testing. Some people (even attorneys who should know better) vigorously deny that their organizations test applicants (pssst: interviews are tests!).

Whether an organization asks its questions verbally or in writing, the objective is the same: to separate qualified applicants from unqualified ones before spending big bucks on salary, benefits, and potential lawsuits. Tests are tests.

Now, let’s discover whether your test is working for you.

A Good Test

Separating a good test from a bad one starts with reliability. Suppose an applicant takes a test on Monday and, on his way out, you deliver a carefully aimed blow to the head sufficient to cause short-term memory loss (but not permanent damage).

After he gets out of the hospital, you invite the applicant back to take the same test a second time (with the promise of safe passage). Will he score roughly the same? That is, can you trust the scores to remain consistent from one time to the next?

This is called “test-retest reliability.”

Reliability means you can trust a test to deliver similar scores regardless of when it was taken. Otherwise, you would never know whether it was accurate.
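To put a number on "roughly the same," one common approach is to correlate the two sets of scores. The sketch below is only an illustration: the applicant scores are invented, and pearson_r is a small helper written for this example, not something taken from any vendor's manual.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six applicants on two occasions.
monday = [52, 61, 70, 75, 83, 90]
retest_stable = [54, 60, 72, 74, 85, 88]  # everyone lands about where they did before
retest_shaky = [80, 55, 62, 90, 58, 71]   # scores jump around between sittings

stable_r = pearson_r(monday, retest_stable)
shaky_r = pearson_r(monday, retest_shaky)
print(f"stable test: r = {stable_r:.2f}")
print(f"shaky test:  r = {shaky_r:.2f}")
```

A coefficient near 1.0 means applicants keep roughly the same standing from one sitting to the next. A coefficient near zero means Monday's score tells you almost nothing about the retest.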

Interviews, for example, are notoriously unreliable. Interviewers tend to like or dislike applicants; they may ask different questions of different candidates; they may think the objective of the interview is to get to know the applicant (wrong answer!); they tend to rate applicants based on personal appearance; and sometimes interviewers just talk about themselves. Interview test-retest reliability is pretty low.

Unreliability is not limited to interviews. It also applies to many popular tests used in training, especially ones that measure personality type. Type-tests are fine for workshops and communication classes, but even some of the most popular ones are filled with reliability problems. Independent reliability studies show scores from a popular four-letter type-test tend to change from one time to the next. So, test authors, which score is the “real” score? The score on Monday? Tuesday? Last month?

Before you subject any applicant to a test, examine the vendor’s manual carefully and search for a section on “reliability.” You want proof the vendor knew enough to study the reliability of:

Each test item (item analysis).
All test items (inter-item reliability).
The first test half compared to the last half (split-half reliability).
The same people at two different times (test-retest reliability).
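For readers who want to see what two of these checks look like in practice, here is a minimal sketch. Everything in it is an assumption made for illustration: the response data are invented, the split-half figure uses the standard Spearman-Brown correction, and Cronbach's alpha stands in as one widely used index of inter-item consistency. A vendor's manual may report different statistics.

```python
from math import sqrt
from statistics import pvariance

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical answers: one row per applicant, one column per item (1-5 scale).
responses = [
    [5, 4, 5, 4, 5, 4],
    [4, 4, 3, 4, 4, 3],
    [3, 3, 3, 2, 3, 3],
    [2, 3, 2, 2, 1, 2],
    [1, 2, 1, 1, 2, 1],
    [4, 5, 4, 5, 4, 5],
]

def split_half_reliability(rows):
    """Correlate first-half and last-half totals, then apply Spearman-Brown."""
    half = len(rows[0]) // 2
    first = [sum(row[:half]) for row in rows]
    last = [sum(row[half:]) for row in rows]
    r = pearson_r(first, last)
    return 2 * r / (1 + r)  # estimates reliability of the full-length test

def cronbach_alpha(rows):
    """Internal consistency across all items."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

print(f"split-half (Spearman-Brown): {split_half_reliability(responses):.2f}")
print(f"Cronbach's alpha:            {cronbach_alpha(responses):.2f}")
```

Both figures run from roughly 0 to 1, and higher means the items hang together. The invented data here were built to be consistent, so both come out high; real test data are rarely this tidy.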

If you cannot find any reliability data, then your favorite test scores probably change from day to day. The next time you buy a pound of cheese, wouldn’t it be nice to know you were really getting the weight you paid for?

Hopefully you see that unreliable tests are a dead end, especially since most organizations want their tests to predict performance.

Using Test Scores for Prediction

Predicting job performance means that a reliable test score is directly related to job performance. The word “directly” means two things. First, the test measures something that affects job performance. Second, the test scores correlate with job performance ratings. A typing test, for example, is clearly linked to jobs that require keyboard skills. If your organization still has a typing pool, the scores probably indicate the amount of work a typist can do.

But are keyboard skills always linked to job performance for management? Should we fail candidates who could learn keyboard skills in a few weeks or months? Do we know if applicants are physically unable to operate a keyboard?

Accurate prediction is called “validation,” and if you thought reliability was complicated, you ain’t seen nothin’ yet! Validation requires knowing clearly what skills are necessary for the job, and doing sufficient analysis to show test scores are statistically correlated with job performance (i.e., the test content and job requirements are causally related).
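The statistical half of that requirement can be sketched in a few lines. The numbers below are invented, the 1-5 rating scale is an assumption, and pearson_r is a helper written for this example. A real validation study needs far more people and a documented link between test content and the job.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: test score at hire, and a later performance rating (1-5)
# for the same eight employees.
test_scores = [55, 62, 68, 71, 77, 84, 90, 95]
performance = [2.1, 3.4, 2.8, 3.0, 3.9, 3.6, 4.5, 4.2]

validity = pearson_r(test_scores, performance)
print(f"validity coefficient r = {validity:.2f}")
print(f"share of performance variance explained = {validity ** 2:.0%}")
```

The correlation only covers the second meaning of “directly.” It says nothing about whether the test measures a skill the job actually requires, which is why the job analysis has to come first.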

Otherwise, you are predestined to turn away qualified people and hire unqualified ones. Is that wrong-headed or what?

Why Should I Care?

If your objective is finding and filling, then you probably don’t. Stop right here, get some coffee, and don’t send me any nasty-grams. I assure you reading this article will be a colossal waste of your time.

However, if cutting turnover in half, doubling individual productivity, reducing training expenses, and building a solid base of future-qualified employees is attractive, then you need to know this. These results are all normal for an organization that uses reliable and valid tests. Why? Their tests screen out unqualified applicants. In case you are wondering, only about one applicant in six (on average) can pass a series of validated tests. Put another way, only about one applicant in six can demonstrate skills required for the job.

Ever hear about the 80/20 rule, the one where 20% of the people produce 80% of the results? It’s amazingly close to a one-in-six hiring ratio. Think about it. So if you care about making the biggest splash ever in the company pool, then continue reading.

A Bad Test

A bad test is one that an organization uses consistently, is backed by folklore and plenty of personal anecdotes, but has never been critically evaluated. Bad tests usually come out of corporate training programs. For example, a workshop participant who answered 10 questions about being a thorough planner was “amazed” when the test reported he or she was exceptionally organized. Next step: use it for hiring!

Folks, personal agreement with test scores is not a reliable and validated way of predicting job performance. It is only a summary of how someone describes himself or herself. It is a self-reported description. Is the person actually as organized as he or she says? Or is the person faking? If not, is organization important to job performance?

Defining the Job

This is a tricky area. The secret is to define the critical skills that directly affect job performance. This might include learning ability, problem-solving skills, persuasiveness, and so forth. The key to defining job requirements is to identify behaviors leading to job success or failure. It sounds weird, but you don’t look for results, just the behaviors that lead to the results.

If you cannot clearly define the key job skills, then there is nothing to test. The 1978 Uniform Guidelines suggest job competencies be based on job requirements and business necessity. I don’t know about you, but that sounds pretty good to me. Amazing! The government recommends organizations test for job requirements and business necessity. If anyone out there can suggest something better than basing a test on job requirements and business necessity, I’d like to hear it.

To reiterate, your test first has to be reliable. Then you must know what to explicitly measure. To make sure the test works, determine whether test scores predict job performance. We call this step “validation.”

Throw It On the Wall and See What Sticks Approach

Here is a sure clue to wrong-headed hiring practices. It goes like this. A vendor has a general personality-style test (we’ll make a fanciful assumption that it passes professional reliability standards). The vendor herds high producers into one group and gives them the test. He examines the averages and declares, “Yea, verily, these scores doth become our target!” (vendors like to use old English; it sounds so classy!).

Whoa, not so fast.

How does one define high-producer? By results or by actions that lead to results? It makes a big difference. Individuals in the high-producer group could have used different skills to get there. Some might be good politicians. Some might be very smart. Some might be taking credit for others’ work.
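A toy example shows why averaging such a group can mislead. The three salespeople, their skill names, and their 1-10 ratings are all hypothetical; the point is only what happens to the numbers when people who succeed in different ways are lumped together.

```python
from statistics import mean

# Hypothetical 1-10 skill ratings for three top sellers who succeed differently.
high_producers = {
    "repeat seller": {"relationship": 9, "cold calling": 3, "service": 4},
    "cold caller": {"relationship": 3, "cold calling": 9, "service": 3},
    "service star": {"relationship": 4, "cold calling": 2, "service": 9},
}

skills = ["relationship", "cold calling", "service"]

# The "target profile": the group average on each skill.
target = {s: mean(p[s] for p in high_producers.values()) for s in skills}

for skill, value in target.items():
    print(f"{skill:>12}: target {value:.1f}")

# Each high producer's single best rating, for comparison with the target.
best = {name: max(profile.values()) for name, profile in high_producers.items()}
print(best)
```

The averaged target is middling on every skill, yet each actual high producer is outstanding on exactly one. A candidate who matches the target profile resembles none of the people it was built from.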

What about the confusion between correlation and causation? Just because ice-cream sales and shark attacks are correlated does not mean that one causes the other. Almost anything can be correlated, but not everything is causal. If you sort through enough garbage, you are likely to find correlations between cookie wrappers and hotdogs. So what? Your goal is to find a correlation between hotdogs and hotdog buns.
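The "sort through enough garbage" problem is easy to simulate. In this sketch every "trait" is random noise, so nothing causes anything. The group size, the number of traits, and the seed are arbitrary choices made for illustration.

```python
import random
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(7)  # fixed seed so the run is repeatable
n_people, n_traits = 10, 40

# Each trait is pure noise, measured on the same small group of people.
traits = [[random.gauss(0, 1) for _ in range(n_people)] for _ in range(n_traits)]

# Compare every trait with every other trait and keep the strongest match.
pairs = list(combinations(range(n_traits), 2))
strongest = max(abs(pearson_r(traits[a], traits[b])) for a, b in pairs)

print(f"{len(pairs)} pairs checked, strongest |r| = {strongest:.2f}")
```

With a small group and hundreds of comparisons, some pair of meaningless columns will almost always look strongly related. That is the cookie-wrappers-and-hotdogs problem in numbers.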

The “see what sticks” approach has a few knotty problems. Sure, it looks scientific, but what good are decisions based on wrong-headed performance criteria, wrong-headed clustering techniques, and wrong-headed statistical analysis?

Job-Match Approach

The job-match approach is scientifically similar to the “see what sticks” approach, except worse. Some types of tests say that people in certain occupations share similar styles: Introverted, Sensing, Thinking, Judging (ISTJ), for example.

Before you use this stereotype for hiring, ask yourself: Do all the people in the same occupation do the same thing? Do they all perform equally well? Did their personality style cause them to be an engineer? Are these folks extreme ISTJs or are they marginal ISTJs? Do their organizations all have the same objectives for the job?

Conclusion

Everything starts with the human elements of job requirements and business necessity. Human elements are seldom included in job descriptions or job evaluations. You have to dig for them. If you cannot test/interview for specific human elements, your tests will probably be inaccurate.

All selection tests have to pass rigid standards for reliability and validity. Reliability means the test delivers consistent results time after time after time. Validity means the test scores accurately predict job performance, and establishing it takes careful work.

It is a grave mistake to assume any group of performers has equal skills. For example, some salespeople are great repeat sellers, some are great cold callers, and others are great service people.

They all might be high performers but for entirely different reasons. It is a big mistake to assume characteristics or traits correlated with performance actually cause performance.