Web-Based Hiring Tests: Do They Deliver?

The phone rings. The caller, usually a recruiter or HR manager but sometimes a gofer, wants to build (or buy) a Web-enabled hiring test. Let’s say it will be for salespeople.

After discussing the idea for a few minutes, I make a few suggestions. These always include following the ‘Guidelines’ to make sure the test is based on job requirements and business necessity and following the ‘Standards’ to make sure the test actually predicts job performance.

In almost every case, the caller is aghast at the work that needs to be done.

“All I want is a test!” they say.

“You want one that works?” I ask.

“Yes. But that’s hard!” they say.

“What’s your point?” I say.

On the other end of the phone, my keen bat senses hear muttering about me being a “jerk,” then dialing someone who will sell them the “mother of all tests”: one the vendor promises will work, regardless.

Why All the Fuss? A Test is Just a Test, Right?

The Guidelines and the Standards are not “nice to know” (i.e., limited to eggheads, legal eagles, and companies with U.S. operations). They describe how to define and evaluate job skills. That is, they first recommend test users define critical elements of the job based on job requirements and business necessity; then, they describe three ways to make sure test scores accurately predict performance (criterion, construct, and content validation).

Let’s reiterate. Step 1: Define job requirements and business necessity. Step 2: Make sure the test is predictive and stable.

Clear definition and evaluation is good for the hiring organization and good for the applicant. This principle works in all cultures and countries. So, if you plan to use a Web test, it’s a good idea to know the test actually separates qualified applicants from unqualified ones.

If you don’t get anything else, patch this into your screensaver: the only people who think it is too much work to follow best practices are people who don’t know how to do it.

But no harm is done, right? Wrong. Hiring tests billed as highly effective, with claims that they have no adverse impact or have been “validated” by the U.S. EEOC, are as legitimate as the email announcing you won the lottery in Botswana.

Bad tests are really bad news for employer and applicant alike. A bad product backed by good-sounding marketing claims is still a bad product. And whether the user is in the U.S. or not, the test consumer, not the vendor, lives with the consequences of test use!

So, even if the vendor claimed his test was validated to grow hair on bald applicants, transform ugly employees into movie stars and cure morning breath, it would be your problem, not the vendor’s, to prove it.

Cause and Effect

There is a good reason why sailors advise passengers not to spit into the wind. The same is true for feces, fans, and bad tests. Eventually, even clueless test purchasers learn a weak test does not work as promised. You see, a test that is neither based on job requirements and business necessity nor validated for the specific job is bound to pass too many wrong applicants and fail too many right ones. It will show up on the job. That’s why the Guidelines and Standards are so valuable: they define exactly how to identify, qualify, and use a test that contains the least amount of error.

The bottom line is that no matter how many years a person has been a recruiter, no matter how smooth his or her marketing campaign, no matter how certain he or she is about being a recruiting expert, and no matter how famous his or her organization, the ‘Guidelines’ and ‘Standards’ set the bar for measuring job skills.

Let’s examine how the ‘Guidelines’ and ‘Standards’ work for a sales position.

Sales Hiring 101

First, any method of separating qualified from unqualified applicants is a test. And “assessment” is just another word for “test.” We assess resumes, application forms, and applicant skills. The vast majority of organizations, unfortunately, use a two-step assessment process. Step one: use an interview to screen out most of the riff-raff. Step two: let the job screen out the rest. The two-step process explains in large part why 20% of salespeople generally produce 80% of the sales. Only riff-raff were screened out pre-hire.

Screening out riff-raff is easy. All you have to do is get to know the applicant, examine earnings statements, and dislike his or her personality. Normally, organizations screen out 3.5 applicants to get one promising employee. On-the-job performance then screens out another one of every two. Over time, this makes the final hiring ratio about 7 to 1. Riff-raffing is the norm and riff-raffing is expensive.

Let’s look at the cost of using the job as an assessment in terms of training, travel expenses, management coaching, and salary for six months. We’ll be conservative. One week training = $2,500; sales travel expenses = $100/day for six months or $12,000; coaching time = 15% of manager’s time or about $6,000; and, six months’ salary and benefits = about $36,000. This totals about $56,500 per salesperson (ignoring recruiting fees, lost customers, empty territories, and so forth). Bottom line? In round numbers, riff-raff assessment costs upwards of $50,000 for each lost salesperson.
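The arithmetic above is easy to check. What follows is only a sketch: the dollar figures come straight from the paragraph, while the variable names and the 120 working days (implied by $12,000 at $100/day) are my own framing.

```python
# Six-month cost of using the job itself as the "assessment",
# using the conservative figures quoted above.
TRAINING = 2_500              # one week of training
TRAVEL_PER_DAY = 100          # sales travel expenses per working day
WORKING_DAYS = 120            # assumption: implied by $12,000 / $100 per day
COACHING = 6_000              # about 15% of a manager's time
SALARY_AND_BENEFITS = 36_000  # six months of salary and benefits


def cost_per_lost_salesperson() -> int:
    """Sum the four cost items; recruiting fees, lost customers, etc. are ignored."""
    travel = TRAVEL_PER_DAY * WORKING_DAYS
    return TRAINING + travel + COACHING + SALARY_AND_BENEFITS


total = cost_per_lost_salesperson()
print(f"Cost per lost salesperson: ${total:,}")  # prints $56,500
```

Swap in your own training, travel, and salary numbers and the conclusion rarely gets cheaper.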

Error-Free Hiring?

Mistake-free hiring is pure fiction, but doing a better job screening is not. First, you have to fully understand your specific sales job and the critical skills that separate the successful from the unsuccessful salesperson. This kind of information is seldom obvious. It does not come from generic tests, averaged scores, or calculated group norms. Generic norming is bad science. It serves as an example of wrong-headed test practices.

A trustworthy and reliable test involves in-depth understanding of critical job functions, measuring every critical skill area at least twice, doing a formal study to confirm scores predict job performance, and monitoring adverse impact. In professional terms, this is called job analysis, validation, multi-trait-multi-method assessment, adverse impact monitoring, and continuous improvement. If it sounds like a good way to do business, it is. If it also sounds like hard work, it is.

In the next few paragraphs I’ll briefly describe what to look for in a sales selection system.

Professional Job Analysis

As mentioned above, a professional job analysis does not consist of giving everyone a questionnaire and comparing top-performer scores to bottom performers. This is the first sign of buyer-beware because it makes some huge and often wrong-headed assumptions.

It assumes an equal playing field. That is, all productivity results are equivalent. New accounts, customer service, market conditions, and expanded accounts are all rolled-up into the same category: productivity. In some cases, overall performance might even be complicated by (gasp!) skillful manipulation of numbers. Separating salespeople into top and bottom producers based on sales dollars is a sure clue the analyst does not understand sales.

Suppose you are like most folks in the hiring business and you expect your test to accurately predict job performance before you commit big bucks to salary. By definition, your test should measure something that causes performance. If you give one big test to everyone without knowing explicitly what you want to evaluate, you fall into the “correlation or causation” trap. As an example, ice cream sales and shark attacks have a strong positive correlation. Does that mean shark sightings cause people to eat more gelato? That Ben and Jerry’s Chunky Monkey is a poor shark repellent? Or perhaps sharks have a seasonal business they don’t want people to know about? Homegrown questionnaires often confuse correlation with causation. Just remember: Unless waterborne ice cream is proven to attract sharks, one does not cause the other.
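To see how two unrelated things can move together, here is a minimal sketch. Every number in it is made up for illustration: a hypothetical monthly temperature series drives both ice cream sales and a shark-attack index, and neither series is computed from the other.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up monthly temperatures (degrees C); illustrative only.
temperature = [5, 7, 11, 15, 20, 25, 28, 27, 22, 16, 10, 6]

# Both series depend on temperature (the hidden third factor), not on each other.
ice_cream_sales = [20 + 3 * t for t in temperature]
shark_attack_index = [0.2 * t + (1 if i % 2 else 0) for i, t in enumerate(temperature)]

r = pearson(ice_cream_sales, shark_attack_index)
print(f"ice cream vs. shark attacks: r = {r:.2f}")
```

The correlation comes out strongly positive even though the only thing the two series share is the weather. A questionnaire that “separates” top from bottom producers can be picking up the same kind of hidden third factor.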

A good job analyst knows how to identify key skills that make the difference between successful and unsuccessful cold calling, repeat sales, strategic selling plans, customer service, and so forth. In many cases, they may involve totally opposite skills. Treating sales production as a discrete measurement point is like putting fruit salad in a blender, pressing the annihilate button, and testing the puree for peaches. A professional job analyst knows key information can only come from people doing the job, not from supervisors or aggregated production data.

Let’s say the analyst has done his or her homework. Now what? The hiring manager does not have weeks or months to evaluate applicant skills. Unless the hiring manager uses the hire-and-hope strategy, sales skills have to be evaluated in minutes or hours. If we have done our job right, we will know the mini steps that lead to maxi results.

Bottom line? If the analyst asks you to lump producers into groups and gives them all the same test, you are about to see your money pour out the door.

Does the Test, Test?

The only test that is worth anything is one that works for your job in your company: not one that worked for the company across the street, for a job with the same title, or even for a company in the same industry, and not one that merely matches a nationwide norm. It has to work for you.

Sometimes a validity study can be transported from one job to another, but that is only if you know for certain the two jobs are essentially the same. But if the market is different, the company environment is different, products and services are different, customers are different, or sales cycles are different, then how can any reasonable person claim XYZ scores predict cold calling, customer service, or sales expansion for your position based on one that is entirely unknown? Doesn’t that seem a little far-fetched to you?

The only way to trust that a test validated somewhere else will work for your organization is to compare the job analysis behind that test to the job analysis for your job. If the two jobs are essentially the same, then use it; if not, you “pays your money and takes your chances.”

Give a generic personality test to salespeople and see what shakes out? Get ready to see a great big pile of belly-button lint.

Our Test Does Not Discriminate

In the U.S., at least, large organizations and Federal contractors are not supposed to reject qualified applicants based on age, gender, race, and so forth. This is called discrimination; but there is something else called adverse impact. What does adverse impact have to do with discrimination?

The legal definitions have subtle overlap, but for the purposes of this article, let’s assume discrimination generally means that an organization intentionally discriminates against certain kinds of job-qualified people in hiring, promoting, training, and so forth. Adverse impact, by contrast, generally means the hiring system unintentionally discriminates, even though it is job-related and professionally validated. In lay terms, think of discrimination as intentional and adverse impact as unintentional. For any better definition, ask your local labor-law attorney to explain the details.

I consider discrimination unethical. Everyone deserves a chance to work in a job for which he or she is qualified. But here is where things get complicated. Government agencies examine discrimination at the group-level. Hiring managers don’t care much about group performance. They care about individual performance.

This raises a problem that all hiring professionals need to consider. By way of example, suppose 200 people apply for a job. One hundred are Lilliputians and 100 are Yahoos. At the group level, 70% of the Lilliputians are hired, while only 40% of the Yahoos make the grade. At the individual level, there are quite a few Lilliputians who are miserable workers, just as there are quite a few Yahoos who are top performers.

From the organization’s viewpoint, they only hired job-qualified people. From the government’s viewpoint the company discriminated against the Yahoos.
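One common way to summarize a group-level gap like this is the “four-fifths” rule of thumb described in the Uniform Guidelines: a selection rate for one group that is less than 80% of the rate for the highest group is generally regarded as evidence of adverse impact. Here is a minimal sketch using the Lilliputian and Yahoo numbers above. The function names are mine, and this is a screening heuristic, not a legal determination.

```python
def selection_rate(hired: int, applicants: int) -> float:
    """Fraction of a group's applicants who were hired."""
    return hired / applicants


def impact_ratios(rates: dict[str, float]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    top = max(rates.values())
    return {group: rate / top for group, rate in rates.items()}


# Numbers from the example: 100 applicants per group, 70 and 40 hired.
rates = {
    "Lilliputians": selection_rate(70, 100),
    "Yahoos": selection_rate(40, 100),
}
ratios = impact_ratios(rates)

for group, ratio in ratios.items():
    flag = "below four-fifths" if ratio < 0.8 else "ok"
    print(f"{group}: rate={rates[group]:.0%}, ratio={ratio:.2f} ({flag})")
```

Here the Yahoo rate is about 57% of the Lilliputian rate, well under the four-fifths mark, which is why the group-level view raises a flag even if every individual hire was job-qualified.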

Who’s right? It’s hard to tell, so the government examines the organization’s:

Professionally developed job analysis (to show hiring tools are based on job requirements and business necessity)
Professionally conducted validation study (to show hiring tests and interviews accurately and consistently predict performance)
Pass and fail results for Yahoos and Lilliputians at each step of the hiring process
Proactive efforts to develop tests with less adverse impact on Yahoos

As long as the company has done its homework and followed generally accepted hiring practices as outlined in the “Uniform Guidelines” and “Standards,” it is not in trouble and will have hired all the best and most diverse applicants.

So what’s the problem? Some vendors claim their tests have no adverse impact. But research consistently shows hiring tests for jobs requiring problem-solving ability almost always do have an adverse impact when examined on a group level. Competent test vendors know this. Incompetent ones don’t.

Automated Resume Screens

What recruiter or hiring manager has not seen a brilliant resume developed by a blatantly unqualified candidate? And what about the marginal resume presented by a remarkable applicant? At best, a resume includes Kodak-moments recalled by the resume writer. At worst, a resume is an exercise in creative fiction.

Think about it. Every applicant is motivated to write just enough words to garner an interview, while every hiring manager wants to find someone who was an exceptional performer in the exact same job at another company doing the exact same work. Generality goals meet specificity objectives.

Sophisticated applicants know how to pepper the resume with keywords and qualifications that may be fact or fiction; different hiring managers screen resumes using totally different criteria for the same job; and everyone makes massive inferences based on snippets of data. So. Tell me again. Other than keeping a few programmers in work, what is the benefit of automating resume searches?

Back to the Beginning

So here we are, back at the beginning. Tests are abundant. And if all you want to know is a score, any test will do. Good tests, however, ones that accurately predict job performance, are rare. You can trust a good test to produce good employees. You can tell the difference by following a few guidelines.

Avoid vendors that emphasize their non-discrimination aspects, “legality,” or industry-wide applications. Even assuming their claim is accurate (and I have yet to see one that was), users are responsible for their own test use. Vendors are off the hook.

Avoid vendors that want to give their test to two groups of producers and use the results to predict job performance. This is bad science. Scientifically, this kind of study can only show whether the two groups are different; it does not tell you why. And it does not tell you about individuals within the groups.

Avoid tests that are based on self-reports. Self-reported answers can be faked. They cannot be validated by outside sources. Self-reported tests are similar to resumes. They represent things the test-taker wants you to know about him or her. Making decisions about hard skills based on self-reported data requires a huge leap of faith that is generally wrong half the time.

Ask the vendor for a report showing he followed the ‘Guidelines’ and ‘Standards.’ This is your only assurance the test will be job related, be based on business necessity, and accurately predict job performance.

Web-based testing is in the same category as medicine was 100 years ago when heroin was good for you; there was no such thing as anesthesia; injections were unavailable; radioactive water cleared the mind; opium was a relaxation agent; blood-letting was commonplace; linseed, mustard, and soap were used as cures for infection; and sugar of lead was a common treatment for diabetes.

Let’s all work hard to move hiring into the 21st Century.