Home-grown Tests: Living Dangerously

I’ve written before about frustrated managers designing their own hiring tests. The story goes like this: one or more line managers become so dissatisfied with the quality of new employees that they either develop their own hiring test or buy something off the web. Good idea? Nope! Let’s explore a few reasons why.

Discover

The first step in test development is to know which KSAs (knowledge, skills, and abilities) to measure. Is it physical coordination or abilities, staying organized, learning new information, making decisions, having the right attitudes, getting along with people, or something entirely unexpected? People often mistake results for skills. Results are what we expect eventually, but focusing on them while ignoring the KSAs that produced them is like enjoying a box of fresh Krispy Kreme doughnuts without a thought for how they were made. Those artery-clogging pieces of heaven don’t make themselves. It takes time, good ingredients, the right equipment, a good recipe, and skilled doughnut makers (i.e., the KSAs of delicious doughnuts). Likewise, employees don’t bring “results” to work. They bring KSAs.

Sometimes, instead of developing a test themselves, managers trust a test vendor’s marketing claims. Buying a generic test for a customer service rep, manager, or truck driver might sound simple, but it can be just as problematic as developing your own. We can all cite examples of jobs with the same title that are very different, as well as similar jobs with different titles. In the worst cases, a generic test either measures the wrong thing or cannot predict job performance (there is a good reason why the DOL “Guidelines” state that users should not accept vendor claims of test validity).

Predicting job performance requires measuring critical KSAs. If the test (or behavioral interview) shows the candidate cannot solve problems, the hiring manager can reasonably assume the candidate will have trouble solving on-the-job problems. The same goes for every other KSA. In short, measure only what you need, and measure it accurately; if you cannot measure it accurately, skip it. Never forget that the whole purpose of pre-hire screening is to learn enough about a candidate’s skills (not results) to predict performance. Unfortunately, some things simply cannot be measured: unexpected problems at home, health issues, crazed managers, unexpected market forces, and so forth. Those will always be wild cards.

Develop

It’s generally a good idea to avoid open-ended interview questions (candidates will have had plenty of time beforehand to rehearse good answers). It’s also a good idea to avoid self-report questions unless you can prove the candidate is neither lying nor living in an alternate universe. If you are a skilled behavioral or situational interviewer, you might craft questions asking for examples of when, where, and how the candidate applied the KSAs the job requires. It’s not the most accurate tool, but highly structured interview questions usually provide better data than unstructured ones.

Sometimes home-grown tests take the form of case studies or tests of problem solving, planning, numerical ability, word association, and so forth. Written tests are not as flexible as structured interviews. On the other hand, they are very hard to fake when professional test-development standards are followed.

Professional standards require questions that do not overlap; scores that are stable over time; items that are neither too easy nor too hard; questions of varied difficulty; individual items that correlate strongly with overall scores; standardized answer keys; and proof that the test measures what it is supposed to measure. In technical terms, these properties fall under test-retest reliability, inter-item (internal consistency) reliability, and validity. In lay terms, they mean you can trust the scores to be consistent and to reflect the specific KSAs the test claims to measure.
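For readers who want to see what a few of these checks look like in practice, here is a minimal Python sketch of a classical item analysis. It assumes a hypothetical data layout (one row per candidate, one 0/1 score per item) and uses function names of our own choosing. It computes three common statistics: item difficulty (the proportion of candidates answering correctly), the corrected item-total correlation (each item against the total of the remaining items), and Cronbach’s alpha, a widely used index of inter-item reliability. It is only a sketch: it says nothing about stability over time or validity, and acceptable thresholds vary by test and purpose.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either list has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def item_analysis(responses):
    """responses: one list per candidate of 0/1 item scores (1 = correct)."""
    totals = [sum(row) for row in responses]
    report = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        # Total score with this item removed, so the item is not correlated with itself.
        rest = [t - s for t, s in zip(totals, item)]
        report.append({
            "item": i + 1,
            "difficulty": mean(item),             # proportion correct: near 0 = too hard, near 1 = too easy
            "item_total_r": pearson(item, rest),  # corrected item-total correlation
        })
    return report

def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Need at least 2 items")
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("Total scores have no spread")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical answers from five candidates on a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

for row in item_analysis(responses):
    print(row)
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```

With this made-up data, the items run from easy (80 percent correct) to hard (20 percent correct), every corrected item-total correlation is positive, and alpha comes out at 0.80. Real item analysis needs far more candidates than five.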

For managers who don’t know what they don’t know about test construction, there is a whole book on the subject: the Standards for Educational and Psychological Testing. Ignore its principles and your test will probably be either one big waste of time or a lightning rod for an expensive legal challenge … and rightfully so. If a manager insists on developing his or her own test, insist in turn that the test follow the “Standards” and the “Guidelines” (search for the 1978 Uniform Guidelines on Employee Selection Procedures). If the manager thinks these documents are too hard to follow, suggest documenting the reasons for ignoring them. Both executive management and the legal department will probably want to know why the organization is using worthless tests and needlessly exposing itself to legal challenges.

So what kind of testing works best? It depends on what you want to measure. In my experience, it’s a good idea to start with a few highly structured interview questions (if you are skilled in behavioral interviewing) and a realistic job preview. If you have a large applicant pool, you might even use a smart web-screening app. Follow next with hard-to-fake tests of attitudes, interests, and motivations; critical abilities; and structured simulations. Save time by arranging your tests so people drop out as soon as they fail any step. Testing is not limited to written formats; interviews are tests, too. Be specific. One-size-fits-all seldom fits anyone.

Follow Up

Two common mistakes are failing to follow up and assuming “higher scores are better.” Assuming higher is always better is like buying clothing in any size and expecting it to fit. Setting cut scores is dicey. If they are set too low, new hires will probably be under-skilled and troublesome. If they are set too high, new hires will probably be over-skilled and troublesome.

If the test measures mental ability, some demographic groups will likely fail it at a higher rate than others (i.e., brace for adverse impact!). If the test measures something that could be learned in a short period of time, you might be asked to show why potentially acceptable candidates were excluded. Sometimes setting scores is as challenging as developing the test.
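The Uniform Guidelines offer a rule of thumb for spotting adverse impact: a selection rate for any race, sex, or ethnic group that is less than four-fifths (80 percent) of the rate for the group with the highest rate is generally regarded as evidence of adverse impact. The minimal Python sketch below applies that arithmetic to pass counts; the function name, group labels, and numbers are hypothetical, for illustration only. Treat a flag as a signal to investigate rather than a legal conclusion, since the Guidelines also allow smaller differences to count when they are significant in statistical and practical terms.

```python
def four_fifths_check(applicants, passed, threshold=0.8):
    """Compare each group's pass rate to the highest group's pass rate.

    applicants, passed: dicts mapping a group label to a count.
    Returns {group: (pass_rate, impact_ratio, flagged)}.
    """
    rates = {g: passed[g] / applicants[g] for g in applicants}
    top = max(rates.values())
    if top == 0:
        raise ValueError("No group has any passing candidates")
    return {
        g: (rate, rate / top, rate / top < threshold)
        for g, rate in rates.items()
    }

# Hypothetical counts, for illustration only.
applicants = {"Group A": 100, "Group B": 80}
passed = {"Group A": 60, "Group B": 36}

for group, (rate, ratio, flagged) in four_fifths_check(applicants, passed).items():
    print(f"{group}: pass rate {rate:.0%}, impact ratio {ratio:.2f}, flagged={flagged}")
```

With these made-up counts, Group B passes at 45 percent versus 60 percent for Group A, an impact ratio of 0.75, which falls below the 0.8 threshold and gets flagged.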

For example, one of my clients set cut scores lower than necessary just to get more protected-group applicants into its training program. Another, after using a home-grown spelling and grammar test with an overall passing rate in the high 90s, wondered why new hires could not “spelle good.” If tests (and interview questions) are supposed to predict performance, verify that they do.

Recent OFCCP and EEOC audits show the government is becoming more sensitive to criterion validity issues. That is, employers are increasingly required to show legitimate links between test scores and on-the-job performance. In addition, the current administration has increased both the size of and the budget for adverse impact investigation and enforcement, much of which seems to be politically driven.
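A criterion-related validity check starts with a simple question: do people who score higher on the test actually perform better on the job? The minimal Python sketch below computes the Pearson correlation (often called the validity coefficient) between test scores and later performance ratings for the same hires. The scores, ratings, and function name are invented for illustration. A real study needs a much larger sample, a sound performance measure, and a significance test (the Uniform Guidelines generally look for a relationship that is statistically significant at the 0.05 level), so treat this as the first step, not the whole study.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two equal-length lists with at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("Scores or ratings have no variance")
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: test scores at hire and supervisor ratings (1-5) a year later.
test_scores = [55, 62, 70, 74, 81, 88, 66, 92]
performance = [2.8, 3.0, 3.4, 3.1, 3.9, 4.2, 3.3, 4.0]

r = pearson_r(test_scores, performance)
print(f"Validity coefficient r = {r:.2f}")
```

A coefficient near zero would suggest the test is not predicting performance at all, which is exactly the situation the spelling-test client ran into.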

Hiring the best people is a lot of work, but in the end you will have stronger legal credibility, lower turnover, higher individual productivity, and reduced training costs. In addition, your workforce will be not only diverse but also highly qualified. Home-grown testing is like buying expired food. It might seem like a quick way to save a buck, but it always proves to be a big mistake.