The Wall Street Journal recently posted an audio weblog describing the hiring practices used by a well-known Internet service provider. Apparently, some people believe this provider is setting some kind of hiring example.
However, after listening to the recording, I think it is a better example of a hiring process that sounds good on the surface, but contains some serious flaws. Here are their talking points:
- The company hired about 4,000 people in the last 18 months
- They want to hire people who are smart, fast learners, collaborative, curious, and love solving problems
- The company averages six to seven interviews per candidate
- All finalists are reviewed by senior management and a hiring committee
- They administered questionnaires to develop job requirements
- They administered a 300-item questionnaire to employees, are comparing scores with 30 to 40 measures of performance and plan to use the results to identify five to 10 key applicant questions
Sound good? Well, let’s look at this process through our “squinty-eyed-scientist” lens.
The company hired 4,000 people in the last 18 months and expects each new employee to be smart, a fast learner, collaborative, curious, and a lover of problem-solving. Looking at these goals from a whole-job, whole-person perspective, this description covers only two of the four critical job competency clusters.
To be complete, the company needs to expand the profile to include planning/organizing ability, additional interpersonal skills, and a few more motivational aspects. In addition, it needs to clarify how much of each skill is necessary for each job type (managers, for example, need broader and deeper cognitive abilities and better coaching skills than individual job-holders).
As a side note, the overall success it has enjoyed in the past often comes from having a good product and being at the right place at the right time. However, flush with newfound success, employees (many of them newly wealthy) forget the right-time-right-place effect and wrongly take full credit for their success. As a consequence, they are quite surprised when market forces fade (e.g., who can forget the effects of the dot-com bomb?).
Employees riding the crest of a market wave often fail to consider that even a chicken can travel 60 mph on the way to the processing plant! It’s a good time for management to develop some humility. Soon, the market swell will be gone, and high performance will demand more and better employee skills. (If not, can you spell l-a-y-o-f-f? Of course you can.)
Hiring managers are generally smart enough to recognize that not everyone who passes an interview lives up to expectations. They are also smart enough to realize that two interviewer heads are better than one. But six to seven interviewers, the executive management team, and a hiring committee is total overkill. It’s a great way to diffuse responsibility for bad hires, but the bright folks at this company should seriously consider the law of diminishing returns.
Interview effectiveness is another matter entirely. I don’t know what kind of interviews they conduct, but the most predictive interviews have a high degree of structure. That is, interviewers are highly trained to probe for examples, use questions derived from a job analysis, and score responses against standardized answer keys. Usually three interviewers are enough.
Hiring managers, in my experience, tend to be a maverick group. They often ignore structured questions, venture into questionable areas, and otherwise do their own thing. I get much better results when all interviewing and testing is done by a centralized staff and hiring managers are only involved for a chemistry check.
The only reason I can imagine for involving the executive management team is that its members have a lot of free time and are looking for something to keep them busy.
Using questionnaires to determine job skills is a challenge. For one thing, not everything listed will be equally important, occur with the same frequency, or be considered important by management. I’ve used both questionnaires and interviews, and found that either job-holders tend to consider everything important or the questionnaires miss critical information.
It is much more difficult and time-consuming to interview job-content experts, but I find only a skilled analyst can sort out details that a survey would miss. Unless the job is strictly by the rules (i.e., law enforcement or firefighting), questionnaire-based job analyses tend to get very messy.
Five to 10 Items?
At first, a 300-item bio-data survey seems like a good idea. After all, who can tell us more about the job than the people who do it? But consider this: the majority of people completing your survey have already survived the “on-the-job” test. That is, they may differ in performance, but not by enough to get fired. This is called “restriction of range.” It means that test results from current employees are likely to show smaller differences than test results from applicants.
To be effective, a test has to separate one applicant from another. Applicants have the greatest difference from one person to the next, and applicants are the people who we want to test for future job performance.
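The restriction-of-range effect described above can be illustrated with a small simulation (a hypothetical sketch; the numbers are assumptions chosen for illustration, not data from the company in question). Assume test scores share a true correlation of about 0.5 with job performance across the full applicant pool, then keep only the “survivors” whose performance was good enough to stay employed:

```python
import random

random.seed(42)

# Hypothetical applicant pool: test score and later job performance
# share a true correlation of about 0.5 (an assumed figure).
applicants = []
for _ in range(5000):
    score = random.gauss(0, 1)
    noise = random.gauss(0, 1)
    performance = 0.5 * score + (1 - 0.5 ** 2) ** 0.5 * noise
    applicants.append((score, performance))

def correlation(pairs):
    """Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

r_pool = correlation(applicants)

# Current employees are the survivors: below-average performers
# were never hired, or did not last long enough to take the survey.
survivors = [(s, p) for s, p in applicants if p > 0]
r_survivors = correlation(survivors)

print(f"correlation among all applicants: {r_pool:.2f}")
print(f"correlation among survivors only: {r_survivors:.2f}")
```

Under these assumptions, the correlation among survivors comes out noticeably weaker than in the full pool, which is why a survey validated only on current employees can look much less predictive when applied to real applicants.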
Can-Do v. Will-Do
A past reader commented that her company once did a concurrent validity study of financial service managers that showed motivation accounted for about 50% of the difference in performance. I have not seen the details of this study, but let’s assume everything was done by the book.
Job performance is part “can do” and part “will do.” If everyone in this study was employed, they probably had similar “can-do” skills. So what’s left to measure? The “will-do” part of the job. Can “will-do” affect half of job performance? Sure. But only when other things are held constant.
But can we assume “will-do” is a great predictor of job performance among job applicants? Probably not. Applicants have often worked in other companies, for other bosses, and maybe with different products. If the applicant aggressively pursues the job, he or she will probably say or do almost anything to get hired.
I won’t quote research in this article. I’ll just ask which system the reader thinks makes more sense: hire people who score high on motivation and assume they are job-skilled; hire people who score high on skills tests and assume they are job-motivated; or hire people who score high on motivation tests and high on skills tests?
It’s time now to examine validity. “Validity” is whether the test works (i.e., whether a high or low score predicts high or low performance).
Burn this into your brain: no one with an ounce of professionalism will promise that a test will accurately predict job performance based on an industry model. Why? Consider the following:
- The people in the same model are probably a mix of high and low performers
- Jobs with the same title may require different skills
- All organizations are not alike
- Averages hide individual differences
- “Performance” may not have the same definition from one organization to the next
- Just because the applicant does not match the model does not mean he/she will be a low performer
- Just because the applicant matched the model does not mean he/she will be a high performer
You get the idea. Industry model-matching sounds good, but it is usually junk science!
Sure, there are occasions when there is a benefit from using someone else’s validity study (i.e., validity transportability) instead of doing your own, but it is still incumbent on the test user to show that “your job” is essentially the same as “their job.” But how often do you think this is done?
Suppose, before enrolling in WeightWatchers, Aunt Bertha decided to weigh herself on your bathroom scale. After digging it out of the vinyl tile, you realized the springs would never be the same. One minute the scale would read “400 pounds,” and the next it would read “50 pounds.” Reliability means you can count on a score being the same from one time to the next. A bad test is like a broken scale: it contains built-in error.
Good employment practices control for reliability by using a “multi-measure” strategy. That is, different tests are used to evaluate the same trait. If all the tests agree, then the trait is probably accurate. If they disagree, then something is wrong.
Within a single test, a designer will usually use from seven to 12 individual items per factor. Below seven items, the score becomes unreliable because each single item has a disproportionate impact on the overall score. Beyond 10 to 12 items, additional questions become redundant (e.g., how many ways can we ask whether someone is a hard worker?).
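The seven-to-12-item rule of thumb can be sketched with the Spearman-Brown prophecy formula, which projects how a scale’s reliability grows as parallel items are added. The average inter-item correlation of 0.20 below is an assumed figure, chosen only to illustrate the shape of the curve:

```python
def spearman_brown(single_item_reliability, n_items):
    """Projected reliability of a scale built from n parallel items
    (Spearman-Brown prophecy formula)."""
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)

# Assumed average inter-item correlation -- illustrative only.
r_item = 0.20

for k in (1, 3, 7, 10, 12, 20):
    print(f"{k:2d} items -> projected reliability {spearman_brown(r_item, k):.2f}")
```

Reliability climbs steeply through the first seven or so items and then flattens: doubling a scale from 10 to 20 items buys far less than going from one item to seven, which is exactly the diminishing-returns point made above.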
No matter how fancy the study, if our Internet service provider expects a five- to 10-item test to comprehensively measure several job traits, they will quickly learn the concept of Aunt Bertha’s scale.
Whole Job-Whole Person
This part of selection is called “multi-trait.” Jobs are complex. That’s why, after many interviews and meetings, hiring managers often find themselves still undecided about whether an applicant can do the job.
A good hiring and promotion system is able to effectively break apart the cloud of confusion that surrounds the actual skills required to perform a job. For example, do we predict job performance based on what a person has done in the past, what they promise to do in the future, or the skills they demonstrate right now?
Whole job-whole person refers to digging deeper into actual job competencies and determining exactly what applicant skills can be measured in real time and what skills cannot. Interviews are unable to do this.
Hiring criteria run the gamut from “Y’all come,” to get-to-know-ya interviews, to magic-question interviews, to highly structured interviews, to interview/test/simulation combinations, to exhaustive on-the-job tryouts. Thousands of professional researchers have already evaluated and identified best-of-class methods that predict job performance. In lay terms it goes like this: something is better than nothing, and the closer the “test” is to the job, the more predictive it tends to be.
People do not have to reinvent the wheel. And bigger organizations are not necessarily smarter than smaller ones.