Distinguishing Between Assessment Science and Snake Oil

Jan 15, 2004

One of the great things about reviewing online staffing assessment tools is seeing how psychological science and technology come together to positively impact people’s lives through better hiring decisions. Better hires lead to happier, more productive employees. This in turn leads to more effective, profitable companies that result in stronger economies, more stable societies, and a generally better world. But given the benefits of online assessment, we find it troublesome that so many companies do not take advantage of these tools. Our research suggests that the slow growth in the use of assessments is due to three main obstacles:

  1. Many staffing professionals do not understand how online assessment tools work and are either unaware of their value or skeptical of it.
  2. Not all assessment tools work as advertised, and some of them may not work at all.
  3. Staffing assessment vendors are doing a lousy job communicating the difference between assessments whose value has been proven through rigorous empirical research and those whose value is based solely on marketing hype and lots of multi-syllabic, pseudoscientific jargon.

Obstacles #1 and #2 will probably never go away completely. First, assessment science is complicated. A well-designed personality assessment that takes 30 minutes to complete can accurately predict how an employee will behave two years later. But designing a measure with this predictive power is not a simple task. The best assessment tools are based on decades of research involving data from thousands of jobs and hundreds of thousands of employees. It is unrealistic to expect staffing professionals to spend the time required to fully understand the intricacies of this research. Second, as long as there is money to be made selling staffing tools, there will be people willing to sell tools that do not work. Most vendors we work with evaluate their assessment tools with a high degree of scientific rigor to make sure they work as promised. Still, this is not the case for all of the vendors we have come in contact with.

Obstacle #3 reflects the most promising way to increase the use of staffing assessments in the broader marketplace. Simply put, assessment vendors need to do a much better job providing clients with clear, objective evidence that illustrates the effectiveness of their tools. This requires showing more hard data demonstrating that their assessment tools work as advertised. In the language of assessment science, this data is called “empirical validity.” Empirical validity comes from comparing scores on an assessment with measures of actual employee performance. When it comes to staffing, empirical validity is what really matters. Do not be confused by vendor claims of construct, content, or face validity. While these forms of validity are important, they matter far less than empirical validity.

Most staffing assessment vendors do a poor job providing evidence of empirical validity. We revisited the websites of roughly 20 assessment vendors, looking for solid, empirical data demonstrating that their tools work as promised.
Of the sites we visited, about 50% contained no hard evidence indicating that their tools actually predict job performance. At best, these sites provided vague client testimonials about the value of their tools or simply claimed that their tools are “scientifically validated” or “proven solutions.” What was particularly frustrating is that we personally know that some of these vendors have masses of empirical validity data for their tools. However, their marketing departments appear to have decided that information demonstrating that their tools actually work is best kept hidden from potential clients.

The other 50% of the websites did provide some empirical validity information. This information ranged from being very clear and relevant to being so confusing it was almost comical. The first two examples below represent some of the better empirical validity data we found on assessment vendors’ websites. We have removed information that would identify the vendors and shortened the examples for the sake of brevity. These examples provide the kind of information clients should demand from assessment vendors before engaging them in further contract discussions:

Example 1: A concise summary of empirical validity information.

“A large international call center organization with locations in over 32 countries had a very high turnover rate. They needed a pre-employment testing tool that would help them identify those individuals who had a productive attitude, were good at persuasion and diplomacy, and liked to work in a structured environment where performance is closely monitored. A complete job analysis was conducted and a large group of call center agents were tested using a custom-developed test battery. The newly developed survey was added to the selection process and launched in all locations. Within four years, turnover had significantly dropped from double digits to 4%. Equally important, productivity also improved. Booked revenue increased significantly over the 12% annual increase goal to no less than 20% on a consistent basis each year.”

Example 2: A general summary of empirical validity information.

“Studies have shown a direct correlation between assessment scores and on-the-job performance. While high scorers tend to be more productive and dependable and to stay on the job longer, low scorers are more likely to be difficult and unreliable. People who failed the assessment were:

  • (in a retail chain) 80% more likely to “let joking friends be a distraction and interruption to work”
  • (in a discount chain) 82% more likely to “take an unauthorized break”
  • (in a retail chain) 55% more likely to “use a weak excuse to stay home from work”
  • (in a discount chain) 56% more likely to “cheat on a timecard by punching in before actually starting work”

The next two examples show some of the more questionable validity information we found on vendor websites. In fairness, we chose these examples to make a point; other parts of these vendors’ sites may contain better information. We are not suggesting that these vendors do not have evidence demonstrating the empirical validity of their tools. However, if they have such evidence, they are not doing a good job communicating it.

Example 3: Huh?

“One consistent factor which we have found and have continuously pointed out to clients is that the value patterns that indicate success and, thus, can be validated as predictors, vary between companies, within companies, and within performance areas. Unless the diagnostic patterns measured by the assessment are empirically correlated to those factors that measure success for a specific performance function in an individual company, and in an individual geographical location for that company, the information cannot be reliably used to predict who will and will not succeed.”

Example 4: If you can’t blind ’em with science, baffle ’em with…

“Unlike typical evaluations that are based purely upon a psychological categorization, our assessment is derived from a unique synthesis of brain physiology, Thurstone’s paired-comparison methodology, and the Gestalt Success-Satisfaction framework. Our integration of these methods and assumptions has produced a unique system that goes beyond right-brain/left-brain modalities.”

If you ask an assessment vendor for validity evidence and their reply contains a lot of scientific jargon and very few actual numbers, it’s time to start looking for some other vendors. Every staffing assessment vendor should be able to provide summaries of empirical validity studies that have been conducted to test the value of their tools. These summaries should include the following kinds of technical information:

  • The types of jobs used in the studies, and the countries where the studies were conducted
  • The number of employees or candidates used in the studies
  • Whether each study was predictive or concurrent (i.e., whether the data was collected from current employees or from candidates before they were hired)
  • The performance criteria used to validate the assessment (e.g., supervisor ratings of performance, tenure, sales performance, etc.)
  • The corrected and uncorrected validity coefficients between employees’ scores on the assessment and the performance criteria
  • If relevant, whether the assessment was cross-validated against a holdout sample and, if so, the validity coefficient in the holdout sample
  • EEOC statistics indicating potential score differences among the demographic groups included in the studies
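To make the checklist above concrete, here is a minimal sketch of what a validity coefficient actually is: a correlation between assessment scores and a later performance criterion, optionally corrected and checked against a holdout sample. The data, the 0.60 criterion reliability, and the use of a simple attenuation correction are all illustrative assumptions for this sketch, not any vendor’s actual method.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def corrected_for_attenuation(r, criterion_reliability):
    """Classic correction for unreliability in the criterion:
    r_corrected = r_observed / sqrt(r_yy)."""
    return r / math.sqrt(criterion_reliability)

# Invented data: assessment scores and later supervisor ratings.
random.seed(1)
scores = [random.gauss(50, 10) for _ in range(200)]
ratings = [0.05 * s + random.gauss(3, 0.7) for s in scores]  # partly track scores

# Split into a validation sample and a holdout sample.
train_s, hold_s = scores[:150], scores[150:]
train_r, hold_r = ratings[:150], ratings[150:]

r_uncorrected = pearson(train_s, train_r)
r_corrected = corrected_for_attenuation(r_uncorrected, criterion_reliability=0.60)
r_holdout = pearson(hold_s, hold_r)

print(f"uncorrected validity: {r_uncorrected:.2f}")
print(f"corrected validity:   {r_corrected:.2f}")
print(f"holdout validity:     {r_holdout:.2f}")
```

A real validation study would also correct for range restriction and report sample sizes and statistical significance; the point here is simply that “empirical validity” boils down to numbers a vendor should be able to show you.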

This information may not make a lot of sense to someone who is not fairly well versed in assessment science. However, any assessment vendor that claims the title of being “scientific” should be able to provide this sort of information. If no one in your organization is able to make sense of validity information, you may want to consider consulting a qualified industrial/organizational psychologist. The money you spend enlisting an outside expert is likely to be negligible compared to the cost of implementing a bogus assessment system.

We dream of the day when empirical validity information is readily provided by all assessment vendors in a standard format, behind a standard link on their websites. Presenting such information would go a long way toward helping clients understand, evaluate, and promote the use of staffing assessments. Furthermore, vendors that are unable to provide this information could be quickly recognized as potentially selling more snake oil than science.

As the marriage between assessment tools and hiring technology continues to bear fruit, the role of empirical data will become increasingly important. The assessment systems of the future will rely on streams of real-time data to help companies gain precise metrics regarding many different aspects of their hiring process. As the value of data collected during the hiring process increases, it will become harder for vendors of bogus products to pull the wool over the eyes of their customers. Until then, the mantra should remain, “buyer beware.”