Anatomy of a Test Vendor

A few weeks ago, I was asked about a certain test vendor. Being a good researcher, I visited their site and was absolutely astounded by the misinformation contained therein. “Wow,” I thought, “this could be the basis of an entire article about what NOT to believe about testing!” Here are some of the areas where I foresaw major problems for employers who chose to use their tests.

Legality

What they state: Their tests have passed a review to ensure a fully legal program. They can assure users that they will “fully comply” with the new EEOC rules. A personal letter from an attorney states the test does not violate the law. The test complies with all EEOC regulations.

What users should know: According to the Uniform Guidelines on Employee Selection Procedures, under no circumstances will the general reputation of a test or other selection procedures, its author, or its publisher, or casual reports of its validity, be accepted in lieu of evidence of validity. Specifically ruled out are:

Assumptions of validity based on a procedure’s name or descriptive labels
All forms of promotional literature
Data bearing on the frequency of a procedure’s usage
Testimonial statements and credentials of sellers, users, or consultants
Other non-empirical or anecdotal accounts of selection practices or selection outcomes

Validation Procedures

What they state: This test vendor states that “some people” believe in construct validity and others prefer criterion validity as the best approach. They further state that “common sense, good business sense, and concurrent validity on a local level” is the best approach. They do this by benchmarking the top one-third and bottom one-third of producers on a local level.

What users should know: This is absolute nonsense. Here is a partial list of reasons why:

Validation is the process of determining whether a test legitimately predicts job performance, not whether it measures common personality traits.
Validity is not a “crap shoot” to determine if something, anything, correlates with job performance. It is supposed to be a “validation.”
The Guidelines are clear. They treat content (the nature of the job) and criterion (performance on the job) validity as the more straightforward strategies, and they caution that construct (deep-seated psychological structures) validity is a far more complex and demanding approach.
“Concurrent” is not a type of validity. It is the name given to a real-time design using today’s employees to validate the test (i.e., evaluating current behavior as opposed to predicting future behavior).
How would you divide salespeople into top and bottom performance thirds? By new accounts? By account expansion? By existing accounts? By customer complaints or service? By bad debts? By repeat orders? Give me a break! Dividing employees based on “performance” is pure amateur thinking.
The Guidelines include almost 4,000 words explaining an acceptable process of validation. They include requirements for job analysis, fairness, representative samples, critical knowledge, skills, and abilities, statistical sampling, statistical procedures, and so forth.
There are more factors associated with job performance than personality. Even people with “good” personality traits can fail in the job because they have the wrong skills.
Use some common sense. Even if one could identify the “high performers,” it is likely they would have different personalities.
If the test were really valid, wouldn’t it be nice to know that the candidate’s scores were also significantly different from those of the low group? How about asking how many individuals in the high group actually matched the high profile, and how many matched the low profile?
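To make the top-third problem concrete, here is a minimal Python sketch. The salespeople, the two metrics, and every figure are invented for illustration; none of it comes from the vendor or from real data. It ranks the same nine people by two different criteria and checks how much the two “top thirds” overlap.

```python
# Hypothetical data: every name and number below is made up for illustration.
salespeople = {
    "Avery":  {"new_accounts": 14, "repeat_orders": 3},
    "Blake":  {"new_accounts": 12, "repeat_orders": 5},
    "Casey":  {"new_accounts": 11, "repeat_orders": 20},
    "Drew":   {"new_accounts": 9,  "repeat_orders": 18},
    "Ellis":  {"new_accounts": 8,  "repeat_orders": 2},
    "Finley": {"new_accounts": 6,  "repeat_orders": 22},
    "Gray":   {"new_accounts": 5,  "repeat_orders": 9},
    "Harper": {"new_accounts": 3,  "repeat_orders": 11},
    "Indigo": {"new_accounts": 2,  "repeat_orders": 7},
}


def top_third(people, metric):
    """Return the names that fall in the top third when ranked by one metric."""
    ranked = sorted(people, key=lambda name: people[name][metric], reverse=True)
    return set(ranked[: len(ranked) // 3])


top_by_new = top_third(salespeople, "new_accounts")
top_by_repeat = top_third(salespeople, "repeat_orders")
overlap = top_by_new & top_by_repeat

print("Top third by new accounts: ", sorted(top_by_new))
print("Top third by repeat orders:", sorted(top_by_repeat))
print("In both groups:            ", sorted(overlap))
```

With these invented numbers, only one of the three “top performers” is the same person under both definitions. A benchmark built on one group would say little about the other, which is why the choice of performance criterion has to come from a job analysis rather than from convenience.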

Validity of the Advertised Test

What they state: In a national survey conducted on salespeople and commissions, “All of the above values show that there is at best a weak correlation between profile scores and commissions earned. Scores obtained on a national level would be unsatisfactory predictors of commissions earned.”

What users should know: What kind of “spin statement” is that? Sales commissions are the number one indicator of sales performance, and the vendor publicly admits its test scores are an unsatisfactory predictor on a national level? It takes five key sales skills to make a good salesperson (i.e., relating, communication, questioning, presenting, and managing the relationship). If these cannot be measured nationally, doesn’t it seem like this test is looking at the wrong things?
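For readers who want to see what a “weak correlation” means in practice, a criterion-related check comes down to correlating test scores with a performance measure. The following is a minimal sketch only. The scores and commission figures are invented (the vendor’s actual data are not reproduced here), and `pearson_r` is a helper written for this example, not part of any vendor’s tooling.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample: profile scores and annual commissions (in $1,000s).
profile_scores = [52, 61, 47, 70, 58, 66, 49, 73]
commissions = [31, 28, 40, 35, 22, 44, 30, 33]

r = pearson_r(profile_scores, commissions)
variance_explained = r ** 2  # share of commission variance tied to scores

print(f"r = {r:.2f}, variance explained = {variance_explained:.1%}")
```

The squared correlation is the share of the variation in commissions that the scores account for. A correlation of 0.2, for example, accounts for only 4 percent, leaving 96 percent to everything the test does not measure. A real validation study would also need an adequate sample and a significance test, which this sketch omits.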

Accuracy of the Test

What they state: Both managers and test-takers generally agree with the results of the test.

What users should know: Well, duh! The test-taker just checked a dozen items stating that he or she is determined. Should there be any surprise that his or her test scores were high in “determination”? Besides, are we measuring agreement with the test results or are we measuring future job performance?

Theory of the Test

What they state: The test was developed based on Hippocrates’ 2,400-year-old theory of temperaments.

What users should know: Hippocrates also believed that black bile, yellow bile, phlegm, and blood were responsible for all physical and mental health. Furthermore, we have learned a few things about testing and hiring in the last 2,400 years.

Test Reports

What they state: The test uses 60 items to generate a two- to fifteen-page report.

What users should know: Any reasonable person should suspect that 60 questions cannot possibly generate an accurate portrait of human personality. Any test promising that much data about a candidate should be examined for abundant filler, boilerplate information, and general marketing nonsense.

Moving Forward

Although this article focused on a single test site, I suggest the comments apply to many applications that ignore best test practices. There is a good reason why there are entire graduate-level courses devoted to doing a job analysis, developing decent tests, legal testing issues, statistics, and experimental design. This field is complex, and filled with opportunities to 1) hire the wrong people, 2) turn away the right ones, or 3) get your company sued. Want to learn more about how to build and use a test? Go to http://www.apa.org/science/standards.html. Here is an outline of its contents:

Part I: Test Construction, Evaluation, and Documentation

Validity
Reliability and Errors of Measurement
Test Development and Revision
Scales, Norms, and Score Comparability
Test Administration, Scoring, and Reporting
Supporting Documentation for Tests

Part II: Fairness in Testing

Fairness in Testing and Test Use
The Rights and Responsibilities of Test Takers
Testing Individuals of Diverse Linguistic Backgrounds
Testing Individuals with Disabilities

Part III: Testing Applications

The Responsibilities of Test Users
Psychological Testing and Assessment
Educational Testing and Assessment
Testing in Employment and Credentialing
Testing in Program Evaluation and Public Policy