
Bias in Recruiting AI, and How to Beat It

Apr 30, 2019

There is a lot of talk about how AI will influence hiring, and a lot of discussion about whether it should, since the risks are real. We've already seen some of them, like Amazon's algorithm that simply downgraded resumes mentioning women's achievements.

Recently, Christoph Fellinger, head of strategic talent acquisition at German giant Beiersdorf, best known for its NIVEA brand, said at Talent Acquisition Live in Amsterdam that the reason AI is terrible at selecting people is twofold. First, it tends to look at the wrong data. Second, it's trained on biased data. I fully agree with him on both counts, yet not in the way he meant it.

Wrong Data

Christoph said AI looks at all data, including the irrelevant, and cannot put it into context. It does. The problem is, so do humans, unconsciously. Even though nobody, or at least very few people, will ever consciously say a woman is by definition worth less than a man, code known to be written by a woman is unconsciously valued less. And that is just one of several pieces of research on the subject.

I do agree with Christoph that AI looks at the wrong data, because it tends to look at a resume, just like a recruiter does. The problem is that no scientific study has yet found even a correlation between a resume and job performance. So when we train an AI on resumes, we may be training it on the wrong data to start with: data with no predictive value.

Biased Data

The second part of the problem is that all HR data is biased. Humans are biased, and even society is biased. Are men genetically better developers? I don't know. I do know that when a little boy is bad at math, he gets told to try harder, while a little girl hears a consoling "It is hard, isn't it?"

So is the data biased? Yes, by definition. It's impossible to train an unbiased AI on current HR data. However, there are still ways to train AI.

Steps to Train Recruitment AI

The first step is to acknowledge that every job at every company is different. There is no quick fix. We can use building blocks from others, but you always need to calibrate for your job(s) at your company. A great salesperson at company A can easily fail at company B; few people succeed everywhere. That's because of company culture and teamwork.

For the second step, we need to divide the current workers into groups. I personally like three, but two or four work as well. The three groups are: fantastic, acceptable, and underperformer. Never model based only on the best, since you risk a "false positive." Say you only look at the best and it turns out everybody scores really high on extraversion, for example. If you hired on that trait, all the underperformers will have it too, so you still know nothing about its relevance. It could matter for the job, or it could just be your previous hiring bias.
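A minimal sketch of this grouping idea, in Python, may make the "false positive" risk concrete. The DataFrame, scores, and the extraversion trait here are all hypothetical, not the author's data:

```python
import pandas as pd

# Hypothetical employees with a performance score and one trait score each.
employees = pd.DataFrame({
    "name": ["Ana", "Ben", "Cas", "Dee", "Eli", "Fay"],
    "performance": [9.1, 8.7, 6.0, 5.5, 3.2, 2.8],   # illustrative scores
    "extraversion": [8.0, 7.9, 6.5, 7.0, 8.2, 7.8],  # illustrative trait
})

# Divide into three groups: fantastic, acceptable, underperformer.
employees["group"] = pd.qcut(
    employees["performance"], q=3,
    labels=["underperformer", "acceptable", "fantastic"]
)

# The point of keeping all three groups: compare the trait per group.
print(employees.groupby("group", observed=True)["extraversion"].mean())
# If "fantastic" and "underperformer" both average around 8, extraversion
# tells you nothing here; it may just echo old hiring bias.
```

If you had looked only at the top group, the high extraversion score would have looked like a success factor; the underperformers' equally high score is what exposes it as noise.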

For the third step there are two possible approaches. You could start by identifying the traits you expect to be important for the role, then measure your current population on those traits and see whether there are major differences between the groups. The other approach is to start with the measuring: look at the traits that differ significantly and decide, based on your expertise, whether each is a trait that matters for the job.

This third step is where most bias gets into the AI, so two things are important. First, make sure the data you use is unbiased. If you use the same data you used to select on a resume, that's by definition biased data. If you move up a level and look at a person's cognitive and/or psychometric traits, you have a chance of unbiased data. Second, don't let the AI do it on its own. As Christoph Fellinger said, AI doesn't know context, so it cannot decide whether something is just a correlation or might actually matter.
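A hedged sketch of the measure-first route might look like the following. The group sizes and trait scores are made up for illustration; a significance test only nominates a trait, it does not decide relevance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Illustrative trait scores for two of the groups from step two.
top = rng.normal(loc=7.5, scale=1.0, size=20)  # "fantastic" group
low = rng.normal(loc=6.0, scale=1.0, size=20)  # "underperformer" group

# Welch's t-test: does this trait differ significantly between the groups?
t, p = stats.ttest_ind(top, low, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")

# A low p-value only flags a candidate trait. As the article stresses, the
# AI cannot judge context: a human expert still decides whether the
# difference is job-relevant or merely an echo of past hiring bias.
```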

The fourth step is testing the algorithm to see whether the top performers come out on top. This is done on a part of the sample that wasn't used to train the algorithm in the first place.
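In code, this holdout test could be sketched as below. The data is synthetic and the logistic regression is just one illustrative model choice, not the article's prescription:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))  # 60 employees, 3 measured trait scores
# Synthetic "fantastic" flag loosely driven by the first trait.
y = (X[:, 0] + rng.normal(scale=0.5, size=60)) > 0.5

# Hold out 30% of the sample; the model never sees these people in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression().fit(X_train, y_train)

# Rank the held-out employees by predicted score. If the algorithm
# generalizes, the known top performers should cluster near the top.
scores = model.predict_proba(X_test)[:, 1]
order = np.argsort(scores)[::-1]
print(list(zip(scores[order].round(2), y_test[order])))
```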

The fifth and final step is to disagree with the algorithm every now and again: simply hire someone the algorithm tells you not to. Yes, it's a risk, but risk exists in our current, human way of hiring as well. Feed that person's results back into the system and see whether the algorithm was right.
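One way to frame this step, as an exploration rule in the style of an epsilon-greedy policy (my framing, not the author's), is a loose sketch like this; the rate is purely illustrative:

```python
import random

EXPLORE_RATE = 0.1  # illustrative: how often to overrule the algorithm

def hiring_decision(model_says_hire: bool) -> bool:
    """Follow the model most of the time, deliberately disagree sometimes."""
    if random.random() < EXPLORE_RATE:
        return not model_says_hire  # overrule the algorithm's advice
    return model_says_hire

# The overruled hires' actual performance then flows back into the training
# data, so the algorithm can learn whether its "no" was justified.
```

Without this occasional disagreement, the feedback data would only ever contain people the model already liked, and its mistakes on the people it rejected would stay invisible.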

Enough People

A big problem with this system, the most valid one and the best chance of getting the bias out, is that you need enough people in the exact same job to train an algorithm on. According to my sources, that's at least 50. And I don't mean 50 in the department; I mean 50 in the job. So 50 recruiters, not 10 sourcers, 10 HR advisors, 10 HR administrators, 10 recruiters, and 10 HR managers. For most organizations this is impossible, so they take shortcuts: they go with standardized profiles, something I'd never recommend, because your organization is by definition unique.

It is possible, though, to start with a standardized profile, check it against your current employees, and work from there. In that case you might make do with fewer people in a job, but you will need to experiment more. Because your base is smaller, the algorithm should be tested more often, so expect more of step five.

It’s possible to build a bias-free hiring algorithm, but make sure you do a few things right:

  • Assume your data is biased
  • Select on things that matter, and usually that’s nothing in the resume
  • Make sure the things you select on are traits used on the job
  • Test and re-test whatever you build.