Fixing Bias in AI

It was reported recently that Amazon shut down an AI-powered recruitment system because it discriminated against women. The system was designed to look at a collection of resumes and pick the best candidates, but most of those it picked were men. Apparently, the team building the system decided that it was impossible to stop it from finding ways to discriminate against female candidates. The situation could have been avoided by recognizing the limitations of AI and having a development plan that works within them.

What AI Does

The fundamental concept behind AI is that the software can be trained to recognize patterns in large datasets and make predictions based on them. That’s true whether it’s a driverless car, a camera, a toothbrush, a personal assistant like Siri, a robot that fills orders at an Amazon warehouse, or a product that evaluates resumes. The patterns represent the inputs that are associated with particular interpretations and lead to outputs, such as a response to a question or an action performed by a machine. But there’s nothing inherently intelligent about any of this. The most advanced AI product lacks the intelligence of a three-year-old. It cannot tell right from wrong or form value judgments. AI technologies cannot reason; they will not figure out that steam is formed by heating water.

AI products learn from data to recognize patterns that may be too subtle for a human to readily see. The patterns represent rules that can be applied to detect similar patterns in other datasets. But this is where the limitations of AI restrict what it can do. A product may discover rules from the data, but it cannot develop completely new rules on its own. IBM’s system that beat the world’s best chess players cannot, on its own, learn to win at Monopoly or Exploding Kittens. An AI product can apply the rules it learned more efficiently, or in combinations not used by humans, but it has no way to tell if the outcome or prediction it makes is desirable or not, and consequently cannot look for a better one. Essentially, there is no “I” in AI.

Getting the Training Right

An AI product cannot grow beyond the limits of the data used to train it. By analyzing vast numbers of resumes, labeled or tagged as representative of an outcome such as high or low performance, a recruitment system can learn to recognize which resumes are associated with which outcomes. Amazon’s recruitment system was trained on labeled resumes.

The labels likely contained information about whether a candidate was successful in the job and how long the candidate worked for the company. This is the right approach, but it appears that the training dataset mainly included resumes from tech jobs, where the majority of employees are male. Consequently, the majority of resumes were of male candidates, including more high performers. Since the AI was looking for patterns in the data, it found one showing that more men were high performers than women, when the truth is just that more men applied for these jobs. Consequently, when the AI subsequently evaluated resumes, it applied a rule that gave men preference over women.
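The distinction between raw counts and per-group rates is the crux of the problem, and a few lines of code make it concrete. The sketch below uses hypothetical numbers (an applicant pool skewed 4:1 toward men, with both groups succeeding at the same 20% rate); it is an illustration of the statistical trap, not Amazon’s actual data or system.

```python
from collections import Counter

# Hypothetical toy data: both groups succeed at the same 20% rate,
# but the pool is skewed 4:1 toward male resumes, as in a tech applicant pool.
resumes = (
    [("male", "high")] * 1600 + [("male", "low")] * 6400 +
    [("female", "high")] * 400 + [("female", "low")] * 1600
)

# A naive pattern based on raw counts makes men "look" like better performers:
# 1,600 male high performers vs. 400 female ones.
high_counts = Counter(gender for gender, outcome in resumes if outcome == "high")

# The per-group rate tells the real story: both groups succeed equally often (20%).
totals = Counter(gender for gender, _ in resumes)
rates = {g: high_counts[g] / totals[g] for g in totals}
```

A system trained on the raw counts learns "male" as a predictor of success; normalizing by group size shows the signal was never there.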

The biased outcomes produced by the system could have been removed by using a balanced dataset, one with equal proportions of resumes from male and female candidates. For example, a dataset of 10,000 candidates could include 5,000 male resumes and 5,000 female resumes. Further, the male and female candidates should have equal proportions of high and low performers: say, 1,000 high-performing and 4,000 low-performing resumes for each gender. Such a dataset would show a pattern in which males and females are equally likely to succeed. Gender would not be a factor or given any weight in evaluating a resume.
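The balancing described above can be sketched as a downsampling step: within each outcome label, shrink every gender group to the size of the smallest one, so the groups contribute equal proportions. This is a minimal illustration with hypothetical field names (`gender`, `performance`), not a description of any vendor’s pipeline.

```python
import random

def balance_by_group(resumes, group_key="gender", label_key="performance", seed=0):
    """Downsample so that, within each outcome label, every group is the same size.

    `resumes` is a list of dicts; the key names are hypothetical placeholders.
    """
    # Bucket resumes by (label, group).
    cells = {}
    for r in resumes:
        cells.setdefault((r[label_key], r[group_key]), []).append(r)

    rng = random.Random(seed)
    balanced = []
    for label in {lab for lab, _ in cells}:
        groups = [cell for (lab, _), cell in cells.items() if lab == label]
        n = min(len(cell) for cell in groups)  # size of the smallest group
        for cell in groups:
            balanced.extend(rng.sample(cell, n))
    return balanced
```

Applied to a skewed pool, this yields the article’s target shape: equal numbers of male and female resumes among the high performers, and equal numbers among the low performers, so gender carries no signal.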


Getting the right outcomes from an AI product also depends on how the product is designed: what patterns it is being trained to look for and what kind of output, or prediction, it is supposed to produce. Products that evaluate resumes are often designed to score a resume by its relevance to a position. The product determines how many skills in the resume match the skills in the job description, then rank-orders resumes by that match. A better approach estimates the “fit” of a resume and job description to the attributes of a success profile, such as capabilities and accountabilities. Such an approach finds resumes of candidates who will be successful in a specific position, not just resumes that match skills and experiences.
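The skill-matching approach described above amounts to a simple overlap score. The sketch below shows the idea in its most basic form; the function names and data shapes are illustrative, not any product’s actual API.

```python
def skill_match_score(resume_skills, job_skills):
    """Fraction of the job's listed skills that also appear in the resume."""
    resume = {s.lower() for s in resume_skills}
    job = {s.lower() for s in job_skills}
    return len(resume & job) / len(job) if job else 0.0

def rank_resumes(resumes, job_skills):
    """Rank (candidate_id, skills) pairs by their skill-match score, best first."""
    return sorted(resumes,
                  key=lambda r: skill_match_score(r[1], job_skills),
                  reverse=True)
```

The limitation is visible in the code itself: the score rewards keyword overlap and nothing else, which is exactly why a fit-to-success-profile approach, scoring against attributes of people who actually succeeded in the role, can outperform it.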

The Pursuit of Perfection

The goal of AI development is to create context-free AI, or artificial general intelligence. No roadmap exists to get us to AGI, but there’s a lot to be gained from developing AI that handles the many tasks humans find tedious or are simply inefficient at. Evaluating resumes is one such example, which, if done correctly, can free up a recruiter to engage with candidates and focus on tasks that add more value to the recruiting process. Shutting down such an AI project because it didn’t produce the right results sends the wrong message. New technologies rarely live up to the hype that surrounds them, and disappointments are inevitable. AI is no exception. The right thing to do is to work on improving it. After all, NASA didn’t abandon plans to go to the moon when a fire killed the crew of Apollo 1, and Henry Ford didn’t close the Model T assembly line because some cars crashed.

Raghav Singh

Raghav Singh, director of analytics at Korn Ferry Futurestep, has developed and launched multiple software products and held leadership positions at several major recruiting technology vendors. His career has included work as a consultant on enterprise HR systems and as a recruiting and HRIT leader at several Fortune 500 companies. Opinions expressed here are his own.