Cleaning Up Recruiting’s Dirty Data

Talent acquisition is littered with poor data. But there are ways to gather and streamline it to make it more accurate and actionable.

May 9, 2023
This article is part of a series called ERE Recruiting Conference Spring 2023.

Recruiting is littered with the dirtiest data in HR, says Zach Frank. There’s confusion about where to find specific data, which metrics matter most, and how to leverage talent analytics in actionable ways, all of which can hinder your hiring process.

At the ERE Recruiting Conference, May 22-24, in San Diego (and online), Frank, who is senior manager of people analytics at the Freeman Company, will be going into depth about the problem of dirty data in recruiting and what companies can do to clean it up. During his presentation, “Order Out of Chaos: Wrangling the Right Talent Intelligence to Streamline Your Recruiting Process,” he’ll explain how to:

  • Determine which metrics you need by first identifying outcomes that matter most
  • Organize and streamline talent data in ways that lead to better decision-making
  • Calculate numbers and develop formulas for greater ROI and efficiency
  • Apply talent analytics throughout the hiring process to uncover and address key challenges

In the meantime, Frank sat down with ERE to discuss the topic of his talk.

ERE: What makes data dirty? And why is recruiting rife with dirty data?

Zach Frank: A good definition of dirty data is information that’s hard to use. And there’s a lot of that in recruiting. You’ve got missing records, mismatched records, duplication issues, input fields with no data or wrong data. There are so many places in the hiring process where information comes in and gets muddied.

The problem is that recruiting inherits many of the same issues that HR data in general has. Often, this has to do with software limitations related to different technologies not talking well with each other. Also, the data becomes complex to organize because there are so many parts of the hiring process where information comes in. You’ve got job req records, candidate records, application records, background checks, assessments, etc. These also splinter off into other records.

Plus, you can have issues with tying information. I remember one situation where there was no tie of information between when someone was an applicant and then an employee. It was because different parts of the hiring process entailed using different vendors, and so someone’s ID with one vendor was not correlated to that person’s ID with another vendor. This ultimately meant that you could not look at which applicants had successful tenures.
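One way to picture the missing tie is as a crosswalk between two vendors' IDs. The sketch below is only an illustration, not Frank's method: it assumes both systems happen to store an email address that can serve as a shared key, and every field name and record (`ats_id`, `hris_id`, the sample people) is hypothetical.

```python
# Hypothetical records from two vendors; all field names and values are illustrative.
applicants = [
    {"ats_id": "A-101", "email": "Dana.Lee@example.com "},
    {"ats_id": "A-102", "email": "sam.ortiz@example.com"},
    {"ats_id": "A-103", "email": "kim.park@example.com"},
]
employees = [
    {"hris_id": "E-9001", "email": "dana.lee@example.com", "tenure_months": 18},
    {"hris_id": "E-9002", "email": "SAM.ORTIZ@example.com", "tenure_months": 4},
]


def normalize_email(email):
    """Trim whitespace and lowercase so the same address matches across systems."""
    return email.strip().lower()


def build_crosswalk(applicants, employees):
    """Map applicant IDs to employee IDs using email as an assumed shared key."""
    by_email = {normalize_email(e["email"]): e["hris_id"] for e in employees}
    crosswalk = {}
    unmatched = []
    for applicant in applicants:
        hris_id = by_email.get(normalize_email(applicant["email"]))
        if hris_id is None:
            # Either never hired, or hired under a different address.
            unmatched.append(applicant["ats_id"])
        else:
            crosswalk[applicant["ats_id"]] = hris_id
    return crosswalk, unmatched


crosswalk, unmatched = build_crosswalk(applicants, employees)
```

With a crosswalk like this in place, applicant records can be joined to tenure data. The unmatched list matters as much as the matches, since it shows where the tie is still missing.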

Is the main problem the data itself or how the data is organized?

It’s often both, but the quality itself gets compromised because of the lack of organization.

What can companies do to clean up the dirt?

One of the best things to do is to use controlled inputs wherever you can. Instead of having blank fields where people — candidates, hiring managers, and others — can enter information, try to use dropdown menus, for example. That sort of controlled selection can go a long way toward enabling better analytics.

That seems like a fairly doable tweak, though I suppose you still run a risk that people will choose the wrong item from the dropdown.

That’s true. You’ll never be able to create a state where your data will be perfect, but at least you’ll be able to mitigate issues. At the same time, whoever is doing data engineering at the company can take time to map out possible wrong answers. Yes, it’s a semi-manual process, but creating such a reference map can also help improve the data.
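A reference map of this kind can be as simple as a lookup table from known bad entries to the approved values. The following is a minimal sketch under assumptions: the field (candidate source), the approved values, and the misspellings are all invented for illustration.

```python
# Hypothetical approved dropdown values for a "candidate source" field.
CANONICAL_SOURCES = {"Employee Referral", "Job Board", "Career Site"}

# Hand-built reference map: known wrong or free-text entries -> approved value.
# The entries here are made up; a real map would come from reviewing actual data.
REFERENCE_MAP = {
    "referral": "Employee Referral",
    "employee referal": "Employee Referral",
    "indeed": "Job Board",
    "jobboard": "Job Board",
    "company website": "Career Site",
}


def clean_source(raw):
    """Return (canonical_value, needs_review) for one raw field value."""
    if raw is None:
        return None, True
    text = " ".join(raw.split())  # collapse stray whitespace
    for canonical in CANONICAL_SOURCES:
        if text.lower() == canonical.lower():
            return canonical, False
    mapped = REFERENCE_MAP.get(text.lower())
    if mapped is not None:
        return mapped, False
    # Unknown value: leave it unresolved so a person can extend the map.
    return None, True
```

Values the map does not recognize are flagged rather than guessed at, which keeps the semi-manual review step in the loop.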

Where in the hiring process do data problems tend to creep in most?

During position creation and req creation. Issues often happen before a company even posts a job. For example, if a new position is created for every req, you now have a position issue, a job issue, and a data issue. Say what you care about most is manager experience and team experience, in terms of when your people are getting in. Then your time to fill or time to hire doesn’t just start from posting or approval. It goes back to the position being backfilled: When did that person leave? But if you create a new position each time, that becomes hard to see. How this plays out depends on how individual orgs do it.
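To make the difference concrete, here is a small sketch that measures the same hire two ways. The dates and field names are hypothetical, and it assumes the position ID is reused for the backfill so that the previous incumbent's departure date is still attached to it.

```python
from datetime import date


def days_between(start, end):
    """Whole calendar days from start to end."""
    return (end - start).days


# Hypothetical backfill on a reused position; all values are invented.
req = {
    "position_id": "P-200",
    "previous_incumbent_left": date(2023, 1, 10),
    "req_posted": date(2023, 2, 1),
    "new_hire_started": date(2023, 3, 15),
}

# The conventional clock starts when the req is posted.
posting_to_start = days_between(req["req_posted"], req["new_hire_started"])

# The team's experience of the vacancy starts when the seat became empty.
vacancy_to_start = days_between(
    req["previous_incumbent_left"], req["new_hire_started"]
)
```

In this made-up case the posting-based figure is 42 days while the vacancy-based figure is 64. If a brand-new position were created for the req, the departure date would sit on a different record and the second number could not be computed from this one.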

How might the data issues you’re describing impact hiring?

Probably the worst possible impact is that you think you have an answer, but it is wrong — only you don’t know that it’s wrong. That is, you don’t realize that the data is, in fact, dirty. You think it’s quality information, but it might actually be based on incomplete or poorly obtained data. Or maybe a formula you’re doing is wrong and no one knows it. So, for example, you might end up with a finding that says that your time to source is 10 days, but it’s actually 15 days.

How do you fix a problem that you don’t know you have?

By auditing your process. Having experienced and dedicated analytics staff is helpful because they can spot problems and know which questions to ask, especially early on.
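Part of an audit can be automated with basic checks for the problems named earlier, such as duplicate records and empty fields. The sketch below is one possible starting point rather than a complete audit. The record layout and field names are assumptions.

```python
from collections import Counter


def audit_records(records, key_field, required_fields):
    """Report duplicate keys and count missing values for each required field."""
    key_counts = Counter(r.get(key_field) for r in records)
    duplicates = sorted(
        key for key, count in key_counts.items() if key is not None and count > 1
    )
    missing = {
        field: sum(1 for r in records if r.get(field) in (None, ""))
        for field in required_fields
    }
    return {"duplicate_keys": duplicates, "missing_counts": missing}


# Hypothetical application records with one duplicate ID and two blank fields.
applications = [
    {"application_id": "APP-1", "req_id": "R-10", "applied_on": "2023-02-01"},
    {"application_id": "APP-2", "req_id": "", "applied_on": "2023-02-03"},
    {"application_id": "APP-2", "req_id": "R-11", "applied_on": None},
]

report = audit_records(applications, "application_id", ["req_id", "applied_on"])
```

Checks like these only catch structural problems. A wrong formula still needs someone who knows which questions to ask.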

Unless, of course, you are a smaller org.

Yes, if you are not big enough to bring on that kind of staff, it is an unfortunate reality that you might develop issues that would be hard to spot.

Want more insights from Zach Frank? Come see his presentation, “Order Out of Chaos: Wrangling the Right Talent Intelligence to Streamline Your Recruiting Process,” at the ERE Recruiting Conference. Learn more and register here.
