The Dangers of Data, Part 1

I have been using data to guide decisions for a long time, and I have made a ton of mistakes in analyzing that data that have led to bad results. There’s a reason people say “garbage in, garbage out,” but how do laypeople know when data analysis is useful and when it isn’t?

Most people never know. To compound the issue, we are all presented with more access to data, and more exposure to unsupported analysis of that data presented as truth. To learn how easy it is for folks to be fooled, read this article.

There is neither space nor time enough to cover every potential issue with data analysis, so what I will offer here is a brief list of common errors to help everyone up their game.

Confusing “Cause” and “Correlation.” I’ve written about this before, but in my experience this is the most common error, and the one most often committed by the media as well. If you read the article linked above, you will see how data dredging works. To oversimplify, this equates to identifying correlation in data and using that to assume causation. It (like any data analysis snafu) is a slippery slope. In the past I worked for an organization where one division had much, much higher turnover than any of the other divisions. Attempts to manage that turnover and foster retention were random and desperate. Lo and behold, in the space of just a couple of months, turnover dropped to near zero. The HR leader for that area immediately trumpeted the latest retention effort and how it had turned everything around. The data could certainly demonstrate correlation, but it was premature to announce causation. The HR leader had not taken any additional factors into account, or conducted any comparative analysis to rule out all (or as many as possible) of the variables other than the retention effort. When we looked at all divisions, turnover had dropped suddenly in every division and across roles and locations. A survey was launched to ask employees why they were no longer leaving; were they now happy in their roles? The results confirmed a larger factor impacting turnover: the economy had quickly turned for the worse. There were suddenly very few jobs and innumerable job seekers. According to our employees, they weren’t happy, but they felt forced to wait the downturn out. What would the outcome have been if we had accepted the initial assumption instead of digging deeper and giving ourselves time to really solve the issue of retention? Which brings us to:
Be skeptical, always. Analysis is prone to bias, accounting errors, poor assumptions, and manipulation. Unless you have implicit trust in the source, ask questions. What was the sample size? What was the initial hypothesis? How was the data collected? How are variations being defined (i.e., is what you consider significant the same as what I consider significant)? Are you reporting results that are transferable? If you aren’t sure of the data, do your best to test it yourself. This is core to research: can the results be replicated by others? Most research results you read in talent acquisition (TA) cannot offer apples-to-apples comparisons, since the variables are different for each organization (location, cost, supply, demand, etc.). So just because something works for Spectrum Health does not mean it will work for you. There may be a statistical probability it will work, but there is no definitive proof it will, so just doing something that someone else has done and expecting the same result is a fool’s errand.
Summary statistics don’t tell the whole story, so use them carefully and only if you are prepared to share the underlying detail. A great example of this is time to fill reported as a single number for an entire organization, which is generally a useless number. Why? Well, unless every position you recruit for is the same, there will be variation. For your organization to truly be prepared for vacancies and manage them, its leaders deserve more accurate and specific data. For example, in our organization the time to fill for a third-shift RN in the NICU is drastically different from that for a customer service rep in our call center. If we reported a single time-to-fill number, our nursing leadership would be disappointed when we could not deliver in the time reported, and our customer service team would be appalled at how long it took us to fill positions. Neither group would be working from an accurate assumption, because we supplied a garbage number instead of providing the necessary specificity.
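
To make the time-to-fill point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the role names echo the example above, but the day counts are invented, and the helper name time_to_fill_by_role is hypothetical rather than taken from any real reporting tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical requisitions as (role, days to fill).
# These numbers are invented for illustration, not real data.
requisitions = [
    ("NICU RN, third shift", 90),
    ("NICU RN, third shift", 110),
    ("Call center rep", 20),
    ("Call center rep", 30),
    ("Call center rep", 25),
    ("Call center rep", 25),
]


def time_to_fill_by_role(rows):
    """Return the average days to fill for each role."""
    days_by_role = defaultdict(list)
    for role, days in rows:
        days_by_role[role].append(days)
    return {role: mean(days) for role, days in days_by_role.items()}


# The single organization-wide number that hides the variation.
overall = mean(days for _, days in requisitions)

# The per-role numbers each leadership team can actually plan around.
by_role = time_to_fill_by_role(requisitions)

print(f"Overall time to fill: {overall} days")
for role, avg_days in by_role.items():
    print(f"{role}: {avg_days} days")
```

With these made-up numbers the organization-wide average comes out to 50 days, while the per-role averages are 100 days for the nursing role and 25 days for the call center role. The single number describes neither group, which is exactly the problem with reporting it alone.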

Collecting and analyzing data requires patience, caution, and the ability to keep asking questions to drill down to honest answers, even if those answers disprove your thesis or prove contrary to conventional wisdom. I will continue to explore this fascinating topic next month, and look forward to talent-acquisition organizations becoming bastions of analytics!