Information is everywhere – and we use it all of the time to make most of our decisions. Before you left the house today, you probably checked the time, the weather and maybe the traffic. And that’s just the start of your day. Data is definitely a principal component of our lives.
Last week we talked about the overall impact of data and information. This week I want to focus on data Quality – how to define it and how to improve it.
Can You Trust the Data?
Data may be all around us – but can you trust it?
We all seem to have some skepticism when it comes to data – regardless of the source – and it seems to be getting worse. The term Fake News was rarely heard a few years ago and now it is a common call to not trust information.
Our skepticism doesn’t keep us away though. We still consume data all of the time – and likely always will.
You’d think at least the data that you have in your own business systems would be trustworthy. For many teams, their data comes from internally driven sources – so it should be good, right? Not necessarily.
The “KPMG 2016 Global CEO Outlook” study found that 84% of CEOs are concerned about the quality of the data that they’re basing their decisions on. There are a variety of reasons for the concerns. But the bottom line is this – they don’t trust the data.
When there isn’t trust in the information offered for business decisions, you need to look at the quality of the data. This is crucial because the cost of bad data can be huge.
Gartner research found that the average financial impact of poor data quality on an organization is $9.7 million per year. That’s not hard to imagine given how quickly data can propagate through reports, dashboards and websites. In fact, one bad data point can be pushed out long before anyone even knows that there is an error in it – impacting decisions, work products and so on.
Can You Define the Data?
Before addressing the problem of data quality, let’s first get on the same page regarding data and information. For the purpose of this article, I’ll refer to Data as the raw input to an Information product. On its own, data doesn’t have a lot of use.
A common comparison for data is crude oil. When it’s in a raw state, oil isn’t very useful. Once it’s refined into different products (like fuel, kerosene, etc.), it has consumable value. The same is true for data. As a single element, it isn’t very helpful in decision making. But when it is refined through quality audits and combined with other data elements and analytics, it can become valuable Information.
Since the goal is Useful (and Trustworthy) Information, you need to first address the quality of the raw data. But before you get started, make sure you can clearly articulate your business case. This is a critical – and commonly overlooked – step. Defining what constitutes usable Data – much less its quality – can be hard without the framework of the business case.
Can You Fix the Data?
Data quality depends on the needs of the organization, the audience who is going to use the resulting information and the purpose of that information. So with your business case in hand, you’re ready to review the quality of your data – and then determine what needs to be done to clean it up.
There are four general questions to ask that will help you to assess the quality of your data:
Is it Complete? – Based on your business case, do you have all of the different types of data that you need? Of the data types that you have – is the data consistently populated or are there holes? Incomplete data can easily skew analysis – so completeness is key.
Is it Accurate? – Accuracy gets a little trickier than a simple Do I Have the Data or Not evaluation. First, you need to consider age. How old is your data? Your data could have easily aged out of accuracy.
Next, ask about the context in which the data will be used. For example, you may have four different phone numbers for a customer. All of those phone numbers are true for that person. However, the best one to reach them at may actually be the last one on the list. So if the use of the data encompasses history – all of the phone numbers are correct. If the use is contacting the customer, the last one is correct. Make sure your data is true for the use.
Is it Standardized? – Data frequently needs to be compared when it is used, so non-standard inputs can quickly throw off your quality. Using the phone number example, do all of the numbers contain Area Codes? Do they contain Country Codes? Do they have special characters in the formatting? Your data may be complete and accurate, but if it isn’t standard, the results of analysis can really be skewed.
Is it Credible? – Do you trust the source of your data? It doesn’t matter if the data came from your staff, your business systems or an external organization. Sources must be trusted for the resulting information to be trusted.
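To make the first three questions a little more concrete, here is a minimal Python sketch of how they might be checked against a small set of customer records. Everything in it is an assumption for illustration: the record layout, the field names (name, phone, updated), the one-year age threshold and the 10-digit national phone format are all hypothetical, and your business case should drive the real rules. Credibility is left out on purpose, since trusting a source is a judgment call rather than something a script can measure.

```python
import re
from datetime import date

# Hypothetical customer records; the field names are illustrative only.
RECORDS = [
    {"name": "Acme Co", "phone": "(555) 123-4567", "updated": date(2024, 5, 1)},
    {"name": "Bolt Ltd", "phone": "+1 555.987.6543", "updated": date(2019, 1, 15)},
    {"name": "Crate Inc", "phone": None, "updated": date(2023, 11, 30)},
]


def completeness(records, field):
    """Complete? Share of records where the field is populated."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def is_stale(record, today, max_age_days=365):
    """Accurate? Flag records not updated within max_age_days (assumed threshold)."""
    return (today - record["updated"]).days > max_age_days


def standardize_phone(raw, default_country="1"):
    """Standardized? Reduce a phone number to '+' followed by digits only.

    Assumes a bare 10-digit number is a national number that is
    missing its country code; other lengths are passed through as-is.
    """
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)  # drop spaces, dots, dashes, parentheses
    if len(digits) == 10:
        digits = default_country + digits
    return "+" + digits


if __name__ == "__main__":
    today = date(2024, 6, 1)  # fixed date so the example is repeatable
    print("phone completeness:", round(completeness(RECORDS, "phone"), 2))
    for r in RECORDS:
        print(r["name"], standardize_phone(r["phone"]), "stale:", is_stale(r, today))
```

A check like this will not tell you which of four phone numbers is the right one for contacting a customer; that still depends on the use. It simply surfaces the holes, the old records and the inconsistent formats so you know where cleanup is needed.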
Garbage In, Garbage Out
Quality data is a critical element in finding the holy grail of Useful, Trustworthy Information. To get to quality, you need to make a consistent, concerted effort to clean the data that you have – and ensure that any new data coming in meets your quality standards.
Next week, we’ll look at taking that business case and your new quality data out for a spin to create usable information.
Anne Hale is the Director of Client Services at HL Group, Inc., a premier provider of mobile inventory management and warehouse solutions. She manages our client engagements, works with Wes Haubein on sales and marketing and is unusually preoccupied with good data.