Focus On Data Quality - Data Quality Matters
It’s hardly news that every organization has data: structured data, unstructured data, and metadata describing sales, customers, content, and products. Increasingly, companies want to leverage this data for business intelligence and other actionable insights; this is big business. To that end, organizations must first place heavy emphasis on data quality.
Data quality is the practice of standardizing data into an acceptable, coherent format and maintaining those standards going forward through a data governance system.
Every reliable analysis begins with data governance and data quality. All data analysis is very much a Garbage In, Garbage Out (GIGO) proposition: the answers derived from queries will only be as good as the quality of the underlying data. Starting with poor-quality data can produce misleading results, or worse.
The idea of data quality comes from computer science, but the principles hold in every industry from medicine to manufacturing, and they work in much the same way. Any organization examining the quality of its data should begin from a standards-based view, because the more bad data that enters a system (and the faster that bad data is processed), the harder data quality is to maintain.
Measuring data quality relies on a combination of factors:
♦ Precision—Is the data accurate? Is the metadata controlled, with names and other entities disambiguated? Are the content and metadata correctly matched to the subject and/or industry?
♦ Completeness—Is the data consistent and comprehensive? Is there extraneous or noisy data? Is there missing information that could be included?
♦ Timeliness—Is the data up to date? Are there issues with legacy data or old formats?
♦ Availability—Can the data be accessed by other applications for analysis? Is it machine readable? Is the data gathered in one repository or spread across a variety of silos?
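The first three factors above can be checked programmatically. As a rough illustration, the sketch below scores a small set of records for completeness, timeliness, and precision; the field names, freshness window, and normalization rules are hypothetical examples, not a prescribed method.

```python
from datetime import date

# Hypothetical catalog records; field names are illustrative only.
records = [
    {"id": 1, "name": "ACME Corp.", "subject": "manufacturing", "updated": date(2023, 5, 1)},
    {"id": 2, "name": "Acme Corporation", "subject": "manufacturing", "updated": date(2023, 5, 2)},
    {"id": 3, "name": "Globex", "subject": None, "updated": date(2015, 1, 10)},
]

REQUIRED_FIELDS = {"id", "name", "subject", "updated"}
STALE_AFTER_DAYS = 365 * 5  # assumed freshness window for this sketch

def completeness(recs):
    """Completeness: fraction of records with every required field present."""
    ok = sum(1 for r in recs if all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS))
    return ok / len(recs)

def stale(recs, today):
    """Timeliness: IDs of records whose last update exceeds the freshness window."""
    return [r["id"] for r in recs if (today - r["updated"]).days > STALE_AFTER_DAYS]

def ambiguous_names(recs):
    """Precision: a crude check for names that collide after normalization."""
    seen, collisions = {}, []
    for r in recs:
        key = r["name"].lower().replace(".", "").replace("corporation", "corp")
        if key in seen:
            collisions.append((seen[key], r["id"]))
        seen[key] = r["id"]
    return collisions

print(completeness(records))             # 2 of 3 records are complete
print(stale(records, date(2023, 6, 1)))  # record 3 is out of date
print(ambiguous_names(records))          # records 1 and 2 likely name the same entity
```

In practice each factor would be measured against an organization's own governance standards; the point here is only that the factors translate into concrete, repeatable checks.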
Data quality is independent of format; whether the data lives in tables in an RDBMS (Relational Database Management System) or is expressed as a graph, the same principles apply.
For organizations looking to adopt machine learning, artificial intelligence, and similar technologies, it is critical to ensure the quality of the data to be used before beginning. This requires several steps:
♦ Data audit—How much data is there, and in what formats? The size of the data set influences how it can be handled and the amount of resources needed to deal with it.
♦ Quality review—Using the standards listed above, what is the current state of the data? Do you have good, “clean” data or does it require significant “cleanup” before use?
♦ Data cleanup—If the data isn’t ready for use, what steps can be taken to improve the precision, completeness, timeliness, and availability of the data?
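The audit, review, and cleanup steps above can be sketched as a simple pipeline. This is a minimal illustration, assuming a small set of record fields and cleanup rules invented for the example; a real effort would use an organization's own schema and governance standards.

```python
# Hypothetical records; fields and rules here are examples only.
records = [
    {"title": "Annual Report", "year": "2021", "format": "PDF"},
    {"title": "  annual report ", "year": "2021", "format": "pdf"},
    {"title": "Price List", "year": None, "format": "XLS"},
]

def audit(recs):
    """Data audit: how much data is there, and in what formats?"""
    formats = {}
    for r in recs:
        fmt = (r.get("format") or "unknown").upper()
        formats[fmt] = formats.get(fmt, 0) + 1
    return {"count": len(recs), "formats": formats}

def review(recs):
    """Quality review: flag records that need cleanup before use."""
    return [r for r in recs
            if r.get("year") is None or r["title"] != r["title"].strip()]

def cleanup(recs):
    """Data cleanup: normalize fields and drop duplicates."""
    seen, cleaned = set(), []
    for r in recs:
        key = (r["title"].strip().lower(), r.get("year"))
        if key not in seen:
            seen.add(key)
            cleaned.append({**r,
                            "title": r["title"].strip().title(),
                            "format": (r.get("format") or "unknown").upper()})
    return cleaned

summary = audit(records)   # 3 records: 2 PDF, 1 XLS
flagged = review(records)  # 2 records need attention
clean = cleanup(records)   # duplicate dropped, fields normalized
```

The order matters: auditing first sizes the effort, the review identifies what is actually wrong, and only then is cleanup applied, so resources go where the review shows they are needed.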
Equally important to the end quality is how the data is processed. Garbage In, Garbage Out holds, but good data can also be ruined if it is not handled correctly. Good data input, along with professionals who can organize, enrich, and process the data in a way that creates the most value for the enterprise, is vital to the process. That value will differ among industries, and among organizations within those industries, but the principles of data quality do not change.
The questions above must be answered before building any vocabularies and before doing any analysis. There is certainly a time and money investment, the value of which must be communicated to stakeholders and decision makers. Good data, good processes, and good organization are the keys to highly usable and reliable data output. Good output is necessary for high quality analysis.
Using data to provide analytics, business intelligence, and other important tools for making the correct strategic decisions is critical to the fast-paced environment of modern business. Measuring the return on investment can be challenging due to the nature of data use and its role in assessment and planning. It affects every facet of business, from the top down (strategic planning, product development, and market focus) to the bottom up (efficiency of throughput, results reporting, and task allocation).
Many companies don’t realize the full benefits of their data stores and, more importantly, lack awareness of the ways those stores can be improved. The companies that take steps to improve in this area gain an important advantage in the highly competitive world of modern commerce. They know what they know, allowing them to ask the right questions and get the right answers. The result is a well-informed decision-making process, and the agility and precision of direction that separate the winners from the also-rans.
Clean data is better data, which leads to better output. Ensuring the data is clean before any analysis improves its value to any organization. With a combination of data auditing, quality review, and clean-up, an organization can be assured that its analysis rests on standards-compliant, high-quality data. Access Innovations is committed to helping organizations improve the state of their data through data clean-up and metadata enhancement, organizing their content and making it more discoverable.
Access Innovations, Inc.
6301 Indian School Road NE Suite 400
Albuquerque, NM 87110