Evolving data issues challenge RM approaches
Records management (RM) is a well-established discipline supported by mature software technologies. However, the explosion of data is raising new questions about how RM should be handled. Ongoing issues include big data, master data management (MDM), unstructured data and records in unusual formats, such as those contained in graph databases (see sidebar following article).
Records are kept for compliance purposes, for their business value and sometimes simply because no process has been implemented to remove them systematically. “There are growing struggles with the massive volume of big data,” says Kevin Parker, senior enterprise information architect for Neostek, an information management and technology consulting company. “IT and legal have different priorities about what to keep. Getting rid of data makes IT nervous, but there are times when records should be dispositioned.”
Data stored in data lakes is largely uncontrolled and typically has not had data hygiene processes applied to it. “Data quality for big data repositories is usually not applied until someone actually wants to use the data,” says Karen Lopez, senior project manager and architect for InfoAdvisors. Quality assurance might include making sure that duplicate records are dealt with appropriately, that inaccurate information is excluded or annotated and that data from multiple sources is being mapped accurately to the destination database or record. In traditional data warehouses, data is typically extracted, transformed and loaded (ETL). With a data lake, data is extracted (or acquired), loaded and then not transformed until required for a specific need (ELT).
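To make the ETL/ELT distinction concrete, here is a minimal Python sketch. The source systems, field names and mapping table are hypothetical, and the quality steps (mapping fields onto a destination schema, flagging incomplete rows rather than dropping them, keeping the first row seen for each customer ID) are simplified stand-ins for real data hygiene rules. The only difference between the two paths is when the transformation runs.

```python
# Hypothetical mappings from two source systems onto one destination schema.
FIELD_MAP = {
    "crm": {"cust_id": "customer_id", "name": "customer_name"},
    "billing": {"CustomerNo": "customer_id", "CustName": "customer_name"},
}


def transform(source: str, raw: dict) -> dict:
    """Map one raw row onto the destination schema and annotate quality issues."""
    row = {dest: raw.get(src) for src, dest in FIELD_MAP[source].items()}
    # Annotate incomplete rows instead of silently discarding them.
    row["flag_incomplete"] = any(v in (None, "") for v in row.values())
    return row


def deduplicate(rows: list) -> list:
    """Keep the first row seen for each customer_id (a deliberately naive rule)."""
    seen, unique = set(), []
    for row in rows:
        if row["customer_id"] not in seen:
            seen.add(row["customer_id"])
            unique.append(row)
    return unique


def etl(batches: list) -> list:
    """ETL: transform and clean first; only cleaned rows are loaded (returned)."""
    return deduplicate([transform(src, raw) for src, raw in batches])


class DataLake:
    """ELT: load raw rows as-is and defer transformation until query time."""

    def __init__(self):
        self.raw = []

    def load(self, batches: list) -> None:
        self.raw.extend(batches)  # no hygiene applied at load time

    def query(self) -> list:
        # Transformation and quality checks run only when the data is needed.
        return deduplicate([transform(src, raw) for src, raw in self.raw])


batches = [
    ("crm", {"cust_id": "C1", "name": "Acme"}),
    ("billing", {"CustomerNo": "C1", "CustName": "Acme Corp"}),
    ("billing", {"CustomerNo": "C2", "CustName": ""}),
]
lake = DataLake()
lake.load(batches)
print(len(lake.raw))                 # 3
print(lake.query() == etl(batches))  # True
```

Both paths return the same cleaned rows in this example. The difference is that the lake still holds all three raw rows, including the conflicting duplicate, until someone actually asks for the data.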
MDM is a method for improving data quality by reconciling inconsistencies across multiple data sources to create a single, consistent and comprehensive view of critical business data. The resulting master record is regarded as the best version available and ideally is used enterprisewide for analytics and decision making. But from an RM perspective, questions arise, such as what happens to the master data if the original source data reaches the end of its retention schedule. “Companies have not yet thought about the impact of records management on MDM or vice versa,” Parker says.
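The retention question Parker raises can be illustrated with a golden record that tracks its lineage. In this Python sketch the record fields, the retention_end date and the survivorship rule (the most recently updated non-empty value wins) are assumptions for illustration; MDM products typically offer many configurable survivorship rules.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SourceRecord:
    system: str
    record_id: str
    attributes: dict
    updated: date
    retention_end: date  # hypothetical: end of this record's retention schedule


@dataclass
class GoldenRecord:
    attributes: dict
    lineage: list = field(default_factory=list)  # (system, record_id) pairs used


def build_golden_record(sources: list) -> GoldenRecord:
    """Reconcile sources: newer non-empty values overwrite older ones."""
    golden = GoldenRecord(attributes={})
    for src in sorted(sources, key=lambda s: s.updated):
        for key, value in src.attributes.items():
            if value not in (None, ""):
                golden.attributes[key] = value
        golden.lineage.append((src.system, src.record_id))
    return golden


def expired_sources(golden: GoldenRecord, sources: list, today: date) -> list:
    """Lineage entries whose source record is past its retention date."""
    by_ref = {(s.system, s.record_id): s for s in sources}
    return [ref for ref in golden.lineage if by_ref[ref].retention_end < today]


crm = SourceRecord("crm", "C1", {"name": "Acme", "phone": ""},
                   date(2015, 1, 1), date(2020, 1, 1))
erp = SourceRecord("erp", "9001", {"name": "Acme Corp", "phone": "555-0100"},
                   date(2018, 6, 1), date(2030, 1, 1))
golden = build_golden_record([crm, erp])
print(golden.attributes)  # {'name': 'Acme Corp', 'phone': '555-0100'}
print(expired_sources(golden, [crm, erp], date(2024, 1, 1)))  # [('crm', 'C1')]
```

Here the golden record still depends on a CRM entry whose retention period has ended, which is the kind of situation the article says companies have not yet worked through.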
As a practical matter, a record is information that is used to make a business decision, and it can be either an original set of data or a derivative record based on master data. “The record is a snapshot that becomes an unalterable document and is stored in a system,” says Seth Earley, CEO of Earley Information Science, a company that leverages information architecture to enable digital business. “Even if the original information is destroyed or transformed, the record lives on as a captured image or artifact.” Therefore, the “golden record” that constitutes the best and most accurate information can become a persistent piece of data within an RM system.
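One way to picture Earley's “captured image or artifact” is a record that copies the data at capture time and stores a hash of it. This Python sketch assumes a JSON-serializable dictionary as input and uses SHA-256 as the integrity check; it illustrates the idea and is not how any particular RM system implements records.

```python
import copy
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType


def _digest(content: dict) -> str:
    """SHA-256 over a canonical JSON serialization (sorted keys)."""
    payload = json.dumps(content, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class CapturedRecord:
    captured_at: str
    content: MappingProxyType  # read-only view of the top-level keys
    digest: str


def capture(data: dict) -> CapturedRecord:
    """Snapshot the data so later changes to, or deletion of, the source don't matter."""
    snapshot = copy.deepcopy(data)
    return CapturedRecord(
        captured_at=datetime.now(timezone.utc).isoformat(),
        content=MappingProxyType(snapshot),
        digest=_digest(snapshot),
    )


def verify(record: CapturedRecord) -> bool:
    """True if the content still matches the digest taken at capture time."""
    return _digest(dict(record.content)) == record.digest


golden = {"customer_id": "C1", "name": "Acme Corp", "tags": ["key account"]}
record = capture(golden)
golden["name"] = "Renamed"  # the source changes...
del golden                  # ...or is destroyed
print(record.content["name"])  # Acme Corp
print(verify(record))          # True
```

MappingProxyType only blocks writes to top-level keys; a nested list inside the snapshot could still be modified, but verify would then return False, so the hash is what actually makes alteration detectable.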
Unstructured data challenge
A large percentage of records management efforts are oriented toward being ready for e-discovery, according to Rob Karel, VP of strategy and marketing for information quality solutions at Informatica. “This is much more of a problem in the case of unstructured data than for MDM,” Karel says. “MDM has gone well beyond the narrow structure of relational databases and is entering the realm of big data, but its roots are still in the world of structured databases with well-defined metadata classifications, which makes RM for such records a more straightforward process.” Informatica has traditionally focused on data management, and its platform includes modules for data integration, MDM, big data, cloud integration and data quality.
“The challenge with unstructured data is to build out the semantics so that the content management or RM and the data management components can work together,” Karel explains. “In the case of a contract, for example, the document might have many pieces of master data. It contains transactional data with certain values, such as product or customer information, and a specialist data steward or data librarian might be needed to tag and classify what data values are represented within that contract. With both the content and the data classified using a consistent semantic, it would be much simpler bringing intelligent parsing into the picture to bridge the gap between unstructured and structured data.” Auto-classification of records can help, although human review remains essential.
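A very simple version of the bridging Karel describes is matching contract text against known master data values. The master data table and IDs below are hypothetical, and whole-word dictionary matching is a stand-in for the richer semantic parsing he refers to; anything it misses would still fall to a data steward.

```python
import re

# Hypothetical master data: canonical IDs mapped to names that appear in documents.
MASTER_DATA = {
    "customer": {"CUST-001": ["Acme Corp", "Acme Corporation"]},
    "product": {"PROD-17": ["Widget Pro"], "PROD-42": ["Gadget Plus"]},
}


def tag_document(text: str) -> dict:
    """Return {entity_type: {master_id, ...}} for master data values found in text."""
    tags = {}
    for entity_type, entries in MASTER_DATA.items():
        for master_id, names in entries.items():
            for name in names:
                # Whole-word, case-insensitive match on the known name.
                pattern = r"\b" + re.escape(name) + r"\b"
                if re.search(pattern, text, flags=re.IGNORECASE):
                    tags.setdefault(entity_type, set()).add(master_id)
    return tags


contract = ("This agreement between Acme Corporation and the supplier "
            "covers 500 units of Widget Pro.")
print(tag_document(contract))  # {'customer': {'CUST-001'}, 'product': {'PROD-17'}}
```

The tags link the unstructured contract to structured master data IDs, so the same classification can drive both content management and data management.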
Redundant, obsolete and trivial information, the so-called “ROT,” constitutes a large portion of stored information in many organizations—up to 80 percent, according to Mark Diamond, CEO of Contoural. Contoural provides information governance services; it specializes in proactive records and information management, litigation readiness and control of privacy and other sensitive information. “The information generated by organizations needs to be under control,” says Diamond, “whether it consists of official records or non-record documents with business value. Otherwise, it will accumulate and become completely unmanageable. On the other hand, if organizations aggressively delete documents, they run the risk of employees creating underground archives of information they don’t want to relinquish, which can pose significant risks. Companies need to approach this with a smart strategy.”
The system should follow a “five-second rule,” Diamond advises, allowing employees to save documents easily using built-in classification instead of extensive manual tagging. “The key is to make the system intuitive enough for any employee to use with just a few seconds of time and a few clicks of the mouse,” he explains. “In addition, the value of good records management needs to be communicated and the value ‘sold,’ so employees understand that it can actually help them with their work rather than being a burden.” A well-designed system hides the complexity from users and puts it in the back end. “It is harder to set up this type of system initially, but as more information is created, the importance of managing it also increases, in order to reduce costs and risk,” Diamond says.