Information governance 101: The regulatory compliance survival kit
                
Data profiling
After outlining how their data moves, organizations should clarify which regulations apply to which data. Data profiling, which involves generating statistical information about datasets, is indispensable for this task. This facet of data discovery addresses the problem of data arriving from various sources without the organization knowing what type it is or how it needs to be protected, Ganesan said. While determining enterprise architecture is imperative for governing data in motion, profiling data is useful for governing data at rest.
The data profiling tenet even extends to service profiling for network access control in IoT systems, which reveals who is accessing information, where they are from, the type of application or system they are on, what they want to connect to, and whether they do this regularly, Chapman explained.
Frequently, data profiling’s objective is to build a data catalog “that makes it easy to search,” Ganesan said. Profiling outputs provide a rich set of metadata on which to base compliance measures. “It could range from discovering ‘Here we have some text values’ to ‘Here we have some numbers for the size of the sales,’” Polikoff reasoned. “For the number of unique values, it can show where the values are unique and how many unique values there are.”
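The kind of profiling output Polikoff describes can be illustrated with a minimal sketch. It assumes the data arrives as rows of raw strings, and the function names (`infer_type`, `profile_column`, `profile_rows`) and the sample columns are hypothetical, not taken from any product mentioned here. For each column it reports the value types found, the number of unique values, and whether every value is unique.

```python
from collections import Counter


def infer_type(value: str) -> str:
    """Classify one raw string as 'empty', 'integer', 'decimal', or 'text'."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "text"


def profile_column(values: list[str]) -> dict:
    """Return basic statistics for one column of raw string values."""
    types = Counter(infer_type(v) for v in values)
    non_empty = [v.strip() for v in values if v.strip()]
    distinct = set(non_empty)
    return {
        "count": len(values),
        "empty": types.get("empty", 0),
        "types": dict(types),
        "unique_values": len(distinct),
        # True only when no non-empty value repeats
        "all_unique": len(distinct) == len(non_empty),
    }


def profile_rows(rows: list[dict]) -> dict:
    """Profile every column, using the first row's keys as the column list."""
    columns = list(rows[0].keys()) if rows else []
    return {c: profile_column([str(r.get(c, "")) for r in rows]) for c in columns}


# Hypothetical sample data for illustration only
rows = [
    {"customer": "Ada", "sale_size": "120.50"},
    {"customer": "Grace", "sale_size": "75"},
    {"customer": "Ada", "sale_size": ""},
]
catalog = profile_rows(rows)
```

In this sample, the `customer` column profiles as text with a repeated value, while `sale_size` profiles as numeric with one missing entry. Statistics like these are the raw metadata a searchable catalog could be built on.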
Classifications
After deconstructing the enterprise architecture and profiling data, organizations should construct a data catalog to understand the specific actions that will fulfill their myriad regulatory obligations. According to Aasman, “You can’t have data governance if you don’t have a catalog of all the data assets that you have.” Organizations frequently populate data catalogs by tagging data according to classifications, which may not always be adequate. Aasman characterized the term “tagging” as extremely primitive. “Really, you want to have a full semantic description of the content of every digital asset.” Such detailed descriptions would include not only the regulations that may affect the data but also the applications and databases that hold the data and the processes the data supports.
Machine learning is often employed to automate the classification step and may do so by scrutinizing metadata. Although metadata can inform classifications, it’s more effective to combine such analysis with “looking at the content and mapping that to rules,” Ganesan noted. By combining this rules-based approach with machine learning methods, organizations can accurately populate data catalogs and know which data are subject to which regulations. They can then leverage this collection of “tags to build controls and operationalize them,” Ganesan said.
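The rules-based half of that approach can be sketched as follows. This is an illustration under stated assumptions, not a description of any vendor's classifier. The patterns, the tag names (`personal_data`, `payment_data`), the column-name hints, and the 80% match threshold are all hypothetical, and the machine learning half is left out entirely.

```python
import re

# Hypothetical content rules: (rule name, pattern, tag to apply).
CONTENT_RULES = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), "personal_data"),
    ("us_ssn", re.compile(r"^\d{3}-\d{2}-\d{4}$"), "personal_data"),
    ("card_number", re.compile(r"^\d{13,19}$"), "payment_data"),
]

# Hypothetical metadata hints: a fragment of the column name suggests a tag.
NAME_HINTS = {
    "email": "personal_data",
    "ssn": "personal_data",
    "card": "payment_data",
}


def classify_column(name: str, values: list[str], threshold: float = 0.8) -> list[str]:
    """Return sorted tags for a column, combining name hints with content rules."""
    tags = set()

    # Metadata pass: look only at the column name.
    lowered = name.lower()
    for fragment, tag in NAME_HINTS.items():
        if fragment in lowered:
            tags.add(tag)

    # Content pass: tag the column when enough non-empty values match a rule.
    sample = [v.strip() for v in values if v.strip()]
    if sample:
        for _rule_name, pattern, tag in CONTENT_RULES:
            hits = sum(1 for v in sample if pattern.match(v))
            if hits / len(sample) >= threshold:
                tags.add(tag)

    return sorted(tags)


# A column whose name says nothing, but whose content matches the email rule
contact_tags = classify_column("contact", ["a@example.com", "b@example.org"])
# A column whose content is uninformative, but whose name carries a hint
card_tags = classify_column("card_no", ["n/a", ""])
```

The two sample calls show why content and metadata are worth combining: the first column is caught only by its content, the second only by its name.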
Policy enforcement
By implementing the preceding steps, organizations can take timely action to ensure they comply with regulations. The controls Ganesan referenced for operationalizing compliance measures are broad and include the following:
♦ Obfuscation: The most ubiquitous means of rendering sensitive data anonymous are encryption and tokenization, which offer obvious security benefits. According to Ganesan, randomized encryption obfuscates data “so the numbers one, two, three are always four, five, six. But, the four, five, six is random.” A key advantage of this approach is that organizations can still leverage this data for endeavors such as queries for which they don’t need the actual personal information itself. Other encryption techniques include certificate-based encryption, which verifies that a machine talking to another machine has the authority to do so, Chapman said.
♦ Access controls: One of the advantages of employing centralized platforms for managing regulatory information is using them as a console for administering access control lists. Competitive offerings can “translate these rules into a native database,” Ganesan said.
♦ Micro-segmentation: Typically relevant at the networking level, micro-segmentation takes two forms. The first Chapman identified as “what devices and what people are on what systems and users.” A best practice is to isolate systems containing sensitive data. The other form of segmentation is useful when a transaction or interaction produces information, some of which must be sent to one place and the rest somewhere else. Such routing should be based on regulatory requirements.
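Two of these controls, obfuscation and the routing form of segmentation, can be combined in a short sketch. It is not the technique any of the sources describes: it swaps in keyed hashing (HMAC-SHA256) as a simple stand-in for tokenization, so the same input always yields the same token. The field names, the store names, and the key are hypothetical. Raw sensitive values go to an isolated store, while the analytics copy carries only tokens, which still support counts and joins without exposing the personal information itself.

```python
import hashlib
import hmac

# Hypothetical set of fields a data catalog has tagged as sensitive
SENSITIVE_FIELDS = {"email", "card_number"}


def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic token: the same value and key give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability in this sketch only
    return "tok_" + digest[:16]


def split_record(record: dict, key: bytes) -> dict:
    """Route one record's fields to two hypothetical destinations."""
    restricted, analytics = {}, {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            restricted[field] = value                      # raw value, isolated system
            analytics[field] = tokenize(str(value), key)   # token only
        else:
            analytics[field] = value
    return {"restricted_store": restricted, "analytics_store": analytics}


# Hypothetical key and record; a real key would come from a key management system
demo_key = b"demo-key-not-for-production"
routed = split_record({"email": "a@example.com", "order_total": 42}, demo_key)
```

Because the token is deterministic, two orders from the same email address carry the same token in the analytics copy, so they can be grouped without the address being revealed. The trade-off is that anyone holding the key can test guesses against the tokens, so the key needs the same protection as the restricted store.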