

The Future: Where Big Data and Information Governance Meet?

For nearly two decades, large enterprises have complained about how much information they have to store, manage and protect. As regulations regarding that information steadily grow in number and complexity, they complain louder. The stakes—penalties and reputational damage for mishandled information—get higher. But a solution always seems just out of reach.

Fifteen years ago, there were no CIOs. We had VPs of MIS (management information services). If you’d asked one of them whether the problems of data growth, complexity and security—what we now call information governance—would be solved by 2014, the answer likely would have been “yes.” A good bet, but wrong.

As someone wise once said, “Forecasting is the art of saying what will happen, and then explaining why it didn’t.” But instead of trying to explain why we haven’t solved the information governance problem, let’s look at what the future of information governance might hold.

Big Data and Beyond

It seems very likely that big data will play a crucial role in the future of information governance. Ditto regulatory complexity.

Right now, big data occupies center stage in the minds of enterprise CIOs, CTOs and business strategists. Aligned with big data, new concepts have emerged such as the “data lake,” which can store practically unlimited amounts of data in any format, schema and type.

Relatively inexpensive and massively scalable, a data lake enables data to be analyzed without being moved. It may also include connectors that pull in content from legacy and production applications, so those applications can be maintained until end of life while their data moves into the lake. This helps make the transition to the data lake more efficient.

A data lake—at least in theory—could hold all of an organization’s data. Does that make it an archive? We’ll come back to that.

As for regulatory complexity, two examples underscore its unpredictable relationship to the future of information governance.

In 2006, the Federal Rules of Civil Procedure (FRCP) were amended to broaden the obligation to preserve potentially relevant data, particularly electronically stored information. But the changes were very broad and, rather than risk sanctions, corporations reacted by saving everything. Discovery time and costs exploded. Now, amendments proposed in 2013 aim to limit the scope of discovery. Will they ease the information governance burden? Or will there be countervailing unintended consequences?

A law passed in Massachusetts in 2010 (Massachusetts General Laws Chapter 93H and regulations 201 CMR 17.00) requires that companies or persons who store or use personal information about a Massachusetts resident develop a written, regularly audited plan to protect that information. Compliance and enforcement have no geographic boundaries. Out-of-state businesses that hold personal information about state residents must comply or face penalties.

The truth is, for global enterprises there are literally thousands of regulations that impact the management of information—and thousands of changes to those regulations every year. In every country there are separate national, regional/state and local regulations—as well as a multitude of regulatory agencies—which must be considered when developing policies and an overall information governance strategy. In financial services, for example, one industry source has noted that executives spend the equivalent of one day a week dealing with changing global and local regulatory requirements.

Today more than ever, organizations must balance information value with information risk. To do that, they must know what they have. That is the starting point for applying governance and security to information.

Returning to the topic of data lakes, Edd Dumbill writes on Forbes (“The Data Lake Dream”) that data lakes go through four stages of maturity. Only in the fourth stage, “data lake and application cloud,” which very few organizations have reached, does the data lake add security and governance layers. Data lakes, like virtually all big data projects, facilitate data analysis and business intelligence, not information governance. So, to answer the question posed above: no, a data lake is not an archive, no matter how much data it contains.

The Archive Implication

By comparison with big data, archiving seems emblematic of “old” IT. But archiving and data lifecycle management (DLM) have evolved from a focus on storage to a focus on business value and data loss prevention. DLM recognizes that data loses value as it ages but never becomes worthless. And data can do harm (unnecessary storage costs, litigation, regulatory sanctions) if it is not retained, or not deleted, when it should be.
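To make the lifecycle idea concrete, here is a minimal sketch of a retain-or-delete decision. It assumes a hypothetical policy with a fixed retention period per record class plus a litigation-hold flag; the class names, periods and the `disposition` function are illustrative only and are not drawn from any regulation or product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per record class (illustrative only,
# not taken from any real regulation or retention schedule).
RETENTION = {
    "invoice": timedelta(days=7 * 365),
    "email": timedelta(days=3 * 365),
}

@dataclass
class Record:
    record_id: str
    record_class: str
    created: date
    legal_hold: bool = False  # preserved for litigation regardless of age

def disposition(record: Record, today: date) -> str:
    """Return 'retain', 'delete' or 'review' under the hypothetical policy."""
    if record.legal_hold:
        # A litigation hold overrides the normal schedule.
        return "retain"
    period = RETENTION.get(record.record_class)
    if period is None:
        # Unknown record class: flag for a human decision rather than guess.
        return "review"
    # Past its retention period: keeping it only adds cost and risk.
    return "delete" if today - record.created > period else "retain"
```

Under these assumptions, an email created in 2010 with no hold would be flagged for deletion by mid-2014, while the same email under a litigation hold would be kept. The point of the design is that deletion is a governed outcome, not a default.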

An archive meets a set of objectives formed by information governance policies. These objectives include:

  • Ingesting and retaining all information types, structured or unstructured;
  • Auditing and preserving data and content to meet a variety of regulatory and governance mandates;
  • Storing information in an open, industry-standard format, such as XML;
  • Requiring no dependence on originating applications for managing or referencing information;
  • Maintaining a clear, defensible chain of custody;
  • Delivering records and retention capabilities with audit trails; and
  • Preserving information in an immutable form.
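The chain-of-custody, audit-trail and immutability objectives are often supported with tamper-evident logging. One common technique, sketched here under our own assumptions rather than as a method the article prescribes, is a hash chain: each audit entry stores the SHA-256 hash of the previous entry, so altering an earlier entry breaks every later link. The `append` and `verify` helpers and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "event": event,
                "hash": _entry_hash(prev_hash, event)})

def verify(log: list) -> bool:
    """Recompute every link; editing, deleting or reordering an earlier
    entry breaks the chain. Truncating the tail is NOT detected unless
    the latest hash is also stored somewhere outside the log."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

For example, after logging an ingest by one user and a view by another, `verify` returns `True`; if someone later rewrites the first entry's user name, `verify` returns `False`. A production archive would typically also anchor the latest hash externally and store entries on write-once media.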

All evidence points to data volume, velocity and variety continually increasing. Even if storage were free, which it’s not, saving everything is unsustainable. Companies want to mine their data for operational and competitive advantage. They also want to reduce costs and limit risk. Regulatory complexity shows no signs of diminishing. Archiving may well become the future connection between the ambitions of big data and the imperatives of information governance.

What will the future of information hold 15 years from now? Lots more data. And, we hope, well-managed archives.
