
The End of Data Bulldozing: File Analysis




Migrating from one enterprise collaboration system to a new version, or even to a new system, can be overwhelming. Unfortunately, due to the sheer size of this type of project, organizations tend to take a bulldozer approach to moving their content.

When you use a bulldozer, none of the inherent issues with the data are solved; you are simply moving content to a new system.

Rather than dumping content with no strategy or purpose, organizations should take measured steps to migrate information strategically, so that existing issues are reduced rather than carried over. These steps (data discovery, data classification and data governance) are collectively known as file analysis, and together they can eliminate the need for a “digital bulldozer”.

However, organizations sometimes overestimate their strengths and capabilities, and that is where things can go wrong. Asking the right mix of questions allows any organization to plan and execute its migration smoothly and to meet its expectations.

So what are the questions that everyone has?

  • Where do I start?
  • What is the most important for me?
  • Where is it?
  • How do I secure it?
  • How do I find it in the first place?
  • If it’s not of any value to me, does anyone else depend on it?
  • Is everything I have equally valuable to me?

Data Discovery – What and Where is It?

Data discovery is the process of capturing a fingerprint of the current state of the organization’s data and information life cycle, including the dependent organizational and technical processes that rely on that data. The goal is to identify relevant documents and their metadata, create business rules, and document all relevant content sources within the infrastructure that surrounds the data.

Too often, organizations move all of their existing data from the old system into a shiny new one. Instead, they should use the migration as an opportunity to improve the quality of their data and to apply higher standards to the information that is critical to the business.

Ongoing data discovery, on the other hand, involves the same actions as the initial data discovery performed during a migration. It usually starts from the baseline established by that initial discovery and applies the lessons learned on a recurring basis, to make sure that the data governance model remains effective and measurable.
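Comparing each new discovery run against the baseline is what makes ongoing discovery measurable. The minimal sketch below assumes each run produces a snapshot that maps a file path to a (size in bytes, modification timestamp) pair; the snapshot format and the `diff_snapshots` name are illustrative assumptions, not part of any particular file analysis product.

```python
def diff_snapshots(baseline: dict, current: dict) -> dict:
    """Compare two hypothetical inventory snapshots.

    Each snapshot maps path -> (size_bytes, modified_timestamp). Returns the
    paths added, removed and changed since the baseline, plus net byte growth.
    """
    # dict key views support set operations directly
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    # A path counts as changed if its size or modification time differs
    changed = sorted(p for p in baseline.keys() & current.keys() if baseline[p] != current[p])
    growth = sum(size for size, _ in current.values()) - sum(size for size, _ in baseline.values())
    return {"added": added, "removed": removed, "changed": changed, "net_growth_bytes": growth}
```

Tracking `net_growth_bytes` from run to run is one simple way to check whether governance is actually slowing the accumulation of data.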

When conducting data discovery across various content sources, we are effectively building data inventories. These inventories allow organizations to locate relevant master data and applications, assign ownership, and map the relationships between content sources. Content sources such as SharePoint, file shares and cloud applications are all data inventories in which employees constantly create, read, update and delete data. We typically produce more data than we get rid of, and this is one of the main justifications for data discovery.
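For a file share, a basic inventory can be built by walking the directory tree and recording metadata for each file. This is a minimal sketch assuming a locally mounted share; the `InventoryRecord` fields and the `build_inventory` name are hypothetical, and sources such as SharePoint or cloud applications would need their own APIs rather than `os.walk`.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class InventoryRecord:
    """One row of a hypothetical data inventory."""
    source: str         # name of the content source, e.g. "finance-share"
    path: str           # full path to the file
    size_bytes: int
    modified: datetime  # last modification time, in UTC
    extension: str      # lowercase file extension, e.g. ".docx"


def build_inventory(source_name: str, root: str) -> list[InventoryRecord]:
    """Walk a directory tree and capture basic metadata for every file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                st = os.stat(full_path)
            except OSError:
                # Skip files that cannot be read (permissions, broken links, etc.)
                continue
            records.append(
                InventoryRecord(
                    source=source_name,
                    path=full_path,
                    size_bytes=st.st_size,
                    modified=datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
                    extension=os.path.splitext(name)[1].lower(),
                )
            )
    return records
```

For example, `build_inventory("finance-share", "/mnt/finance")` (a hypothetical mount point) returns one record per readable file; records from several sources can then be combined into a single inventory and enriched with ownership information.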

Why Do We Need to Perform Data Discovery?

Even a simple data discovery exercise can provide a number of benefits to any organization willing to follow through with it:

  • Begin with a Strategy: Every successful data migration project begins with a strategy. The strategy should take into account security considerations, the location of data, data sovereignty restrictions and data volume. Data discovery gives insight into potential hidden constraints such as capacity, bandwidth or project timing.
  • Cost savings: If departments are charged back by IT for data storage, running a data discovery program before migration, or on an ongoing basis, produces a cleaner, less redundant body of data. That data is much easier to maintain, back up, manage, restore or archive.
  • Identify Risks and Policy Violations: Organizations often uncover a large number of policy violations and previously unidentified risks in data that is still being stored but is no longer accessed or needed by anyone. Once these deviations are identified, they form a baseline for future data discovery. From that baseline, organizations can define key risk indicators (KRIs) and key performance indicators (KPIs) to find data silos that fall outside standard business activity, and use them as a reference for future plans covering their data, systems, assets, information, money and time.
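Redundant and stale data are two of the most common findings, and both can be turned into simple indicators. The sketch below assumes each inventory record already carries a content hash and a last-accessed time; the `FileRecord` fields, the function names and the three-year staleness threshold are hypothetical choices for illustration, not standard KRI definitions.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    sha256: str              # content hash, used to detect exact duplicates
    last_accessed: datetime  # assumed to come from the inventory step


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def redundancy_and_staleness(records, now: datetime,
                             stale_after: timedelta = timedelta(days=3 * 365)) -> dict:
    """Return two illustrative indicators as fractions of total stored bytes:
    bytes held in duplicate copies, and bytes not accessed within `stale_after`."""
    total = sum(r.size_bytes for r in records)
    if total == 0:
        return {"duplicate_ratio": 0.0, "stale_ratio": 0.0}

    # Group files by content hash; every copy after the first counts as redundant.
    by_hash = defaultdict(list)
    for r in records:
        by_hash[r.sha256].append(r)
    duplicate_bytes = sum(r.size_bytes for group in by_hash.values() for r in group[1:])

    # Files not accessed within the threshold are candidates for archiving or deletion.
    stale_bytes = sum(r.size_bytes for r in records if now - r.last_accessed > stale_after)

    return {"duplicate_ratio": duplicate_bytes / total, "stale_ratio": stale_bytes / total}
```

Last-access timestamps are not always reliable, since many file systems update them lazily or not at all, so in practice modification times or audit logs may be a better staleness signal.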

Business Classification

So, how do we find what is relevant to us and identify all relationships between data, processes and people?

Organizations must determine the definitions and business context associated with business classification, metadata, policies, standards and processes. Defining business classification helps build the business context around critical data. It also helps define the policies that apply to that data and assign ownership of it.

Business classification – also known as data classification and metadata management – is the result of capturing relevant supporting business and IT context about data in the form of metadata.

Classifying documents helps optimize IT data storage costs and shorten the migration process. Not all data is of equal importance, so a cross-functional team should decide which data should be migrated and in what order, and should define and apply policies for how each class of data will be managed in the future.
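One way to capture a cross-functional team's decisions is as an ordered list of rules that assign each file a business class and a migration wave. In this sketch the glob patterns, class names and wave numbers are hypothetical, and only the file path is used; real classification would usually also draw on metadata and content.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ClassificationRule:
    pattern: str         # lowercase glob pattern matched against the normalized path
    business_class: str  # business category agreed by the cross-functional team
    migration_wave: int  # lower numbers are migrated first


# Hypothetical rules; the first matching rule wins, so order matters.
RULES = [
    ClassificationRule("*/legal/contracts/*", "Contracts", 1),
    ClassificationRule("*/hr/*", "HR Records", 1),
    ClassificationRule("*/projects/*", "Project Documents", 2),
    ClassificationRule("*.tmp", "Transient", 99),
]

# Anything that matches no rule is migrated after the known-important data.
DEFAULT_CLASSIFICATION = ("Unclassified", 3)


def classify(path: str, rules=RULES) -> tuple:
    """Return (business_class, migration_wave) for a file path."""
    # Normalize Windows separators and case so one pattern covers both styles.
    normalized = path.replace("\\", "/").lower()
    for rule in rules:
        if fnmatchcase(normalized, rule.pattern):
            return rule.business_class, rule.migration_wave
    return DEFAULT_CLASSIFICATION


def migration_plan(paths, rules=RULES) -> dict:
    """Group paths by migration wave, ordered so the most important data moves first."""
    plan = {}
    for p in paths:
        business_class, wave = classify(p, rules)
        plan.setdefault(wave, []).append((p, business_class))
    return dict(sorted(plan.items()))
```

`fnmatchcase` is used after lowercasing so matching behaves the same on every platform; note that `*` in these patterns also matches `/`, so `*/hr/*` matches an `hr` folder at any depth. A high wave number such as 99 could stand for "review before migrating, or do not migrate at all".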

Data Governance

Once we have discovered our data and created business classification definitions to understand its real value, the next step is to consolidate the resulting rules, policies and procedures into a data governance plan.

A data governance model can be applied to existing systems through a combination of manual and automated rules, as well as in situations where an organization needs an end-to-end migration from one system to another. Automated rules allow an organization to enforce the data quality, privacy and other business rules defined in the previous stages.
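Automated rules can be as simple as functions that inspect a document and report violations. This sketch assumes each document carries a metadata dictionary and extracted text; the required fields, the `sensitivity` value and the rule names are hypothetical, and the SSN regular expression is a deliberately crude stand-in for real privacy detection.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Document:
    path: str
    metadata: dict   # e.g. {"owner": ..., "business_class": ..., "sensitivity": ...}
    text: str = ""   # extracted text content, if available


# A rule inspects one document and returns a violation message, or None if it passes.
Rule = Callable[[Document], Optional[str]]

# Hypothetical data-quality requirement: every document needs an owner and a class.
REQUIRED_FIELDS = ("owner", "business_class")

# Deliberately crude pattern for US Social Security numbers (ddd-dd-dddd).
# Real privacy scanning needs much more robust detection than a single regex.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def require_metadata(doc: Document) -> Optional[str]:
    """Data-quality rule: flag documents missing required metadata fields."""
    missing = [name for name in REQUIRED_FIELDS if not doc.metadata.get(name)]
    return f"missing metadata: {', '.join(missing)}" if missing else None


def no_unrestricted_ssn(doc: Document) -> Optional[str]:
    """Privacy rule: flag SSN-like text in documents not marked as restricted."""
    if SSN_PATTERN.search(doc.text) and doc.metadata.get("sensitivity") != "restricted":
        return "possible SSN in a document not marked restricted"
    return None


def evaluate(docs, rules) -> dict:
    """Apply every rule to every document and return {path: [violations]} for
    documents that fail at least one rule. Nothing is modified or deleted."""
    report = {}
    for doc in docs:
        violations = [msg for rule in rules if (msg := rule(doc)) is not None]
        if violations:
            report[doc.path] = violations
    return report
```

The report only flags documents; it does not change or delete anything, which leaves exceptions and high-stakes decisions to human review.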

Human review remains a key ingredient for refining the outcome of data discovery and data governance and for delivering acceptable results. Exceptions and decisions requiring high-level review are also handled during manual inspection, which adds confidence in areas where an automated approach is not enough.

Organizations can better deliver on the promise of collaboration systems by following these steps to define the data governance strategy, priorities, business case, policies, standards, architecture and the ultimate future vision for the environment. Without proper discovery and planning, critical business functions can become unnecessarily complicated – making it more difficult to grow and adapt to ever-changing business trends.

