February 19, 2016
By Esad Ismailov, Compliance Technical Specialist, AvePoint
ViewPoints

The End of Data Bulldozing: File Analysis

Migrating from one enterprise collaboration system to a new version, or even to an entirely new system, can be overwhelming. Unfortunately, due to the sheer scale of this type of project, organizations tend to take a bulldozer approach to moving their content.

When you use a bulldozer, none of the inherent issues with the data are solved; you are simply moving content to a new system.

Rather than dumping content with no strategy or purpose, organizations should take measured steps to migrate information strategically, so that existing issues are reduced rather than carried over. Three steps, collectively known as file analysis, can eliminate the need for a “digital bulldozer”: data discovery, data classification and data governance.

However, we sometimes overestimate our strengths and capabilities, and that is where things can go wrong. Asking the right mix of questions allows any organization to plan and execute its migration smoothly and meet its expectations.

So what are the questions that everyone has?

Where do I start?
What is the most important for me?
Where is it?
How do I secure it?
How do I find it in the first place?
If it’s not of any value to me, does anyone else depend on it?
Is everything I have equally valuable to me?

Data Discovery – What and Where is It?

Data discovery is the process that provides a fingerprint of the current state of the organization’s data, or information life cycle, and unveils the dependent processes that support the organizational and technical capabilities of the data itself. The goal is to identify relevant documents and their metadata, create business rules, and document all relevant content sources within the infrastructure that surrounds the data.

Too often, organizations move all of their existing data from one system into their shiny new system. Instead, they should use the migration as an opportunity to improve the quality of their data and apply higher standards to the information that is critical to the business.

Ongoing data discovery, on the other hand, involves the same actions as the initial data discovery performed during a migration. It usually starts from the baseline established by that initial discovery, and the lessons learned are applied in each subsequent pass to make sure that the data governance model remains effective and measurable.
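Starting each pass from a baseline can be pictured as comparing two inventory snapshots. The Python sketch below is a hypothetical illustration, not a description of any AvePoint product: it assumes each snapshot is a dictionary mapping a file path to a (size in bytes, last-modified timestamp) pair, and it reports what was added, removed or changed since the baseline was taken.

```python
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical snapshot format: path -> (size_bytes, modified_timestamp)
Snapshot = Dict[str, Tuple[int, float]]


class SnapshotDiff(NamedTuple):
    added: List[str]
    removed: List[str]
    changed: List[str]


def diff_snapshots(baseline: Snapshot, current: Snapshot) -> SnapshotDiff:
    """Compare a current inventory snapshot against the baseline snapshot."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    # A file present in both snapshots counts as changed if its size or
    # modification time differs from the baseline.
    changed = sorted(
        path
        for path in baseline.keys() & current.keys()
        if baseline[path] != current[path]
    )
    return SnapshotDiff(added, removed, changed)


if __name__ == "__main__":
    baseline = {
        "finance/budget.xlsx": (52_000, 1_450_000_000.0),
        "hr/policy.docx": (18_000, 1_440_000_000.0),
    }
    current = {
        "finance/budget.xlsx": (61_000, 1_455_000_000.0),
        "sales/pipeline.csv": (9_000, 1_455_500_000.0),
    }
    print(diff_snapshots(baseline, current))
```

Tracking the size of each list from pass to pass is one simple way to make the governance model measurable over time.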

When conducting data discovery across various content sources, we are effectively building data inventories, which allow organizations to locate relevant master data and applications, assign ownership, and map the relationships between content sources or data inventories. Content sources such as SharePoint, file shares and cloud applications are all data inventories in which employees constantly create, read, update and delete data. Because we usually produce more data than we get rid of, this steady growth is one of the major justifications for data discovery.
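As a rough illustration of building a data inventory for one content source, the Python sketch below walks a file share mounted at a local path and records basic metadata for each file. The mount path and the record fields are assumptions chosen for illustration; SharePoint and cloud applications would need their own APIs, which are not shown, and ownership would typically come from each source's permission model.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import List


@dataclass
class InventoryRecord:
    path: str            # path relative to the scanned root, using "/" separators
    extension: str       # lower-cased file extension, e.g. ".docx"
    size_bytes: int
    modified: datetime   # last-modified time in UTC


def build_inventory(root: str) -> List[InventoryRecord]:
    """Walk a directory tree and record basic metadata for every file."""
    root_path = Path(root)
    records: List[InventoryRecord] = []
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            full = Path(dirpath) / name
            try:
                stat = full.stat()
            except OSError:
                # Skip broken links or files removed during the scan.
                continue
            records.append(
                InventoryRecord(
                    path=full.relative_to(root_path).as_posix(),
                    extension=full.suffix.lower(),
                    size_bytes=stat.st_size,
                    modified=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
                )
            )
    return sorted(records, key=lambda r: r.path)


if __name__ == "__main__":
    # Hypothetical mount point for a departmental file share.
    for record in build_inventory("/mnt/fileshare/finance"):
        print(record)
```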

Why Do We Need to Perform Data Discovery?

Even a simple data discovery exercise can provide a number of benefits to any organization willing to follow through with one:

Begin with a Strategy: Every successful data migration project begins with a strategy. The strategy should take into account security considerations, data location and data sovereignty restrictions, and data volume. Data discovery gives insight into potential hidden constraints such as capacity, bandwidth or project timing.
Cost Savings: If departments have an IT chargeback on data storage, implementing a data discovery program before migration, or on an ongoing basis, produces a cleaner, less redundant body of data, which is much easier to maintain, manage, back up, restore or archive.
Identify Risks and Policy Violations: Organizations often uncover a large number of policy violations and previously unidentified risks in data that is still being stored but is no longer accessed or needed by anyone. Once these deviations are identified, the organization has a baseline for future data discovery. This baseline allows organizations to define a set of indicators, such as key risk indicators (KRIs) and key performance indicators (KPIs), to find data silos that fall outside standard business activity, and to serve as a reference for future plans covering how they maintain their data, systems, assets and information, and how they spend money and time.
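The three benefits above can be made concrete with a few simple measurements over an inventory. The Python sketch below is illustrative only: it assumes each file is described by a hypothetical FileInfo record holding a content hash, a size and a last-access time; it treats files with identical hashes as redundant copies, uses an arbitrary 365-day threshold to call data stale, and estimates raw transfer time from data volume and an assumed sustained bandwidth. The stale-data share is just one possible key risk indicator, not a standard definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class FileInfo:
    path: str
    sha256: str            # content fingerprint assumed to be computed during discovery
    size_bytes: int
    last_accessed: datetime


def redundant_bytes(files: List[FileInfo]) -> int:
    """Bytes that could be reclaimed by keeping one copy of each duplicate set."""
    groups: Dict[str, List[FileInfo]] = {}
    for f in files:
        groups.setdefault(f.sha256, []).append(f)
    # Every copy beyond the first in a group with identical content is redundant.
    return sum(g[0].size_bytes * (len(g) - 1) for g in groups.values())


def stale_ratio(files: List[FileInfo], now: datetime, max_age_days: int = 365) -> float:
    """Share of total bytes not accessed within max_age_days (a possible KRI)."""
    total = sum(f.size_bytes for f in files)
    if total == 0:
        return 0.0
    cutoff = now - timedelta(days=max_age_days)
    stale = sum(f.size_bytes for f in files if f.last_accessed < cutoff)
    return stale / total


def transfer_hours(total_bytes: int, bandwidth_mbps: float) -> float:
    """Lower-bound transfer time at a sustained bandwidth given in megabits per second."""
    bits = total_bytes * 8
    return bits / (bandwidth_mbps * 1_000_000) / 3600


if __name__ == "__main__":
    now = datetime(2016, 2, 19)
    files = [
        FileInfo("a/report.docx", "h1", 400, datetime(2015, 12, 1)),
        FileInfo("b/report-copy.docx", "h1", 400, datetime(2013, 5, 1)),
        FileInfo("c/plan.xlsx", "h2", 200, datetime(2012, 1, 1)),
    ]
    print(redundant_bytes(files))              # 400
    print(round(stale_ratio(files, now), 2))   # (400 + 200) / 1000 = 0.6
    # 2 TB (decimal) over a sustained 100 Mbit/s link:
    print(round(transfer_hours(2 * 10**12, 100), 1))  # 44.4
```

The transfer estimate ignores protocol overhead, throttling and retries, so real migrations will take longer; even so, a figure of roughly two days for 2 TB shows why data volume belongs in the strategy.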

Business Classification

So, how do we find what is relevant to us and identify all relationships between data, processes and people?

Organizations must determine the definitions and business context associated with business classification, metadata, policies, standards and processes. Defining business classification helps build the business context around critical data. Additionally, it can help define policies that reference data and assign ownership to it.

Business classification – also known as data classification and metadata management – is the result of capturing relevant supporting business and IT context about data in the form of metadata.

Classifying documents helps optimize IT data storage costs and shorten the migration process. Not all data is equally important, so a cross-functional team should determine which data to migrate and in what order, and should define and apply policies for how the data will be managed in the future.
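One way to picture how a cross-functional team's decisions become repeatable is a small rule table. In the hypothetical Python sketch below, each rule matches on a path prefix and/or a file extension and assigns a business category, a migration wave and whether the content should be migrated at all. The categories, paths and extensions are invented for illustration, and the first matching rule wins, so exclusion rules are listed first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ClassificationRule:
    category: str
    migrate: bool
    wave: int                          # lower waves are migrated first
    path_prefix: Optional[str] = None  # lower-case prefix, e.g. "finance/"
    extensions: Tuple[str, ...] = ()   # lower-case extensions, e.g. (".tmp",)

    def matches(self, path: str) -> bool:
        lowered = path.lower()
        if self.path_prefix and not lowered.startswith(self.path_prefix):
            return False
        if self.extensions and not lowered.endswith(self.extensions):
            return False
        return True


# Hypothetical rules agreed by the cross-functional team; order matters.
RULES: List[ClassificationRule] = [
    ClassificationRule("Temporary", False, 0, extensions=(".tmp", ".bak")),
    ClassificationRule("Contracts", True, 1, path_prefix="legal/contracts/"),
    ClassificationRule("Financial records", True, 1, path_prefix="finance/"),
    ClassificationRule("Media", True, 3, extensions=(".mp4", ".mov")),
]
DEFAULT = ClassificationRule("Unclassified", True, 2)


def classify(path: str) -> ClassificationRule:
    """Return the first rule matching the path, or the default rule."""
    return next((r for r in RULES if r.matches(path)), DEFAULT)


def migration_plan(paths: List[str]) -> List[Tuple[int, str, str]]:
    """List (wave, category, path) for files to migrate, in migration order."""
    plan = []
    for p in paths:
        rule = classify(p)
        if rule.migrate:
            plan.append((rule.wave, rule.category, p))
    return sorted(plan)


if __name__ == "__main__":
    files = ["finance/q4.xlsx", "marketing/launch.mp4", "legal/contracts/nda.pdf",
             "finance/q4.xlsx.bak", "misc/notes.txt"]
    for wave, category, path in migration_plan(files):
        print(wave, category, path)
```

Keeping the rules as data rather than scattered conditionals makes it easier for the team to review and change them as policies evolve.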

Data Governance

Once we have discovered our data and created business classification definitions that reveal its real value, the next step is to consolidate the resulting rules, policies and procedures into a data governance plan.

We can apply a data governance model to existing systems through manual and automated rules, as well as in situations where an organization needs an end-to-end migration from one system to another. Implementing automated rules allows an organization to enforce the data quality, privacy and other business rules defined in the previous stages.
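To illustrate what enforcing data quality and privacy rules automatically might look like, the Python sketch below runs a list of hypothetical rules over a document's text and metadata and returns any violations. The regular expressions are deliberately simplified assumptions (a U.S.-style Social Security number pattern and a basic email pattern), and the "owner must be set" check stands in for a data quality rule; real policies would come out of the discovery and classification stages.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    path: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class GovernanceRule:
    name: str
    check: Callable[[Document], bool]  # returns True when the rule is violated


# Simplified, illustrative patterns; production PII detection needs far more care.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

GOVERNANCE_RULES: List[GovernanceRule] = [
    GovernanceRule("privacy: possible SSN", lambda d: bool(SSN_PATTERN.search(d.text))),
    GovernanceRule("privacy: email address", lambda d: bool(EMAIL_PATTERN.search(d.text))),
    GovernanceRule("quality: missing owner", lambda d: not d.metadata.get("owner")),
]


def evaluate(doc: Document, rules: List[GovernanceRule] = GOVERNANCE_RULES) -> List[str]:
    """Return the names of all rules the document violates, in rule order."""
    return [rule.name for rule in rules if rule.check(doc)]


if __name__ == "__main__":
    doc = Document("hr/onboarding.docx",
                   "Contact jane.doe@example.com, SSN 123-45-6789.",
                   {"owner": "HR"})
    print(evaluate(doc))
```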

Human review is always a key ingredient in refining the outcome of data discovery or data governance and delivering acceptable results. Exceptions and decisions that require high-level review are handled during manual inspection, which adds confidence in areas where an automated approach is not enough.
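A minimal sketch of combining automated decisions with human review, assuming each automated check produces a hypothetical confidence score between 0 and 1: decisions at or above a chosen threshold are applied automatically, while flagged exceptions, low-confidence results and any action on an "always review" list are queued for a reviewer. The 0.9 threshold, the action names and the choice to always review deletions are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Decision:
    path: str
    action: str               # hypothetical actions: "migrate", "archive", "delete"
    confidence: float         # 0.0 to 1.0, as reported by the automated check
    exception: bool = False   # flagged for high-level review regardless of score


def triage(
    decisions: List[Decision],
    threshold: float = 0.9,
    always_review_actions: Tuple[str, ...] = ("delete",),
) -> Tuple[List[Decision], List[Decision]]:
    """Split decisions into (auto_apply, manual_review)."""
    auto_apply: List[Decision] = []
    manual_review: List[Decision] = []
    for d in decisions:
        needs_person = (
            d.exception
            or d.confidence < threshold
            or d.action in always_review_actions
        )
        (manual_review if needs_person else auto_apply).append(d)
    return auto_apply, manual_review


if __name__ == "__main__":
    auto, manual = triage([
        Decision("a.docx", "migrate", 0.95),
        Decision("b.pst", "delete", 0.99),
        Decision("c.xlsx", "archive", 0.50),
    ])
    print([d.path for d in auto], [d.path for d in manual])
```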

Organizations can better deliver on the promise of collaboration systems by following these steps to define the data governance strategy, priorities, business case, policies, standards, architecture and the ultimate future vision for the environment. Without proper discovery and planning, critical business functions can become unnecessarily complicated – making it more difficult to grow and adapt to ever-changing business trends.
