
Why big volume is not the Big Data story




Organizations have witnessed an unprecedented surge in information generation in what has come to be seen as the era of Big Data, leaving many at a loss as to how all this information can be managed and used. Indeed, last year Science Daily reported that 90% of the world's data had been created in just the previous two years. Not surprisingly, many organizations have come to see the sheer volume of information as the main data management hurdle, but recent research points to a different challenge: one associated with data 'findability'.

The age of the Internet has accustomed us to having all the information we could want readily available at our fingertips, quite literally so, thanks to laptops, tablets, smartphones and other devices. Digital technology lets us find information effortlessly in our personal lives, and we now expect to find what we want to know (news, directions, menus, entertainment listings, contact details, product facts) anyplace, anytime.

Unfortunately, we rarely experience the same level of data accessibility in the workplace, where internal information assets can be massive and hugely complex, and not at all easy to search with the pinpoint precision usually required to find a very specific document or piece of content. Addressing this challenge should be high on the priority list of any organization aiming to extract value efficiently from unstructured content. But it is proving to be no easy task.

This rapid growth in information represents an extraordinary resource and opportunity for many companies, but only if it can be accessed effectively and the resulting insights put to use. The ability to tap into and use unstructured information is crucial for companies seeking a competitive advantage. Such information includes customer correspondence, market intelligence, internal communications, product information, technical specs, research and development reports, field and case notes, service information and customer feedback.

Underlining the difficulty of making unstructured data accessible are recent findings by MindMetre Research. A survey of nearly 400 senior information management professionals in Europe and the US found that 85% believe large organizations are creating more unstructured data than ever, and 89% see insights from unstructured information as essential to gaining a competitive edge.

What is really interesting about the MindMetre research, however, is why so many companies are struggling to unlock the commercial value of the data they collect. Just a third (34%) of the organizations surveyed see sheer volume as the main challenge. The biggest hurdle between companies and the vast business intelligence trapped in Big Content (the unstructured element of Big Data) is that the data they hold is fragmented, dispersed and stored in disparate formats: 71% of the information professionals surveyed say that the content they need is scattered among different business units and stored in different formats. The other big problem is accurate and consistent meta-tagging: 56% note that enterprise information is not labelled with the metadata needed for staff to find it quickly and easily.

It would seem, then, that vast quantities of information in unstructured content remain largely unreachable, and therefore unusable, for many businesses. The challenge for these enterprises is to pick pieces of 'small data' (a particular report, business pitch, email, contact, etc.) out of a vast haystack of unstructured information, making the sum of corporate knowledge and experience available to all employees at any time.

It is neither affordable nor feasible to manually categorize vast volumes of documents and images, otherwise known as Big Content. Instead, turning this mass of information into a usable source of knowledge and insight requires Content Intelligence: applications and tools that radically improve an enterprise's ability to find and organize information by giving its systems more effective taxonomy management and semantic search capabilities.

These automated meta-tagging and mark-up systems must categorize and label documents and images accurately and consistently while leaving them in their original locations, avoiding the expense of reformatting and ingesting huge volumes of disparate information into a single hub. Only then can these organizations harness the power locked in their unstructured data.
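To make the idea of in-place tagging concrete, the following Python sketch tags documents against a small taxonomy and records the results in a separate metadata index keyed by each document's location, so nothing is copied into a central hub. Everything in it is a hypothetical illustration, not how any particular Content Intelligence product works: the `TAXONOMY` categories and trigger terms, the `SourceDocument` type, the `tag_document` and `build_metadata_index` helpers, and the example locations (including the invented `sharepoint://` scheme) are all assumptions. Simple keyword matching stands in for the richer rules or statistical models a commercial tool would use.

```python
import re
from dataclasses import dataclass

# Hypothetical taxonomy: each category is defined by a handful of trigger terms.
# Keyword matching is a stand-in for the richer rules or models real tools use.
TAXONOMY: dict[str, set[str]] = {
    "Customer Feedback": {"complaint", "feedback", "survey", "review"},
    "Technical Specs": {"specification", "tolerance", "datasheet", "firmware"},
    "Market Intelligence": {"competitor", "pricing", "market", "forecast"},
}


@dataclass
class SourceDocument:
    location: str  # where the document already lives; it is never moved
    text: str      # text extracted from the document for tagging


def words(text: str) -> set[str]:
    """Lower-case word tokens, ignoring punctuation and digits."""
    return set(re.findall(r"[a-z]+", text.lower()))


def tag_document(doc: SourceDocument,
                 taxonomy: dict[str, set[str]],
                 min_hits: int = 1) -> list[str]:
    """Return, sorted, each category with at least min_hits distinct trigger terms in the text."""
    tokens = words(doc.text)
    return sorted(
        category
        for category, terms in taxonomy.items()
        if len(tokens & terms) >= min_hits
    )


def build_metadata_index(docs: list[SourceDocument],
                         taxonomy: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each document's location to its tags; the content itself stays put."""
    return {doc.location: tag_document(doc, taxonomy) for doc in docs}


# Hypothetical documents living in two different repositories.
docs = [
    SourceDocument("sharepoint://support/case-1182.docx",
                   "Customer complaint about the firmware update; survey attached."),
    SourceDocument("file://rnd/specs/pump-v2.pdf",
                   "Pump specification: tolerance 0.05 mm, see datasheet."),
]
index = build_metadata_index(docs, TAXONOMY)
print(index)
```

Keeping the index separate from the content reflects the point made in the paragraph: documents stay in their original repositories, and only lightweight tags need to be stored and searched. Raising `min_hits` trades recall for precision, which is one simple lever for the "accurately and consistently" requirement.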

The core problem for many organizations is that their existing enterprise information management applications, such as Microsoft SharePoint and FAST, Apache Lucene and Solr, Oracle and Google Search Appliance, lack the Content Intelligence needed to unearth the commercial value in unstructured information. Their search facilities typically offer only basic classification and taxonomy management capabilities, and often cannot be used to apply metadata across disparate information sources.

What these companies need to improve their performance and productivity are systems capable of automatically and accurately categorizing and tagging their unstructured data. A number of bolt-on applications can sit alongside these systems and imbue them with Content Intelligence, enabling better management of ontologies and taxonomies and automatic classification of information. These tools allow enterprise information platforms to deliver a search experience that reflects the searcher's intent, so that extraneous documents are filtered out and relevant content that might otherwise fall outside the search parameters is included.
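The filtering and inclusion behaviour described here can be sketched, under loose assumptions, as taxonomy-based query expansion combined with a metadata filter. In this hypothetical Python example, `RELATED_TERMS` stands in for a managed taxonomy that maps a term to its synonyms, `INDEX` stands in for metadata an auto-tagger might produce, and `expand_query` and `search` are illustrative helpers, not the API of any product named in this article.

```python
import re
from typing import Optional

# Hypothetical taxonomy fragment: preferred term -> synonyms / narrower terms.
RELATED_TERMS: dict[str, set[str]] = {
    "contract": {"agreement", "sow", "msa"},
    "invoice": {"bill", "receipt"},
}

# Hypothetical metadata index an auto-tagger might produce:
# location -> (tags, extracted text). Locations are invented examples.
INDEX: dict[str, tuple[set[str], str]] = {
    "dms://legal/acme-msa.pdf": ({"Legal"}, "Master services agreement with Acme"),
    "dms://finance/acme-bill-0042.pdf": ({"Finance"}, "Bill for Acme consulting"),
    "mail://sales/acme-lunch.eml": ({"Sales"}, "Lunch with Acme about the contract"),
}


def words(text: str) -> set[str]:
    """Lower-case word tokens, ignoring punctuation and digits."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand_query(terms: set[str], related: dict[str, set[str]]) -> set[str]:
    """Add taxonomy synonyms so relevant content using other wording is included."""
    expanded = set(terms)
    for term in terms:
        expanded |= related.get(term, set())
    return expanded


def search(query: str,
           index: dict[str, tuple[set[str], str]],
           related: dict[str, set[str]],
           required_tag: Optional[str] = None) -> list[str]:
    """Return sorted locations matching the expanded query and optional tag filter."""
    expanded = expand_query(words(query), related)
    hits = []
    for location, (tags, text) in index.items():
        if required_tag is not None and required_tag not in tags:
            continue  # metadata filter drops documents outside the searcher's intent
        if words(text) & expanded:
            hits.append(location)
    return sorted(hits)


print(search("contract", INDEX, RELATED_TERMS))                        # with taxonomy expansion
print(search("contract", INDEX, RELATED_TERMS, required_tag="Legal"))  # expansion plus tag filter
print(search("contract", INDEX, {}))                                   # no expansion: plain keywords
```

A plain keyword search for "contract" misses the master services agreement and returns only the sales email; expanding the query through the taxonomy brings the agreement in, and restricting to the `Legal` tag then drops the email.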

Dealing with sheer data volume is one issue facing businesses today, but it is not the principal one. It is making specific pieces of content findable and usable through accurate meta-tagging and categorization that will allow organizations to leverage Big Data to its full potential. Information assets can then add value across an organization by eliminating duplicated work, boosting efficiency in product and service development, improving strategic planning and risk management, and generally adding insight across a broad range of business functions.

