Information Analysis and the Content Supply Chain

Access to content in many different formats has been growing in importance as companies and government institutions attempt to get a better handle on the information at their disposal. A significant portion of IT investment over the last two years has gone beyond the more traditional CRM, ERP and database applications, toward a new class of content management, document management, collaboration, and search and categorization solutions.

This push really has to do with managing what we might call the “information supply chain.” In this chain, information (input) can come from any potential source, because information can lie anywhere, and it must yield knowledge (output) that is actionable and comprehensive. To achieve this, the knowledge output must be accessible in a timely fashion and, more importantly, must be accurate or accompanied by a relevance measure that allows the decision-maker to assess its potential usefulness or credibility.

This brings us to what we believe to be a basic tenet of knowledge management, namely that the primary difference between information and knowledge is relevance. This, above all else, leads organizations to require an information workflow solution.
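
To make the notion of a relevance measure concrete, consider the following minimal sketch in Python. It ranks documents against a query using TF-IDF-weighted cosine similarity, one classic way of attaching a relevance score to every result; the tiny corpus, the weighting scheme and the function names are illustrative assumptions, not a description of any particular product.

    import math
    from collections import Counter

    def rank(query, docs):
        """Rank documents against a query by TF-IDF weighted cosine similarity.
        Every hit carries a score the decision-maker can use to judge relevance."""
        corpus = [Counter(d.lower().split()) for d in docs]
        n = len(corpus)
        df = Counter(t for terms in corpus for t in set(terms))
        idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # +1 keeps ubiquitous terms visible

        def vec(terms):
            return {t: tf * idf.get(t, 0.0) for t, tf in terms.items()}

        def cosine(u, v):
            dot = sum(w * v.get(t, 0.0) for t, w in u.items())
            nu = math.sqrt(sum(w * w for w in u.values()))
            nv = math.sqrt(sum(w * w for w in v.values()))
            return dot / (nu * nv) if nu and nv else 0.0

        q = vec(Counter(query.lower().split()))
        return sorted(((cosine(q, vec(d)), doc) for d, doc in zip(corpus, docs)),
                      reverse=True)

    docs = ["malaria outbreak reported in coastal regions",
            "quarterly earnings call transcript",
            "cholera and malaria cases rise after flooding"]
    for score, doc in rank("malaria outbreak", docs):
        print(f"{score:.2f}  {doc}")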

Traditionally, the information management chain has been broken down along lines defined by the three prevailing technologies in it: portals, information analysis systems and search engines. Portals have served to accelerate the transformation of information into business intelligence because they act as an intervening lens between search engines that locate macro data and information analysis tools that parse micro data. But, more and more, the lines between these three key application areas have become blurred. Today, streamlining the information supply chain is no longer simply a question of connecting to the numerous available repositories and then exposing them through the sequential filtering of search, portal and information analysis tools.

Portals include search capabilities and some of the typical information analysis tools that revolve around gathering and organizing data. They also include tools that assist with dissemination, e-learning, publishing and data lifecycle management. Search and categorization tools are also growing in functionality, not only taking on the brute-force task of retrieving indexed data, but also helping to pre-organize that data so that the most relevant results are presented in the context of the immediate information analysis needs of the person sifting through it. Because of this functional consolidation, as well as organizational restructuring and consolidation, the path to actionable knowledge (knowledge that is expected to be more readily available and more readily quantifiable from an ROI perspective) lies in an integrated approach to managing the information supply chain, an approach we term the “information analysis portal.”

Information analysis portals are growing in importance and will likely become ubiquitous as organizations move more rapidly to better manage their information. The urgency to process stores of information more efficiently and expertly is being felt more and more. Businesses are seeking more effective ways of defining and developing significant product differentiators. Government agencies are seeking to assess national security threats more effectively and reliably, while affording citizens greater access to, and control over, information so they can avoid heavy, lengthy bureaucratic processes when seeking answers to questions. This is evidenced in the desire of organizations to consolidate their search capabilities around a single platform that affords the greatest possible control over key aspects of the information supply chain. The chain has grown in importance and complexity, and now encompasses many varied features such as:

  • Connectors for accessing numerous data formats and media;

  • Annotation tools for adding metadata, and linking data sets;

  • Traditional searching capabilities for accessing this content;

  • Profiling and agent-based capabilities to track key changes to information stores;

  • Categorization and classification of data to best manage very large lists of relevant data;

  • Data translation for handling multiple encodings and languages;

  • Natural language processing to better handle a wider range of users and queries;

  • Entity extraction as a basis for link and temporal analysis;

  • Entity relationship extraction;

  • Summarization of data to minimize the need to parse whole data sets;

  • Better integrated processing of structured and full-text data stores to reconcile traditional data mining from databases and conceptual extraction from full-text sources;

  • Geospatial analysis;

  • Collaboration for sharing pertinent data sets;

  • Expert location and communities-of-interest; and

  • Reporting tools that enhance the functionality of the analysis tools.

These can be grouped into overarching categories that represent the key processing blocks of the information supply chain.
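
Before walking through each block in turn, here is a rough sketch of how the four might chain together in code. The stage names follow the four processes described below, while every function body is a deliberately simplified stand-in, not an actual implementation.

    def gather(source):
        """Data capture: pull raw items out of a repository via a connector."""
        return [{"text": text, "source": source["name"]} for text in source["items"]]

    def organize(records):
        """Normalization: tag each record with simple derived metadata."""
        for r in records:
            r["tokens"] = r["text"].lower().split()
        return records

    def refine(records, keyword):
        """Analysis: keep only the records relevant to the analyst's interest."""
        return [r for r in records if keyword in r["tokens"]]

    def disseminate(records):
        """Reporting: render the surviving results for the end user."""
        for r in records:
            print(f"[{r['source']}] {r['text']}")

    source = {"name": "news-feed", "items": ["Flood risk rises in delta towns",
                                             "New chip plant announced"]}
    disseminate(refine(organize(gather(source)), "flood"))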

Getting at the data.

Data Capture—the first knowledge management process known as gathering

This includes the connectors and the annotation tools mentioned above. Tapping into all file formats and media types is crucial, but it must also be possible in multiple languages. In addition, more tools in widespread applications such as office suites allow for the annotation of documents, which are marked up with key metadata and may even reference other documents for further reading or related links. These annotations should be exploitable by the data access bridges that connect to the various data sets.
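
As a hedged illustration of such a bridge, the sketch below normalizes two hypothetical input formats into one common record shape, carrying the author's annotations along so later stages can exploit them. The formats, field names and connector registry are assumptions made for the example.

    import json
    from dataclasses import dataclass, field

    @dataclass
    class Record:
        """The common shape every connector normalizes into."""
        text: str
        annotations: dict = field(default_factory=dict)  # author metadata, links

    def from_plain_text(raw: str) -> Record:
        return Record(text=raw)

    def from_json_export(raw: str) -> Record:
        """E.g. an office-suite export carrying the author's mark-up."""
        doc = json.loads(raw)
        return Record(text=doc["body"],
                      annotations={"keywords": doc.get("keywords", []),
                                   "see_also": doc.get("links", [])})

    # One registry, many formats: downstream stages never ask where data came from.
    CONNECTORS = {"txt": from_plain_text, "json": from_json_export}

    raw = ('{"body": "Border screening tightened.",'
           ' "keywords": ["security"], "links": ["doc-17"]}')
    record = CONNECTORS["json"](raw)
    print(record.annotations["see_also"])  # the marked-up reference survives capture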

Preparing the data for better analysis.

Data Normalization and Transformation—the second knowledge management process known as organizing

This involves the ability to index content with linguistic markers, such as language tags, conceptual identifiers and grammatical categories (i.e., determining whether a term is a noun, a verb, etc.), that allow for more accurate retrieval of results. It also involves the ability to extract relevant entities, categorize content for subsequent classification, and compare content against saved profiles that trigger events alerting users to changes in the data. Of course, it means being able to do all of this regardless of the encoding, the language or the quality of the media (as in the case of audio or video, for example).
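
The sketch below shows, in heavily simplified form, the kind of linguistic markers this stage attaches. Real systems rely on trained taggers, morphological lexicons and entity models; the rules here are toy heuristics chosen only to make the data flow visible.

    def normalize(text, language="en"):
        """Attach toy linguistic markers: a language tag, a crude grammatical
        guess per token, and capitalized words as candidate entities."""
        tagged = []
        for tok in text.split():
            word = tok.strip(".,;")
            if word.endswith(("ing", "ed")):
                pos = "VERB?"            # naive morphological guess
            elif word.istitle():
                pos = "PROPER-NOUN?"     # capitalization as a weak entity cue
            else:
                pos = "OTHER"
            tagged.append((word, pos))
        entities = [w for w, pos in tagged if pos == "PROPER-NOUN?"]
        return {"language": language, "tokens": tagged, "entities": entities}

    entry = normalize("Flooding displaced families near Dhaka")
    print(entry["entities"])  # ['Dhaka']; "Flooding" is misread as a verb,
                              # which is exactly why real taggers are needed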

Applying more sophisticated algorithms to extract valuable links in data.

Data Analysis—the third knowledge management process known as refining

Data analysis revolves around natural language processing of queries so as to be able to intelligently parse out a user’s request even when written in so-called “natural language” (for example, a common sentence rather than a complex Boolean query). It also involves the ability to dynamically classify search results along several dimensions, each dimension being a branch of a taxonomy (or taxonomies). As an example, one could cross-reference information across a geography branch and an infectious diseases branch if looking for geographic distribution of a specific set of diseases. Additional analysis capabilities involve link, geospatial and temporal analysis, which depend on, among other things, good entity relationship mapping and usage of such maps in conjunction with anaphora analysis, syntactic analysis and categorization algorithms.
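
To ground the geography-by-disease example, here is a minimal sketch of classifying results along two taxonomy branches at once and cross-tabulating the hits. The two tiny branches and their keyword cues are invented stand-ins for real, far deeper taxonomies.

    from collections import Counter

    # Two toy taxonomy branches; real taxonomies are far deeper.
    GEOGRAPHY = {"asia": ["dhaka", "mekong"], "africa": ["lagos", "nile"]}
    DISEASES  = {"cholera": ["cholera"], "malaria": ["malaria"]}

    def classify(text, branch):
        """Return every node of a branch whose cue terms appear in the text."""
        words = set(text.lower().split())
        return [node for node, cues in branch.items() if words & set(cues)]

    def cross_reference(docs):
        """Cross-tabulate documents over the two dimensions at once."""
        table = Counter()
        for doc in docs:
            for place in classify(doc, GEOGRAPHY):
                for disease in classify(doc, DISEASES):
                    table[(place, disease)] += 1
        return table

    docs = ["Cholera cases climb along the Mekong delta",
            "Malaria prevention campaign launched in Lagos",
            "Cholera outbreak feared near the Nile"]
    for (place, disease), n in cross_reference(docs).items():
        print(f"{place:8} x {disease:8} : {n} document(s)")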

Presenting the results to the user.

Data Reporting—the fourth knowledge management process known as disseminating

This fourth layer is where the end user actually sees the fruits of the combined features being applied, and where the information analysis portal’s key value is best showcased. It involves displaying information extracted through link, geospatial and temporal analysis via overlays, maps and imagery. This is also where collaboration is significantly bolstered by interfaces to other users through online tools for sharing information, whiteboarding, co-producing and forming communities of common interest. Within this context, all members benefit from the profiling and agent-based alerts that individuals have set up according to their unique expertise. It is also where reporting and dissemination take on their full value through tasks such as the creation of reports and analyses that are delivered in a timely fashion to a customer, a sales representative or a government analyst.
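
A small sketch of the profiling and alerting idea mentioned above: each member saves a profile of interest terms, and an agent fans every incoming item out to the profiles it matches. The names and the term-overlap matching are illustrative assumptions, not a real alerting design.

    from dataclasses import dataclass

    @dataclass
    class Profile:
        """A saved interest: who to notify, and which terms signal a match."""
        owner: str
        terms: set

    def notify(owner, item):
        print(f"alert -> {owner}: {item}")  # stand-in for mail or a portal inbox

    def dispatch(item, profiles):
        """Agent step: fan one new item out to every matching profile,
        so each community member is alerted within their own expertise."""
        words = set(item.lower().split())
        for p in profiles:
            if p.terms & words:
                notify(p.owner, item)

    profiles = [Profile("epidemiology-team", {"cholera", "malaria"}),
                Profile("logistics-desk", {"port", "customs"})]
    dispatch("Cholera confirmed at the port of entry", profiles)
    # -> both profiles fire: the item mentions "cholera" and "port"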

The Bottom Line

Effective information analysis depends on comprehensive management of all aspects of the information or content supply chain. The most effective way to accomplish this is through the deployment of an information analysis portal based on a fully integrated information discovery and analysis platform. This portal should typically include a good portion of the aforementioned capabilities and allow for custom components to be added to the mix in order to ensure that all required functionality is available. This will help enable companies and government agencies alike to address all key processes of information analysis and shorten the path to actionable content and profitable content management solutions.


Alkis Papadopoullos, Director of Linguistic Technologies at Convera, directs the evolution of Convera’s language analysis, taxonomy development and discovery products, key components in Convera’s RetrievalWare 8 information discovery and analysis platform. He holds a master’s degree in physics, speaks five languages fluently and has worked in computational linguistics software development for 10 years. Convera is a leading provider of enterprise search and categorization solutions. RetrievalWare 8, Convera’s information discovery and analysis platform, enables fast, personalized searches resulting in exceptionally high productivity and return on investment. For more than 20 years, Convera has provided sophisticated search and categorization solutions to more than 900 customers in 33 countries, powering a broad range of mission-critical applications. For more information, please visit Convera.
