Leveraging Unstructured Information to Enhance Business Intelligence

A product manager is responsible for deciding new product strategy. As input to his decisions, he requires in-depth, up-to-the-minute competitive and market analysis, including information on competitors’ products, partner activities, government policies, and emerging scientific and technical advances.

A vice president of customer service is responsible for maintaining high levels of customer satisfaction. He needs to identify emerging trends and patterns quickly so they can be addressed immediately, to analyze his call center volume, and to identify the causal agents responsible for new trouble tickets.

These two examples highlight the need for competitive and business intelligence applications that can query, analyze and mine large amounts of unstructured data, discover analytical insights into relevant trends, patterns and relationships, and seamlessly deliver this information to decision-makers. Traditional intelligence-gathering applications, however, are built on structured, numerical data stored in relational databases or specialized data mining software. Companies today face the challenge of extracting insights and knowledge from vast amounts of unstructured information and applying them for competitive advantage.

Taxonomies organize unstructured information by providing a hierarchy of topics that can be used to categorize vast amounts of information. Depending on the extent and granularity of the taxonomy, as well as on the extracted metadata and the accuracy of categorization, business intelligence applications can use the taxonomy to discover hidden semantic and causal relationships between topics, including:

  • new relationships between topics in real time;
  • hidden semantic neighbors and networks between topics; and
  • emerging and historical patterns, trends and causal relationships.

Total Taxonomy Lifecycle Management

Developing and managing enterprise-wide taxonomies and their associated classification models as critical resources for business intelligence applications involves a five-step process that we call Total Taxonomy Lifecycle Management. Each step should combine automated technologies, advanced analytics, and human oversight.

1. Create Taxonomy. Enterprise taxonomies must accurately reflect both the structure of a document corpus and the company's business objectives. Companies can easily leverage existing forms of organization by importing an existing taxonomy as an XML file or by creating a taxonomy directly from a file system or web server. More importantly, however, advances in taxonomy and categorization software now enable companies to create a taxonomy from any corpus of unorganized documents by automatically “clustering” documents based on statistical and/or semantic criteria and then organizing the resulting topics hierarchically into a taxonomy.
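The clustering step can be illustrated with a minimal sketch. It assumes plain bag-of-words vectors and average-linkage agglomerative clustering over cosine similarity. Stratify's actual statistical and semantic criteria are not described here, so treat the function names and the choice of algorithm as hypothetical. Each merge in the resulting tree is a candidate topic, and the nesting supplies the hierarchy.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_hierarchy(docs):
    """Average-linkage agglomerative clustering of {doc_id: text}.

    Returns a nested tuple tree whose leaves are document ids; every
    internal node is a candidate topic grouping its children.
    (Hypothetical helper; quadratic-per-step, fine only for small corpora.)
    """
    vecs = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    # Each cluster is (subtree, member doc ids); start with one per document.
    clusters = [(doc_id, [doc_id]) for doc_id in docs]

    def linkage(c1, c2):
        # Average pairwise similarity between members of the two clusters.
        sims = [cosine(vecs[a], vecs[b]) for a in c1[1] for b in c2[1]]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Merge the most similar pair of clusters into a new parent node.
        i, j = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0] if clusters else None
```

With four short documents, two about a Linux kernel release and two about EU policy, the tree groups each pair under its own parent before joining them at the root. A real system would also label each internal node, for example from its most frequent terms.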

2. Define Classification Models. Classification models define how a document should be classified into particular topics in the taxonomy. However, no single classification technology can accommodate all the categorization tasks and business objectives that organizations must address. From Stratify’s perspective, overcoming this challenge requires an architecture that supports multiple classification technologies, or “classifiers” (for example, statistical, keyword and Boolean), acting in parallel to produce highly accurate document classifications. Developing classification models can often take advantage of the clustering analysis used to create taxonomies. For example, the system can automatically create the training sets required to define the statistical classifier, eliminating the manual effort previously needed to identify relevant documents for each topic. Companies should always have the flexibility to use these classifiers in any combination to define topics and to create business rules between topic definitions.
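To illustrate several classifiers working in parallel on one topic, the following sketch pairs a keyword classifier, a Boolean rule and a small naive Bayes statistical classifier, and then combines their votes with a simple business rule. The helper names, the majority-vote default and the training data are assumptions made for illustration. They are not Stratify's design.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def keyword_classifier(keywords, min_hits=1):
    """Matches when at least `min_hits` of the topic keywords occur."""
    keywords = set(keywords)

    def classify(text):
        return len(set(tokens(text)) & keywords) >= min_hits
    return classify


def boolean_classifier(all_of=(), none_of=()):
    """Matches when every `all_of` term occurs and no `none_of` term does."""
    def classify(text):
        toks = set(tokens(text))
        return all(t in toks for t in all_of) and not any(t in toks for t in none_of)
    return classify


def statistical_classifier(pos_docs, neg_docs, threshold=0.0):
    """Two-class multinomial naive Bayes (topic vs. not-topic), add-one smoothing.

    Both training sets must be non-empty. In the process described above, the
    positive set could come from a cluster found while building the taxonomy.
    """
    pos = Counter(t for d in pos_docs for t in tokens(d))
    neg = Counter(t for d in neg_docs for t in tokens(d))
    vocab = set(pos) | set(neg)
    pos_total, neg_total = sum(pos.values()), sum(neg.values())
    prior = math.log(len(pos_docs) / len(neg_docs))

    def classify(text):
        score = prior  # running log-odds that the document belongs to the topic
        for t in tokens(text):
            if t in vocab:  # unseen terms carry no evidence either way
                score += math.log((pos[t] + 1) / (pos_total + len(vocab)))
                score -= math.log((neg[t] + 1) / (neg_total + len(vocab)))
        return score > threshold
    return classify


def combine(*classifiers, rule="majority"):
    """Business rule over parallel classifier votes: 'all', 'any' or 'majority'."""
    def classify(text):
        votes = [c(text) for c in classifiers]
        if rule == "all":
            return all(votes)
        if rule == "any":
            return any(votes)
        return sum(votes) > len(votes) / 2
    return classify
```

Because each classifier is just a function from text to a yes/no vote, they can be mixed in any combination per topic. Switching the rule from "majority" to "all" trades recall for precision.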

3. Test and Assess Classification Accuracy. The accuracy of classification models must be tested and evaluated in real time without disrupting production environments. Topic definitions should be evaluated against test documents, with insight into how and why documents are classified into various topics and into the degree of overlap between topics. Based on the results, the classification models can then be easily modified.
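One way to quantify this assessment is to compute per-topic precision and recall against a labeled test set, together with a Jaccard score for how much the document sets of two topics overlap. This is a minimal sketch. The data layout (a document id mapped to a set of topic names) and the function names are assumptions.

```python
from itertools import combinations


def evaluate(predicted, expected):
    """Per-topic (precision, recall).

    predicted / expected: {doc_id: set of topic names}.
    """
    topics = set().union(*predicted.values(), *expected.values())
    report = {}
    for topic in sorted(topics):
        tp = sum(1 for d in expected
                 if topic in expected[d] and topic in predicted.get(d, set()))
        fp = sum(1 for d in predicted
                 if topic in predicted[d] and topic not in expected.get(d, set()))
        fn = sum(1 for d in expected
                 if topic in expected[d] and topic not in predicted.get(d, set()))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[topic] = (precision, recall)
    return report


def topic_overlap(predicted):
    """Jaccard overlap between the document sets of each pair of topics."""
    members = {}
    for doc, topics in predicted.items():
        for t in topics:
            members.setdefault(t, set()).add(doc)
    return {
        (a, b): len(members[a] & members[b]) / len(members[a] | members[b])
        for a, b in combinations(sorted(members), 2)
    }
```

A topic with low precision, or a high overlap with a sibling topic, is an obvious candidate for revising its classifier definition before the model is published.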

4. Refine and Optimize Taxonomy. Taxonomies must be continuously refined and optimized to remain relevant to changing corporate objectives and information trends. For example, changing markets, products and government regulations must all be accommodated if the organization of information in the taxonomy is to remain useful to researchers and business intelligence applications. Historically, these operations have depended on large numbers of subject matter experts, taxonomists and information scientists to review and revise taxonomies manually, which usually limited their scope to narrowly focused applications. As taxonomies increasingly assume business-critical roles within information-intensive enterprises, a clear need has emerged for automated methods that help refine them. In response, companies such as Stratify have introduced new functionality that enables customers to automatically:

  • add or recommend new granular topics to the taxonomy, or merge sub-optimal topics;
  • optimize the structure of a taxonomy to reflect changing business conditions more accurately; and
  • optimize the training sets used to define the statistical classifier for specific topics.
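Simple centroid statistics can approximate the first kind of refinement. The sketch below assumes that each topic is represented by the term-count vectors of its documents. It suggests merging topics whose centroids are nearly identical, and it flags loosely cohesive topics as candidates for splitting into more granular ones. The thresholds are arbitrary placeholders, and the functions are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean term-weight vector of a non-empty list of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def cohesion(vectors):
    """Average similarity of a topic's documents to the topic centroid."""
    c = centroid(vectors)
    return sum(cosine(v, c) for v in vectors) / len(vectors)


def suggest_changes(topics, merge_at=0.8, split_below=0.3):
    """topics: {topic: [term-count vectors]} -> (merge pairs, split candidates)."""
    cents = {t: centroid(vs) for t, vs in topics.items()}
    merges = [(a, b) for a, b in combinations(sorted(topics), 2)
              if cosine(cents[a], cents[b]) >= merge_at]
    splits = [t for t in sorted(topics) if cohesion(topics[t]) < split_below]
    return merges, splits
```

These are only suggestions. In keeping with the human-oversight principle below, a taxonomist would review each proposed merge or split before applying it.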

5. Publish Taxonomy and Classify Documents. Categorization software should be able to access and categorize unstructured information in any format, located in repositories throughout the enterprise as well as on external sites, including those that require user authentication. Publishing a taxonomy enables the software to crawl and categorize new documents, making them available to researchers and business intelligence applications.
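In the simplest case of a local file-system repository, publishing can be sketched as a crawl that applies the classifier for every topic to each document and records the matches in an index. The `publish` function and its plain-text-only filter are hypothetical. External sites and authenticated repositories would need their own connectors, which this sketch does not attempt.

```python
import os


def publish(root, classifiers, extensions=(".txt",)):
    """Crawl a directory tree and file each document under matching topics.

    classifiers: {topic: callable(text) -> bool}, e.g. combined classifiers.
    Returns {topic: [paths of documents classified into that topic]}.
    """
    index = {topic: [] for topic in classifiers}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue  # skip formats this sketch cannot read
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            for topic, classify in classifiers.items():
                if classify(text):
                    index[topic].append(path)
    return index
```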

Our experience with a wide variety of enterprise customers clearly illustrates the benefits of managing the total taxonomy lifecycle with a combination of automation, advanced analytics and human oversight at every step. Automated techniques accelerate the creation, refinement and optimization of taxonomies, but they can never fully account for the real-world considerations that human experts take for granted. Analytical explanations of how and why the software produces specific results are crucial for subject matter experts to tune the taxonomy and classification models appropriately. These information managers are the ultimate arbiters who ensure that taxonomies satisfy application requirements and corporate objectives. Combining these three factors (automation, analytics and human oversight) in a highly scalable platform with an integrated interface for managing the taxonomy lifecycle is a prerequisite for the next generation of business intelligence applications.

Business Intelligence

Taxonomy and categorization software can provide entirely new business intelligence capabilities that were previously unavailable from traditional analytical and data mining tools. And while taxonomies offer an intuitive navigation metaphor for browsing complex knowledge bases, new functions for analyzing and mining unstructured information open up a host of additional opportunities for advanced visualization technologies that can be integrated directly into decision-making processes.

Discover new relationships: Confronted by high volumes of changing unstructured information, analysts face the challenge of identifying new and emerging topics. Analysts can leverage the clustering algorithms used in creating and refining taxonomies to immediately identify new topics and update their production taxonomies as needed. The cluster analysis can provide important information regarding the relative strength of specific topics as well as identify core documents and sources responsible for the generation of specific topics.

These processes can be run on a regular basis to inform analysts proactively of changing conditions. For example, based on a daily analysis of a set of information sources, a product manager can be notified immediately when a key competitor launches a new product, with instant access to related information from a variety of sources, such as press releases, news items and analyst reports. He could also be notified of changes in third-party supplier forecasts, revenue projections and commodity contracts that affect his product.
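A daily run of this kind might compare the day's clusters against existing topic centroids and raise an alert for any cluster that matches no current topic well. The alert would report the cluster's relative strength, its core document (the one nearest the cluster centroid) and its sources. The sketch assumes that clustering has already produced the groups. The record layout, the novelty threshold and the alert fields are hypothetical, and delivering the notification itself is out of scope.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean term-weight vector of a non-empty list of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def emerging_topics(clusters, topic_centroids, novelty_below=0.5):
    """Flag clusters from today's run that match no existing topic well.

    clusters: list of non-empty clusters, each a list of
              (doc_id, source, term_vector) tuples.
    topic_centroids: {topic: centroid vector} for the production taxonomy.
    """
    total_docs = sum(len(c) for c in clusters)
    alerts = []
    for cluster in clusters:
        cent = centroid([vec for _, _, vec in cluster])
        best = max((cosine(cent, tc) for tc in topic_centroids.values()),
                   default=0.0)
        if best < novelty_below:
            # Core document: the member closest to the cluster centroid.
            core = max(cluster, key=lambda item: cosine(item[2], cent))
            alerts.append({
                "strength": len(cluster) / total_docs,  # share of today's docs
                "core_document": core[0],
                "sources": sorted({src for _, src, _ in cluster}),
            })
    return alerts
```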

Discover semantic neighbors: No single taxonomy structure can capture, model and represent all possible (especially new and emerging) relationships between topics. The ability of taxonomy and categorization software to map new semantic neighbors and networks creates enormous opportunities for business and competitive intelligence applications.

1. The software can discover first-order semantic neighbors, i.e., topics that are directly related but are not captured in the organization of a given taxonomy. This enables users to identify topics as they become pertinent to their areas of interest. The software can also gauge the importance of these relationships by their semantic strength, so users can focus their attention on the key relationships that contribute substantively to their research.

2. The software can map new semantic networks connecting remote topics in the taxonomy. Because even a single document classified into two topics establishes an implicit relationship between them, mining these relationships lets the software construct semantic maps that uncover indirect relationships critical for business and competitive intelligence.
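First-order neighbors can be estimated from classification results alone. This sketch assumes that the topic assignments of each document are known. It measures the strength of a topic pair as the Jaccard overlap of their document sets, and it skips pairs that the taxonomy already links as parent and child. Jaccard is only one of many possible strength measures and was chosen here for simplicity. The function and parameter names are hypothetical.

```python
from itertools import combinations


def semantic_neighbors(doc_topics, parent_of, min_strength=0.1):
    """Co-occurring topic pairs not already linked in the taxonomy.

    doc_topics: {doc_id: set of topics the document is classified into}.
    parent_of: {topic: parent topic}, i.e. the taxonomy hierarchy.
    Returns (topic_a, topic_b, strength) tuples, strongest first.
    """
    members = {}
    for doc, topics in doc_topics.items():
        for t in topics:
            members.setdefault(t, set()).add(doc)
    results = []
    for a, b in combinations(sorted(members), 2):
        if parent_of.get(a) == b or parent_of.get(b) == a:
            continue  # relationship already captured by the taxonomy structure
        strength = len(members[a] & members[b]) / len(members[a] | members[b])
        if strength >= min_strength:
            results.append((a, b, strength))
    return sorted(results, key=lambda r: r[2], reverse=True)
```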

For example, while many companies organize Linux software within a technology area of the taxonomy, the software could highlight new relationships emerging between Linux, EU member states, and specific European consortia that could be instrumental when developing new software products for the European market.
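The indirect path in the Linux example can be found by treating topics as nodes, joining two topics whenever at least one document is classified into both, and running a breadth-first search. The topic names and documents used below are invented to mirror the example. This is a sketch, not Stratify's method.

```python
from collections import deque


def build_network(doc_topics):
    """Undirected topic graph: an edge joins two topics whenever at least
    one document is classified into both. doc_topics: {doc_id: set}."""
    graph = {}
    for topics in doc_topics.values():
        for t in topics:
            graph.setdefault(t, set()).update(topics - {t})
    return graph


def connecting_path(graph, start, goal):
    """Breadth-first search for the shortest chain of topics from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back along the BFS tree
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no chain of shared documents links the two topics
```

For instance, if one document is classified into {Linux, EU member states} and another into {EU member states, European consortia}, the search returns the chain Linux, EU member states, European consortia, even though no single document mentions both ends.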

Discover historical trends: When the software uses a database to manage taxonomies, document categorizations and metadata, it can provide business intelligence applications with historical data-mining capabilities. It is easy to visualize historical changes to the taxonomy and to topic activity, as well as the activity of specific information sources in the context of the taxonomy, providing valuable inputs to analysts. Finally, analysts can interrogate semantic networks in the context of the taxonomy to identify causal agents. The semantic strength between topics can be tracked over time, and changes to the network can be located precisely. Returning to one of our initial examples, if the textual entries that document trouble tickets in the CRM system are categorized within a taxonomy, the system can proactively visualize emerging patterns or repeated problems and identify the particular topics or causal relationships that require the vice president’s attention. Companies can then focus their response on the cause and not just the symptom being reported, apply that knowledge within ongoing decision-support systems, and increase customer satisfaction and retention.
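For the trouble-ticket scenario, a minimal sketch might count how many tickets fall into each topic per period and flag topics whose latest count rises well above their historical average. The record format, the period labels and the 2x ratio are assumptions. A production system would read these counts from the categorization database described above.

```python
from collections import Counter, defaultdict


def topic_volume_by_period(records):
    """records: iterable of (period, set_of_topics), one per categorized
    document such as a trouble ticket. Returns {period: Counter(topic)}."""
    volume = defaultdict(Counter)
    for period, topics in records:
        volume[period].update(topics)
    return volume


def flag_spikes(volume, ratio=2.0):
    """Topics whose count in the latest period is at least `ratio` times
    their average over earlier periods; topics with no history are flagged too.
    Period labels must sort chronologically (e.g. 'YYYY-MM')."""
    periods = sorted(volume)
    latest, history = periods[-1], periods[:-1]
    flagged = []
    for topic, count in volume[latest].items():
        baseline = (sum(volume[p][topic] for p in history) / len(history)
                    if history else 0.0)
        if baseline == 0 or count >= ratio * baseline:
            flagged.append(topic)
    return sorted(flagged)
```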

Conclusion

Business intelligence applications that can integrate knowledge extracted from unstructured information, which constitutes 85% of corporate data, enable companies to act faster and smarter. Taxonomy and categorization platforms that can organize, classify and leverage high volumes of unstructured information extend the reach of business intelligence applications, enabling them to query, analyze and mine this previously inaccessible information. Integrating these new capabilities directly into business decision processes gives companies critical insights into their environment and enables them to act with focus and speed.


Stratify, Inc.—Discover More ™

Stratify is the emerging leader in unstructured data management software. The Stratify Discovery System™ is a complete enterprise software platform that helps companies harness today’s vast corporate information overload by automating the process of organizing, classifying and leveraging the business-critical, unstructured information that is usually found in documents, presentations and Web pages.

Named as one of the winners in Computerworld’s Innovative Technology Awards 2002, Stratify is headquartered in Mountain View, CA. For more information about Stratify, please visit Stratify.
