Maximizing the Value of Enterprise Information
By Leveraging Best of Breed Search and Categorization Software

The Holy Grail of knowledge management is that all individuals and communities of interest can gain instantaneous access to highly relevant information, whenever and wherever needed, to make better decisions, reduce risk, eliminate redundancy, improve efficiency, save time, improve financial results and improve security.

Ironically, ceaseless efforts to develop new technologies have only increased the information management challenge. Enterprise Content Management, Document Management, Digital Asset Management, Web Content Management, Customer Relationship Management, Sales Force Automation, Supply Chain Management, Enterprise Resource Planning, Enterprise Application Integration, Enterprise Content Integration, Groupware and Collaboration are all examples of applications designed to improve the efficiency of organizational activities and functions, yet each one adds another store of information to manage.

How can all authorized users seamlessly connect to, access and expose all the underlying information and tacit knowledge for individual use or to collaborate across time, space and media?

Information from the Web and other sources is doubling every year. Knowledge workers and researchers generate information at a pace that rapidly outstrips the ability to manage, understand and efficiently leverage this intellectual capital. A recent analyst survey estimated that respondents spend 25% of their time looking for information, as much as 500 hours per year. Frustrated, knowledge workers give up and re-create the desired information. [1]

Much has been said and done to realize the goals of enterprise content integration (ECI). Recent acquisitions and partnerships are intended to improve integration levels and thus streamline access to enterprise content. Enterprise portals and intranets have been deployed to create unified, personalized access points to disparate enterprise applications and the information confined within them.

Even so, the continuous evolution of enterprise information systems means that developing, implementing and maintaining large scale application integrations is an expensive and potentially risky undertaking, especially considering the large number of such applications and repositories in use by most medium and large enterprises. Does a more compelling alternative exist?

Turning Enterprise Information into Knowledge

The various enterprise applications and content repositories have been designed and implemented with specific user needs and business functions in mind. In fact, it is just this type of business process-specific focus that has helped to make these systems valuable and indispensable. However, this narrow focus has also created information silos that are difficult to integrate and harness. Data formats are unique and proprietary; security protocols and interfaces are non-standard; and search, information access and analysis tools are application-specific and not extensible.

An alternative to expensive application- and content-integration projects is a best-of-breed enterprise search and categorization platform. Such a platform should provide information access and security infrastructure along with services such as categorization and personalized searching. It should provide analysis tools that can be applied uniformly to all targeted asset repositories and application data. The information access and discovery tools must harness the information residing in disparate enterprise applications and information repositories.

Access. Valuable information can exist in hundreds of different formats, file types and media types. It is imperative that the maximum possible number of information and media types can be accessed in an integrated fashion, including structured and unstructured data, multimedia content such as video, audio, images, and all related metadata that adds value and context to the information.

An enterprise-class search and knowledge discovery platform will provide out-of-the-box support for all major applications and repositories, such as groupware systems (Lotus Notes and Microsoft Exchange), document and content management systems (Documentum and FileNet), relational databases (Microsoft SQL Server, Oracle, Sybase) and UNIX- and Windows-based file systems. The framework should also include a developers' toolkit for rapid development and deployment of connectors for new repositories as they arise.
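To make the connector idea concrete, here is a minimal sketch in Python. It assumes a hypothetical toolkit in which every repository is wrapped by a class exposing a single fetch() method; the names RepositoryConnector, InMemoryConnector and build_index are invented for illustration and are not part of any product named in this article.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class RepositoryConnector(ABC):
    """Hypothetical base class: one subclass per repository type."""

    @abstractmethod
    def fetch(self) -> Iterator[Dict[str, str]]:
        """Yield one record per document with 'id', 'text' and 'source' keys."""


class InMemoryConnector(RepositoryConnector):
    """Stand-in for a real repository; holds documents in a dictionary."""

    def __init__(self, name, items):
        self.name = name
        self.items = items

    def fetch(self):
        for doc_id, text in self.items.items():
            yield {"id": doc_id, "text": text, "source": self.name}


def build_index(connectors):
    """Pull every record from every connector into one unified index."""
    index = {}
    for connector in connectors:
        for record in connector.fetch():
            index[(record["source"], record["id"])] = record["text"]
    return index


mail = InMemoryConnector("mail", {"m1": "Budget review on Friday"})
files = InMemoryConnector("files", {"f1": "Budget spreadsheet for 2004"})
index = build_index([mail, files])
print(sorted(index))  # [('files', 'f1'), ('mail', 'm1')]
```

Under this assumption, supporting a new repository means writing one new subclass, while the indexing code stays unchanged.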

Security. The inherent value of content is closely correlated to the security protocols that protect it. Authentication and access permissions must be observed and maintained, so an enterprise search platform must process and honor access control rights. Any viable platform must integrate its own data access and processing protocols with the authorization and access permissions of each application or repository, ensuring that users access only the information they are authorized to see.
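One common way to honor access rights at query time is to filter results against the permissions recorded for each document. The sketch below assumes that each indexed document carries the set of groups its source repository allows to read it; the Document class, the group names and filter_by_access are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    repository: str
    allowed_groups: frozenset  # groups granted read access by the source repository


def filter_by_access(results, user_groups):
    """Keep only documents whose allowed groups overlap the user's groups."""
    user_groups = set(user_groups)
    return [doc for doc in results if doc.allowed_groups & user_groups]


results = [
    Document("memo-1", "notes", frozenset({"finance"})),
    Document("spec-7", "documentum", frozenset({"engineering", "finance"})),
    Document("hr-3", "fileshare", frozenset({"hr"})),
]

visible = filter_by_access(results, {"engineering"})
print([doc.doc_id for doc in visible])  # ['spec-7']
```

A user who belongs to no group sees nothing, which is the safe default for a filter of this kind.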

Linguistic Support. Once information has been accessed in a secure manner, the next major challenge is to ensure users can find what they're looking for. That presumes that users know what they're looking for. This problem has been solved for some time for librarians and expert searchers who are highly proficient with keyword and Boolean search methods. However, as search capabilities have rolled out to the masses, users with vastly different search skills and knowledge perspectives are trying to find information. Plus, increasing globalization means that valuable information now exists in many different languages, in repositories scattered around the globe.

Extensive language support and sophisticated linguistic analysis are required. Language support can be monolingual (enabling information retrieval in a specified language) or cross-lingual (meaning that an English language query can return relevant results in other languages). For large multinational corporations or intelligence and law enforcement agencies, this represents a unique breakthrough.

Advanced linguistic analysis has been instrumental to the democratization of search, allowing knowledge workers (who are typically not expert searchers) to find relevant information even when they are not exactly sure what they're looking for. In an approach known as concept searching, semantic networks and domain-specific dictionaries allow searchers to expand the scope of their searches to include "concepts" rather than simple terms and Boolean expressions. Concept search strives to identify and return units of meaning that are related to, but not exact matches for, the submitted search term(s). Through verticalization (i.e., use across specific industries, domains and topics) concept search can significantly improve both precision and recall.
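The expansion step behind concept searching can be sketched in a few lines of Python. The example assumes a tiny hand-built semantic network that maps a term to related terms; the network contents, expand_query and concept_search are illustrative only and far simpler than a production semantic network.

```python
# Hypothetical semantic network: each term maps to related terms.
SEMANTIC_NETWORK = {
    "car": {"automobile", "vehicle"},
    "asthma": {"wheezing", "bronchial"},
}


def expand_query(terms, network):
    """Return the query terms plus every related term in the network."""
    expanded = set()
    for term in terms:
        key = term.lower()
        expanded.add(key)
        expanded |= network.get(key, set())
    return expanded


def concept_search(query, documents, network):
    """Return ids of documents containing any term of the expanded query."""
    concepts = expand_query(query.split(), network)
    hits = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        if words & concepts:
            hits.append(doc_id)
    return sorted(hits)


documents = {
    "d1": "New automobile safety ratings released",
    "d2": "Quarterly earnings report",
    "d3": "Used car prices are rising",
}

print(concept_search("car", documents, SEMANTIC_NETWORK))  # ['d1', 'd3']
```

A plain keyword search for "car" would return only d3; the expansion also finds d1, which mentions "automobile" but never the query term itself.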

Organization & Structure

Perhaps the single greatest challenge is sheer information overload. Even very sophisticated, highly optimized search solutions cannot consistently reduce information clutter to the degree that knowledge workers can be highly efficient and confident in their ability to rapidly find relevant information. In today's time-compressed, high-velocity environment, most searchers simply cannot review scores of returned search results and manually cull them for useful knowledge.

Leading industry analysts and solution providers focus on taxonomies that can bring a consistent and predictable sense of structure. For example, a geographic taxonomy is a hierarchical, general-to-specific representation, such as: world > continent > region > nation > state > city. Employing such a taxonomy against all of an organization's information repositories allows the search system to automatically identify documents with references to the taxonomy nodes, thus allowing the information to be organized and analyzed (in this case) from a geographic perspective. The application of additional taxonomies, such as MeSH [2], GO [3] or DTIC® [4] and others, can organize information that is relevant to an organization's primary areas of interest and operation.
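As a rough illustration of taxonomy-based categorization, the Python sketch below stores a small geographic hierarchy as child-to-parent links and tags a document with the full path of every node it mentions. The node names, the whole-word matching rule and the function names are assumptions made for the example; a real categorizer would use much richer evidence than literal name matches.

```python
import re

# Hypothetical geographic taxonomy stored as child -> parent links.
GEO_PARENT = {
    "north america": "world",
    "united states": "north america",
    "virginia": "united states",
    "richmond": "virginia",
}


def path_to_root(node, parent):
    """Return the general-to-specific path from the root down to node."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))


def categorize(text, parent):
    """Return the taxonomy path of every node named in the text."""
    lowered = text.lower()
    nodes = set(parent) | set(parent.values())
    matched = [
        node for node in nodes
        if re.search(r"\b" + re.escape(node) + r"\b", lowered)
    ]
    return sorted(" > ".join(path_to_root(node, parent)) for node in matched)


print(categorize("The Richmond office reported strong sales.", GEO_PARENT))
# ['world > north america > united states > virginia > richmond']
```

Because each match carries its full path, a document that mentions only a city can still be found when a user browses at the state or nation level.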

Many standard or industry-accepted taxonomies are readily available. Applying these enables a new realm of information categorization that can be either industry- or domain-specific, or general and horizontal. Taxonomies provide well-defined, stable organizational frameworks that cut across disparate data sets and functional areas, adding structure and the ability to find information that would otherwise be difficult to recognize. Once categorized, information can be populated into browsable folders or classifications allowing users to intuitively navigate to relevant concentrations of information. These classifications can mirror the taxonomy hierarchy used to categorize the information, or be constructed and populated to meet the specific organizational structures and perspectives of an enterprise.

Analysis and Discovery

Taxonomies are valuable because they bring structure, consistency and stability to the chaos of otherwise unstructured information. However, categorized information must be exposed and analyzed in a flexible and dynamic fashion that can be readily tailored to the needs of diverse user populations and the constantly changing needs of business, as well as intelligence and law enforcement domains.

What users desperately need is the ability to begin with a search that can be processed intelligently to identify related concepts and terms, and return results with the highest possible precision and recall. Once they are confident they have identified the most likely subset of relevant information (which can easily contain hundreds or thousands of results), users need additional tools to slice and dice large result sets into manageable chunks of information that can be rapidly absorbed, correlated, shared and leveraged.
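Precision and recall have standard definitions: precision is the share of returned results that are relevant, and recall is the share of relevant documents that were returned. A minimal Python sketch, using made-up document ids, shows the arithmetic.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Four results returned; three documents are actually relevant; two overlap.
precision, recall = precision_recall(
    returned=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(precision)         # 0.5
print(round(recall, 3))  # 0.667
```

Here half of what the user sees is useful (precision 2/4), and one relevant document in three is missed (recall 2/3).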

This can be achieved through an innovation called Dynamic Classification. Dynamic Classification allows search results to be organized in real time into classification views that the user selects in order to view information from various perspectives. For example, a search for a disease such as asthma will return many results. If the researcher's perspective is specific to geographic concentrations of asthma occurrences, results can be organized into a geographic classification, thus narrowing the results. Alternatively, if a medical researcher is interested in asthma as it relates to certain proteins or genes, applying the MeSH and Gene Ontology taxonomies will organize the search results into classifications that expose co-occurrences of "asthma"-related topics with either "proteins" or "genes."
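The regrouping idea can be sketched as follows, assuming each search result has already been tagged with categories from several taxonomies. The result records, the view names and the classify function are hypothetical; this is a simplified illustration, not a description of how any particular product implements Dynamic Classification.

```python
from collections import defaultdict

# Hypothetical search results, already tagged against two taxonomies.
results = [
    {"id": "r1", "geography": ["Europe"], "mesh": ["Proteins"]},
    {"id": "r2", "geography": ["Asia"], "mesh": ["Genes"]},
    {"id": "r3", "geography": ["Europe", "Asia"], "mesh": ["Genes", "Proteins"]},
]


def classify(results, view):
    """Group result ids under each category of the view the user selected."""
    buckets = defaultdict(list)
    for result in results:
        for category in result.get(view, []):
            buckets[category].append(result["id"])
    return dict(buckets)


print(classify(results, "geography"))  # {'Europe': ['r1', 'r3'], 'Asia': ['r2', 'r3']}
print(classify(results, "mesh"))       # {'Proteins': ['r1', 'r3'], 'Genes': ['r2', 'r3']}
```

The same result set is regrouped on demand: switching the view changes the perspective without rerunning the search.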

The dynamic classification approach can be further extended to multi-dimensional analysis, whereby the asthma-related search results could be simultaneously populated into a table with MeSH-related topics on one axis, and gene-related topics on the other. Now, the most relevant search results are found "at the intersections" of the simultaneously applied classifications, which the researcher can select in real time. This approach can rapidly reduce hundreds or thousands of results to highly relevant, bite-size chunks of meaningful information, and it can identify relationships that would not otherwise be easily uncovered.
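A two-axis table of this kind amounts to a cross-tabulation of results over two classifications. The Python sketch below assumes results tagged against two taxonomies; the category labels are placeholders chosen for the example rather than verified MeSH or GO entries, and cross_tabulate is a hypothetical helper.

```python
from collections import defaultdict


def cross_tabulate(results, row_view, col_view):
    """Map each (row category, column category) pair to the matching result ids."""
    table = defaultdict(list)
    for result in results:
        for row in result.get(row_view, []):
            for col in result.get(col_view, []):
                table[(row, col)].append(result["id"])
    return dict(table)


# Hypothetical asthma-related results tagged with placeholder category labels.
results = [
    {"id": "r1", "mesh": ["Respiratory Tract Diseases"], "go": ["immune response"]},
    {"id": "r2", "mesh": ["Respiratory Tract Diseases"], "go": ["signaling"]},
    {"id": "r3", "mesh": ["Hypersensitivity"], "go": ["immune response"]},
    {"id": "r4", "mesh": ["Respiratory Tract Diseases"], "go": ["immune response"]},
]

table = cross_tabulate(results, "mesh", "go")
print(table[("Respiratory Tract Diseases", "immune response")])  # ['r1', 'r4']
```

Each cell holds only the results that satisfy both classifications at once, which is why a large result set shrinks quickly at the intersections.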

The ability to return intelligent search results, and then analyze them in many contexts in an iterative fashion, enables searchers to discover information, relationships and answers (in short, knowledge) that might otherwise go undiscovered.

The Search for Nirvana

If information is the key not only to our survival but also to our continued success and prosperity, we must continue to develop and implement promising new methodologies to harness information and resources.

Many approaches are available, both experimental and traditional. Each organization must make a pragmatic decision that balances risk with reward. Risk is the problem of missing critical data that results in poor decisions, missed market opportunities or intelligence and national security failures. Reward is the ability to achieve and maintain competitive advantage, improve efficiency or protect and improve the lives and prosperity of the citizenry.

Convera Successes

A major US financial institution deployed RetrievalWare as its Enterprise Discovery Platform, consolidating eight separate search tools onto a single-solution platform, supporting as many as 40 separate businesses and tens of thousands of users accessing terabytes of geographically dispersed data. RetrievalWare allows each business unit to maintain control of its own information and applications, while providing users with a unified, secure access point to the valuable information within.

A global pharmaceutical company deployed RetrievalWare to provide more than 50,000 employees with fast, secure access to its vast information resources. RetrievalWare provides users with the most relevant information from a variety of repositories including: R&D document repositories for pharmaceutical, vaccines and biotechnology; product literature; corporate information; and various Internet sites. With RetrievalWare, Intranet search engine use has increased an estimated 50%, while search productivity has improved an estimated 15%.

PSA Peugeot Citroën Group, the world's sixth largest automobile manufacturer, selected RetrievalWare after a stringent evaluation process in which nine vendors were compared across 150 different selection criteria. RetrievalWare now provides 60,000 employees in 140 countries with simple and secure access to a vast array of multilingual documents stored on Peugeot's corporate Intranet. RetrievalWare is also deployed on the Peugeot extranet site and is used by nearly 40,000 partners and customers.


Convera is a leading provider of mission-critical enterprise search & categorization solutions. Convera's RetrievalWare solutions provide highly scalable, fast, accurate and secure search across 200 forms of information, in 45 languages. More than 800 customers in 33 countries rely on Convera's search solutions to power a broad range of mission-critical applications.

[1] L. Latham. You Can Document ROI for Web Content Management. June 19, 2003.

[2] MeSH is the National Library of Medicine's controlled vocabulary thesaurus for Medical Subject Headings. It consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity. It can be applied as a taxonomy.

[3] The Gene Ontology™ (GO) Consortium produces a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing. GO provides structured networks of defined terms to describe gene product attributes. GO is one of the controlled vocabularies of the Open Biological Ontologies.

[4] The Defense Technical Information Center (DTIC) is the central facility for the collection and dissemination of scientific and technical information for the Department of Defense (DoD). DTIC provides an open source taxonomy focused on defense- and technology-oriented information.
