Search for the Integrated Enterprise

Enterprise Content Management (ECM) represents a critical new stage in the advance of the information age. In a marketplace largely driven by the innovations of emerging products and solutions, ECM is much more than the next must-have answer to today's business problems. Rather, it is both a strategy and the underlying technologies that help businesses transform their content into competitive advantage.

Giving Structure to Unstructured Content

Powerful search and retrieval is a fundamental and vital element of any successful ECM strategy. The true value of search and retrieval tools lies not in their ability to simply full-text index vast quantities of unstructured content, but rather in their ability to provide structure to that content through capabilities such as auto-classification and recommendations.

The quantity of unstructured data is rising exponentially in most organizations. Individual productivity tools, such as word processing systems, graphic design tools and e-mail, have created an explosion of unstructured content that must be managed, secured and maintained. As more and more of the value of an organization lies in its unstructured content, it becomes even more mission-critical to guide users to the content that they need to do their jobs quickly, accurately and securely.

The strength of an ECM system is that it enables organizations to develop targeted solutions to a wide range of business problems: reducing the costs of processing financial transactions, managing the exponential growth of corporate e-mail, ensuring that the company is managed according to industry standards and regulations, and bridging the gap between users and sources of information.

To support this vast array of requirements, ECM systems must be assembled on a common framework of services for working with users, information and the processes that connect them. Today's distributed enterprise demands an integrated approach that provides consolidated control and secure sharing of content.

The emergence and domination of the Internet as a tool for sharing information has shaped the way in which people work with content. Perhaps the most fundamental means by which organizations can ensure that the appropriate users can interact with the necessary content is through an integrated search service. Unlike Web search engines, however, which typically are only required to index HTML pages and the occasional Microsoft Office and PDF document, ECM search services need to be able to index the broad range of documents and file types used across the enterprise—from invoices, records, and contracts to e-mails, spreadsheets, design specifications and engineering documents—and they need to do it securely and reliably.

Search as a Part of an SOA

For the ECM system to be a truly integrated content management solution, it must be able to provide consolidated access to all of its information. As part of a service-oriented architecture (SOA), the ECM search service creates a full-text index of all of the documents and other objects stored in all of the organization's content repositories. A single query can be broadcast across the corporate intranet, the corporate Web site, a partner extranet, the ERP database, content repositories from multiple vendors and even across e-mail content archived in storage hardware. The user only needs to submit a single request, and doesn't need to log into each and every one of these information stores to query its content.
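As a rough illustration of the single-query idea, the sketch below sends one query to several repositories and merges the hits into one ranked list. It is a minimal sketch, not a description of any product: the names `Hit`, `make_searcher` and `federated_search`, the in-memory repositories and the term-count scoring are all assumptions made for the example, and it assumes that scores from different sources are directly comparable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hit:
    source: str   # which repository returned the hit
    title: str
    score: float  # relevance score; assumed comparable across sources


# A searcher is any callable that takes a query and returns hits.
Searcher = Callable[[str], list[Hit]]


def make_searcher(source: str, docs: dict[str, str]) -> Searcher:
    """Build a toy searcher over an in-memory {title: text} repository."""
    def search(query: str) -> list[Hit]:
        terms = query.lower().split()
        hits = []
        for title, text in docs.items():
            words = text.lower().split()
            # Score = number of times any query term occurs in the text.
            score = sum(words.count(term) for term in terms)
            if score > 0:
                hits.append(Hit(source, title, float(score)))
        return hits
    return search


def federated_search(query: str, searchers: list[Searcher]) -> list[Hit]:
    """Send one query to every repository and merge the results by score."""
    merged = [hit for search in searchers for hit in search(query)]
    return sorted(merged, key=lambda hit: hit.score, reverse=True)


# Hypothetical repositories standing in for an intranet and an e-mail archive.
intranet = make_searcher("intranet", {"Travel policy": "travel expense policy for staff"})
archive = make_searcher("email-archive", {"Re: expense report": "expense report expense approval"})

results = federated_search("expense", [intranet, archive])
for hit in results:
    print(hit.source, hit.title, hit.score)
```

The caller issues one request and never addresses an individual repository; adding another source only means adding another searcher to the list.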

It is not always easy, however, to search the unstructured data that comprises much of the content within organizations. While numbers obey very strict rules that can be easily interpreted by computer programs, words and pictures have few formal rules and are open to different interpretations based on their context. To complicate matters, content comes in hundreds of different file formats—from Microsoft Office files, to JPEGs and TIFFs, to native AutoCAD drawing formats—and the ECM search service must be able to access and index the information within all of these different types of files. A robust battery of content filters ensures that as new documents are added or existing documents are updated, the index is updated on the fly, so that all of the business content within the organization is quickly searchable.
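The combination of content filters and on-the-fly index updates can be sketched as follows. This is only an illustration under stated assumptions: the two filters, the `FILTERS` registry keyed by file extension and the `Index` class are hypothetical, and real filters for formats such as Office or AutoCAD files are far more involved than the plain-text and HTML examples here.

```python
import os
import re
from collections import defaultdict


# Hypothetical filters: each turns the raw bytes of one file type into text.
def text_filter(raw: bytes) -> str:
    return raw.decode("utf-8")


def html_filter(raw: bytes) -> str:
    # Crude tag stripping, sufficient for this illustration only.
    return re.sub(r"<[^>]+>", " ", raw.decode("utf-8"))


FILTERS = {".txt": text_filter, ".html": html_filter}


class Index:
    def __init__(self) -> None:
        self.postings = defaultdict(set)  # term -> names of documents containing it
        self.terms_by_doc = {}            # document name -> terms indexed for it

    def remove(self, name: str) -> None:
        """Drop every posting that belongs to the named document."""
        for term in self.terms_by_doc.pop(name, set()):
            self.postings[term].discard(name)

    def upsert(self, name: str, raw: bytes) -> None:
        """Index a new document, or re-index an updated one in place."""
        extension = os.path.splitext(name)[1].lower()
        text = FILTERS[extension](raw)  # raises KeyError for unsupported types
        self.remove(name)               # clear stale terms from an older version
        terms = set(re.findall(r"\w+", text.lower()))
        for term in terms:
            self.postings[term].add(name)
        self.terms_by_doc[name] = terms

    def search(self, term: str) -> list[str]:
        return sorted(self.postings.get(term.lower(), set()))


index = Index()
index.upsert("contract.txt", b"Supplier contract draft")
index.upsert("spec.html", b"<h1>Design</h1><p>Pump specification</p>")
index.upsert("contract.txt", b"Supplier contract signed")  # the document was updated

print(index.search("signed"))         # the updated text is searchable
print(index.search("draft"))          # the old text is no longer indexed
print(index.search("specification"))
```

Supporting a new file format in this sketch means registering one more filter; the indexing logic itself does not change.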

While users demand rapid and consolidated access to information, organizations also need to ensure the security of content. Because the ECM search service is built upon the same robust foundation as the user and security services, organizations can be confident that their enterprise content is confidential and protected.
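One way to picture search that respects repository security is to filter every result against the permissions of the person asking. The sketch below is a simplified assumption, not a product design: the `Document` class, the group-based `allowed_groups` field and the `secure_search` function are invented for the example, and a real system would consult the repository's own security service rather than a field stored on the document.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    text: str
    # Groups permitted to see this document (hypothetical permission model).
    allowed_groups: set[str] = field(default_factory=set)


def secure_search(query: str, docs: list[Document], user_groups: set[str]) -> list[str]:
    """Return titles that match the query and that the user may see."""
    term = query.lower()
    return [
        doc.title
        for doc in docs
        # A hit is returned only if the user shares at least one allowed group.
        if term in doc.text.lower() and doc.allowed_groups & user_groups
    ]


docs = [
    Document("Q3 forecast", "revenue forecast for Q3", {"finance"}),
    Document("Press release", "Q3 revenue announcement", {"finance", "staff"}),
]

print(secure_search("revenue", docs, {"staff"}))    # only the press release
print(secure_search("revenue", docs, {"finance"}))  # both documents
```

Because the permission check runs inside the search function, a user never sees even the title of a document he or she is not entitled to open.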

META Group, an IT research and advisory firm, considers the integration of search with individual content repository security policies to be a critical requirement for enterprise search software. "Many Web-only search vendors are inferring that they can index non-Web content, without building ECM connectors, by simply using native HTTP access methods," says META Group analyst Tim Hickernell. "Companies need to beware of ECM integration approaches that bypass ECM security."

Enterprise Information Retrieval

Consider the task of indexing a book called "ECM." Should the book be cataloged in the section on electronic counter measures, the European Common Market or enterprise content management? Simply opening the book would reveal the subject matter to the reader, but determining the context isn't that easy for a computer. The challenge is to design a system that can make the same interpretation as people. An effective search tool must be able to learn concepts that allow it to automatically catalog documents, making those documents easier for people to find.

To enable users to effectively find information when the number of documents in the index rises from hundreds and thousands to millions, organizations need to be able to focus on collections of documents, rather than just the documents. Traditional searching methods (such as full-text search, natural language querying, ranking and summarization) are simply ineffective when there is such a massive number of documents.

ECM systems can intelligently "tag" every piece of information with relevant contextual information. This classification of information enables the dynamic creation of a navigation taxonomy with which users can browse content. Through automatic classification, the ECM search service can extract and construct a dynamic structure for unstructured content. For example, during a product release you might produce a number of documents, including user manuals, quick start guides, quick reference cards and marketing and promotional materials. If we classify each document with information about its document type, and the product with which it is associated, we can enable users to easily browse content according to either structure. That is, one user might need to browse "all of the documents associated with Product A," while another user might need to browse "all of the quick reference cards associated with any product." With classifications, both users can browse the content as they require, regardless of how the documents are actually organized in the content repository.
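The product release example can be sketched in a few lines. The document titles, the `folder` values and the `build_facet_index` helper are hypothetical; the point is only that the same tagged documents can be browsed by product or by document type, whatever folder they happen to live in.

```python
from collections import defaultdict

# Each document carries classification tags that are independent of its folder.
documents = [
    {"title": "Product A User Manual", "folder": "/inbox",
     "type": "user manual", "product": "Product A"},
    {"title": "Product A Quick Reference Card", "folder": "/inbox",
     "type": "quick reference card", "product": "Product A"},
    {"title": "Product B Quick Reference Card", "folder": "/marketing",
     "type": "quick reference card", "product": "Product B"},
]


def build_facet_index(docs, facet):
    """Group document titles by the value of one classification."""
    index = defaultdict(list)
    for doc in docs:
        index[doc[facet]].append(doc["title"])
    return dict(index)


by_product = build_facet_index(documents, "product")
by_type = build_facet_index(documents, "type")

print(by_product["Product A"])          # all documents associated with Product A
print(by_type["quick reference card"])  # all quick reference cards, any product
```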

Classifications can be automatically inherited by documents as soon as they are added into the content repository. By establishing a hierarchy of automatic classification, organizations can ensure that these alternate taxonomies of content navigation are always up-to-date and providing the appropriate information.
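Inherited classification might look something like the following sketch. It assumes, purely for illustration, that classifications are attached to folders and flow down the folder hierarchy to any document added beneath them; the `Folder` class, the `add_document` function and the sample tags are not drawn from any particular product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Folder:
    name: str
    classifications: dict[str, str] = field(default_factory=dict)
    parent: Optional["Folder"] = None

    def effective_classifications(self) -> dict[str, str]:
        """Merge the tags of every ancestor; nearer folders override farther ones."""
        inherited = self.parent.effective_classifications() if self.parent else {}
        return {**inherited, **self.classifications}


def add_document(title: str, folder: Folder) -> dict[str, str]:
    """A new document picks up its folder's classifications when it is added."""
    return {"title": title, **folder.effective_classifications()}


products = Folder("Products", {"department": "Engineering"})
product_a = Folder("Product A", {"product": "Product A"}, parent=products)
manuals = Folder("Manuals", {"type": "user manual"}, parent=product_a)

doc = add_document("Install guide", manuals)
print(doc)
```

In this sketch nobody tags the document by hand: placing it in the Manuals folder is enough to classify it by department, product and document type.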

Concept Extraction

Exploiting contexts such as document type is useful for improving search precision, but does not provide an effective means of locating a specific document within a series of documents that are all very similar within those contexts. Concept mining and extraction provide a means of identifying common elements among documents and enabling searches to be refined according to those elements.

By analyzing the documents in a result set, the ECM search service can automatically extract the concepts that are deemed to be most important, such as organizations, people or topics. A concept extraction gives users a general overview of a series of documents without requiring them to view the documents individually. Furthermore, a concept extraction alerts users to different (and contradictory) facets of the document set, thereby suggesting ways to refine the query. For example, a concept extraction for a search of the word "diet" might return both "nutrition" and "weight loss." If only one of these concepts is relevant, the user can take steps to eliminate the other from the search.
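A very crude stand-in for concept extraction is to count which terms appear in the most documents of a result set. The sketch below does exactly that for the "diet" example; it is a simplification under stated assumptions, not real concept mining. The sample texts, the stopword list and the functions `extract_concepts` and `refine` are invented for the illustration, and the approach handles single words only, so a phrase such as "weight loss" surfaces as two separate terms.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def extract_concepts(result_texts, query, top_n=3):
    """Rank terms by the number of result documents that mention them."""
    query_terms = set(query.lower().split())
    doc_freq = Counter()
    for text in result_texts:
        terms = set(re.findall(r"[a-z]+", text.lower()))
        doc_freq.update(terms - STOPWORDS - query_terms)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


def refine(result_texts, exclude):
    """Drop results that mention a concept the user is not interested in."""
    return [text for text in result_texts if exclude not in text.lower()]


results = [
    "A balanced diet and nutrition plan",
    "Nutrition facts for a vegetarian diet",
    "Diet tips for rapid weight loss",
    "Weight loss with a low-carb diet",
]

print(extract_concepts(results, "diet"))
print(refine(results, "weight"))
```

Here the extracted terms reveal the two facets of the result set, and excluding one of them narrows the results to the nutrition documents.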

These features provide unprecedented capabilities to organize and analyze the unstructured information within organizations, opening all of an organization's content up to everyone in the company. This is an enormous boost in the overall intelligence of the company.

Capitalize, Compete and Succeed

The strength of an ECM system is that its benefits are not restricted to a single department or line of business. With an enterprise rollout, reduced costs, improved productivity, and streamlined processes can be enjoyed across the entire organization. Organizations can invest in a single solution initially, and build that solution out to encompass all of their business processes in the future. Information retrieval capabilities provide a viable foundation for an eventual enterprise-wide ECM system. After all, what organization isn't driven by its content, and its ability to extract the maximum value from it?

At this moment, a key breakthrough or business improvement that could sharpen your competitive edge lies somewhere in your organization—in an engineer's virtual briefcase, embedded in an e-mail message, or buried on a shared network drive. But if you can't find that information, you can't capitalize on opportunities—and you can't compete, let alone succeed.

Organizations generate a wealth of information over the everyday course of business. ECM supplies the framework for giving structure to this unstructured content; but as organizations grow, and the number of documents jumps from thousands to millions, no amount of careful organization and ordering will ensure that the right person finds the right document every time. A shared search service that is part of a service-oriented ECM architecture provides the information retrieval foundation to ensure that the right content is driving the mission-critical decisions made throughout your organization.


Open Text Corporation is the market leader in providing Enterprise Content Management (ECM) solutions for global organizations. Our software brings people together to improve innovation, achieve compliance and accelerate growth.
