-->


Distributed Search Technology

The need for fast, accurate search and retrieval continues to be a significant factor in driving software sales for Fortune 1000 companies and their employees, as well as for the general public. A recent survey found that search is now the second most popular online task behind email. This is due in large part to the commercial and technical success of Google. More and more senior managers are asking, "Why can't I find information in my organization like I can find information on the Web using Google?" Over the last year, many customers and prospects have reported a strong desire within their organizations, often coming from senior management, to consolidate and simplify the searching experience.

Federated search and access to all of an organization's repositories are also very important. Users would like a single search facility that searches all of their available information and repositories, including their own desktop; aggregates and de-duplicates the results; and builds clusters of results. This can be handled by "search federation" technology that communicates with the search engine of each individual repository, or by "content extraction" technology that exposes the information in the repository to a single master search and indexing system. Either or both of these methods may be required, depending on the functionality users need.
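The "search federation" approach can be illustrated with a minimal Python sketch. It is not taken from any Open Text product: the `Hit` record, the `federated_search` function and the assumption that every repository reports a shared document identifier and a comparable relevance score are all hypothetical, chosen only to show the aggregate and de-duplicate steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Hit:
    doc_id: str   # hypothetical identifier shared across repositories
    source: str   # name of the repository that returned the hit
    score: float  # relevance score, assumed comparable across repositories


# A repository is modeled as any callable that takes a query and yields hits.
Repository = Callable[[str], Iterable[Hit]]


def federated_search(query: str, repositories: Iterable[Repository]) -> List[Hit]:
    """Send the query to every repository, then merge and de-duplicate."""
    best: Dict[str, Hit] = {}
    for search in repositories:
        for hit in search(query):
            current = best.get(hit.doc_id)
            # Keep only the highest-scoring copy of each document.
            if current is None or hit.score > current.score:
                best[hit.doc_id] = hit
    # Highest score first; ties are broken by doc_id for a stable order.
    return sorted(best.values(), key=lambda h: (-h.score, h.doc_id))
```

In practice, scores from different engines are rarely comparable without normalization, and duplicates often have to be detected by content rather than by identifier. The sketch sets both problems aside.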

In the specialty search marketplace, there is an increasing interest in domain-specific search technology for electronic discovery and litigation support. In the intellectual property and commercial information markets, interest is on the rise for scalable, stateful, 24x7 search technologies that incorporate all of the new capabilities described above as well as visualization capabilities to help users quickly navigate to the most useful information. In the email management and archiving arena, users are interested in being able to search metadata seamlessly along with the text of messages and even the text inside of attachments.

The components of a distributed architecture system are:

  • Search director. Manages the entire distributed system across all servers. It forwards the requests to the appropriate server and combines search results from the distributed servers.
  • Shared search engines. Search engines that are user-independent and can be shared and configured in engine pools for optimal performance and utilization of server resources. This eliminates the requirement of a dedicated engine per user.
  • Search broker. Manages pools of shared engines to most efficiently handle simple and very complex queries, splitting queries among multiple engines as needed for optimal performance. The broker balances incoming requests between engines, selecting idle engines whenever possible to handle new requests.

BRS engine (Discovery Server is the Open Text re-branded name). When a BRS engine determines that a request is a difficult search, it replies to the broker that the search must be split into multiple pieces. The broker divides the search into piece-qualified searches using the guidelines returned by the BRS engine. The broker then finds two available BRS engines and forwards the divided search to the selected engines. Each engine evaluates the search across its specified piece set and updates the temporary storage for the user's request, creating backreference files and updating the user static files. Upon completing the search request, each BRS engine must commit its temporary work files (backreference and user static files) to shared temp storage, so that they are visible to both the host and mirror machines for handling subsequent end-user requests.

  • Load balancing. Manages distribution of workload across servers of varying capacity for optimal utilization and performance. It permits different balancing schemes to be enabled by user-configuration criteria; workload can be directed to specific servers to permit administration on others.
  • Static splitting. Manages distribution of workload by splitting searches across servers when needed.
  • Configuration server. Serves as a repository and management tool for centrally managing all configuration and initialization files.
  • Applications manager. Monitors critical processes, starts and stops processes as needed, identifies failed services and auto-restarts services to ensure a more fail-safe operation.
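The broker's role, choosing idle engines from a shared pool and splitting a search across several of them, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the BRS/Search implementation: the `Engine` and `Broker` classes, the `max_parts` parameter and the round-robin assignment of database pieces are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Engine:
    name: str
    busy: bool = False
    assigned: List[str] = field(default_factory=list)


class Broker:
    """Toy broker over a pool of shared, user-independent engines."""

    def __init__(self, engines: List[Engine]) -> None:
        self.engines = engines

    def idle_engines(self) -> List[Engine]:
        return [e for e in self.engines if not e.busy]

    def dispatch(self, pieces: List[str], max_parts: int = 1) -> List[Engine]:
        """Assign database pieces to idle engines.

        max_parts is 1 for a simple query. A larger value asks the broker
        to split the piece set across that many engines if enough are idle.
        """
        idle = self.idle_engines()
        if not idle:
            raise RuntimeError("no idle engine available")
        parts = max(1, min(max_parts, len(idle), len(pieces)))
        chosen = idle[:parts]
        for i, engine in enumerate(chosen):
            engine.busy = True
            # Round-robin slice: engine i takes pieces i, i+parts, i+2*parts, ...
            engine.assigned = pieces[i::parts]
        return chosen
```

For example, with engines `e1` and `e3` idle and `e2` busy, `dispatch(["p1", "p2", "p3", "p4"], max_parts=2)` gives `e1` the pieces `p1` and `p3`, and gives `e3` the pieces `p2` and `p4`. A real broker would also release engines when they finish and would queue requests rather than raise an error.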

The Challenges
The issues and challenges around search today are significant. Interest is growing in the additional search capabilities described above. As business applications rely more and more on robust, full-featured, scalable search technologies used in ways that provide competitive advantage, suppliers of search technologies will see increased pressure to keep up with and even surpass the state of the art in the marketplace. Distributed search technology is only one approach to these challenging problems.

Discovery Server
Discovery Server is the Open Text re-branded name for the BRS/Search product line. BRS/Search is used as a general-purpose search engine in over 200 installations around the world. In 2004, Version 9.0 was released as a completely redeveloped information discovery tool with stateful capabilities, taxonomy navigation, auto-classification, concept and fact extraction and a Java-based standard UI. It is under active development and is evolving into a true distributed search product able to handle a large amount of legal-related data. It offers several APIs and can be easily integrated into OEM products. Open Text is working with large organizations to make Discovery Server a fully scalable, fault-tolerant, 24x7 system with instantaneous failover.

The Open Text COTS product that addresses some of these needs is Distributed Search. It has been developed over the last four years to handle large volumes of patent-related data, against which more than 4,000 simultaneous users run complex inquiries. The architecture developed for the Distributed Commands version of BRS/Search takes advantage of shared storage areas of multiple machines.

The Distributed Search product is a suite of servers working together to provide consistent access to BRS Engines. It uses an administration tier containing a search director component. The director supports splitting searches across multiple machines with duplicate database sets and distributing commands to the machine that is least utilized. End-user requests can be piece-qualified to a single database set or to pieces of all database sets.
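The director's choice of the least-utilized machine can be sketched as follows. The sketch assumes, hypothetically, that each machine reports a fixed `capacity` and a count of `active` commands, and that an `enabled` flag lets an administrator take a machine out of rotation. None of these names come from the product.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Machine:
    name: str
    capacity: int         # hypothetical: max concurrent commands it can take
    active: int = 0       # commands currently running
    enabled: bool = True  # set to False to drain the machine for administration

    @property
    def utilization(self) -> float:
        return self.active / self.capacity


def pick_machine(machines: Iterable[Machine]) -> Machine:
    """Return the enabled machine with the lowest relative utilization."""
    candidates = [m for m in machines if m.enabled and m.active < m.capacity]
    if not candidates:
        raise RuntimeError("no machine can accept the command")
    # Ties on utilization are broken by name so the choice is deterministic.
    return min(candidates, key=lambda m: (m.utilization, m.name))
```

Comparing relative rather than absolute load lets servers of varying capacity share work in proportion to their size. A host running 4 of 8 possible commands is considered busier than a mirror running 1 of 4.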


Open Text (www.opentext.com) is the market leader in providing Enterprise Content Management (ECM) solutions that bring together people, processes and information.
