Search plus: far-reaching and versatile in the enterprise
By Judith Lamont, KMWorld senior writer
As the volume of information stored in enterprise repositories continues to grow, the need to search that information to aid in decision-making has also grown. Searching has become more sophisticated, with natural language and personalization enhancing document retrieval. But some other significant forces are also at work.
Information is being stored in many different types of repositories, and users want to retrieve it in a single operation. And thanks to XML, the line is blurring between unstructured and structured data, so the difference between a “search” and a “query” is narrowing, making it easier to see all the information related to a topic. Some vendors prefer the term “discovery” as a more encompassing concept. The search function has also become more deeply embedded in enterprise applications, linking to analytic and process functions. New software infrastructures are helping search solutions gain a broader reach into those enterprise systems.
Across multiple repositories
In the Sheriff’s Department in Ventura County, Calif., crime analysts use a range of databases, reports and other information sources in their investigations. Often, the ability to quickly link one fact to another is critical in solving a crime. The Sheriff’s office uses ISYS from Odyssey Development to search multiple repositories and find such connections. In one case, a burglary suspect was linked to a series of 39 other burglaries in just a few seconds when a detective found a blood match from a previous incident. Crime analyst Steve Sullivan values the ability of ISYS to search through years of historical data, some of which is in formats no longer used in the department. The ISYS Intelligent Agent also monitors the knowledgebase for topics of interest to particular investigators, such as a suspect’s name, and notifies the analyst when new information is entered into the system.
Lionel Sawyer & Collins, a Nevada law firm, used ISYS to search gigabytes of data about a bankruptcy case. “We took a mountain of discovery data submitted on CDs at the last moment,” says Larry Jessup, IT director at Lionel Sawyer & Collins, “and were able to organize it quickly into categories based on the relevant issues.” The firm also uses ISYS to index the documents in its brief bank, which explain the rationales for supporting awards, changes of venue and summary judgments. New associates at the firm can easily find briefs that can be used as references, as well as commercial forms or templates for contracts.
“Our philosophy is that a search solution should start working for you right away,” says Ian Davies, CEO of Odyssey Development, “not the other way around.” Davies cites ease of implementation and use, along with low cost, as advantages of ISYS.
Seeking XML data
With the increasing acceptance of XML as a data format, the distinction between structured and unstructured data is blurring, because XML tags provide structure to text-based documents. That in turn provides new opportunities for analysis.
“XML has been a watershed technology,” says Tim Hickernell, VP of the META Group’s Technical Research Department, “because the metadata indexing allows structured and unstructured data to be viewed side by side.” That ability makes it much easier to look for patterns in unstructured data.
“Retrieving data is useful, but making sense of it is far more interesting,” says Sundar Kadayam, CTO of Intelliseek. “We can begin to solve some real-world problems at the line of business level.”
Intelliseek’s BrandPulse 360 is designed to search and aggregate data from many sources, filter the data for relevancy, and mine it for key topics and competitive information. BrandPulse 360 can also examine information on consumer message boards and in CRM applications, and make a determination of “consumer sentiment,” which may be positive, negative or neutral. When that qualitative data is converted to metadata associated with each document, further analysis can be done.
At a major food company, a new product release was generating consumer complaints that were logged into the customer relationship management (CRM) system. Because the company was monitoring consumers’ messages on public Internet discussion boards, it was able to discover that the “complaints” were from customers who liked the new product but could not find it in their local stores. As a result, the company was able to increase production and target shipping supplies to those geographic areas in which the complaints were registered.
“With BrandPulse, information is discovered in real time from multiple sources, both internal and external to an organization,” says Kadayam. Depending on access credentials, users can tap into proprietary sources such as Lexis/Nexis. BrandPulse then applies a set of text-mining technologies to classify, rank and extract specific elements from the information. The BrandPulse dashboard is an analytical tool that combines preconfigured views with ad hoc query and reporting functionality, and personalized searches and alerts.
“Historically, about 80% of enterprise data has been unstructured,” Kadayam points out. “Very little of this has been usable in business intelligence applications previously.”
Unified search for e-commerce
E-commerce is another area in which search performs a critical function. Mercado developed a search tool that was well received by online retailers and shoppers. It creates a taxonomy on the fly that helps the purchaser zero in on the desired items and features. Underlying linguistics resolves issues such as color synonyms during the search process. However, Mercado’s clients also sought a way to integrate product catalog information with call center and customer self-service applications. For example, a service representative in the call center might want to make a product recommendation, but would need detailed data contained in a brochure. Or the seller might want to invoke a business rule that pushes a particular product to the top of the promotional list.
“We decided to develop a unified search product,” says Yaron Dycian, marketing director at Mercado. The Enterprise Search and Navigation (ESN) is an XML-based solution designed for a Web services environment. It searches both databases and document file systems.
“The ranking mechanism is very important,” Dycian notes. “It normalizes results from structured and unstructured sources to provide a single relevancy ranking for search results.” ESN also dynamically narrows the search results into progressively smaller categories, such as “How to Use My Phone” or Frequently Asked Questions, which helps navigate the user to the target information.
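The idea of a single relevancy ranking across structured and unstructured sources can be illustrated with a short sketch. The code below is not Mercado's method, which the article does not describe; it assumes simple min-max scaling, and the `Hit` class, the function names and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str       # hypothetical label for where the hit came from
    title: str
    raw_score: float  # relevance score on the source's own scale


def normalize(hits):
    """Min-max scale one source's raw scores to the range [0, 1]."""
    if not hits:
        return []
    lo = min(h.raw_score for h in hits)
    hi = max(h.raw_score for h in hits)
    span = hi - lo
    # If every score is equal, treat all hits as equally relevant (1.0).
    return [(h, 1.0 if span == 0 else (h.raw_score - lo) / span) for h in hits]


def unified_ranking(*result_sets):
    """Merge per-source result lists into one list sorted by normalized score."""
    merged = []
    for hits in result_sets:
        merged.extend(normalize(hits))
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Made-up data: a catalog database that scores 0-100 and a document
# index that scores 0-1. Neither scale comes from a real product.
catalog = [
    Hit("database", "Phone X200", 90.0),
    Hit("database", "Phone X100", 40.0),
    Hit("database", "Carrying case", 15.0),
]
documents = [
    Hit("documents", "X200 brochure", 0.8),
    Hit("documents", "Frequently Asked Questions", 0.5),
    Hit("documents", "Warranty terms", 0.2),
]

ranked = unified_ranking(catalog, documents)
for hit, score in ranked:
    print(f"{score:.2f}  {hit.source:10s} {hit.title}")
```

Scaling each source separately keeps a database that scores on a 0-100 scale from drowning out a document index that scores on a 0-1 scale. A production ranker would likely also weight sources and apply business rules, which this sketch leaves out.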
Searching for fraud
For applications in which detecting relationships among data elements is important, the ability to search disparate databases is extremely valuable. Infoglide Software initially developed software to address fraud detection, and launched its Bladeworks solution in 2002 as a broader approach to risk assessment. Bladeworks uses an algorithm that allows detection of associations between different entities, such as a name or address, in remote databases. Bladeworks has its own search capability (as well as analytical and scoring functions) and an infrastructure that allows for incorporation of other search engines to access relational databases. Other software such as data mining products and neural networks can be plugged into the Bladeworks infrastructure to create a custom application.
Responses to inquiries can be in a variety of forms, including the closest matching record, a score that represents a degree of similarity, or a count of how many records match the target with similarity above a certain threshold.
“Because Bladeworks does not need to return the actual records,” says Michael Shultz, CEO of Infoglide Software, “demands on resources are modest. Also, privacy issues can be alleviated because the records do not have to leave the original source repository.” Bladeworks keeps track of all the data and records used to arrive at a decision, so the process is traceable. In addition, the algorithm used for decision-making examines the business rules at a low level in different repositories to come up with a unified and meaningful score for risk.
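The three response forms described above (closest matching record, similarity score, and count of records over a threshold) can be sketched in a few lines. This is only an illustration: the article does not disclose the Bladeworks algorithm, so the sketch substitutes the string-similarity ratio from Python's standard `difflib` module, and the field names, records and threshold are made up.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(target, record, fields=("name", "address")):
    """Response form 2: a score, here the mean per-field similarity."""
    return sum(similarity(target[f], record[f]) for f in fields) / len(fields)


def closest_match(target, records):
    """Response form 1: the record most similar to the target."""
    return max(records, key=lambda r: record_score(target, r))


def count_above(target, records, threshold):
    """Response form 3: how many records score at or above the threshold."""
    return sum(1 for r in records if record_score(target, r) >= threshold)


# Hypothetical records standing in for rows in a remote database.
records = [
    {"name": "Jon Smith", "address": "12 Oak St"},
    {"name": "John Smith", "address": "12 Oak Street"},
    {"name": "Mary Jones", "address": "400 Pine Avenue"},
]
target = {"name": "John Smith", "address": "12 Oak Street"}

best = closest_match(target, records)
print("closest:", best)
print("score:", round(record_score(target, best), 3))
print("matches at 0.8 or higher:", count_above(target, records, 0.8))
```

Only the score and the count need to travel back to the caller, which is the point Shultz makes about modest resource demands and records staying in their source repository.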
A large data provider uses Bladeworks across its multiple, disparate databases to analyze credit card applications for propensity for fraud. The company searches databases containing such information as criminal history, and uses the data to determine the likelihood that the applicant will engage in fraudulent actions. Concerns about corporate governance also present good opportunities for software that can search and correlate data from many sources.
“For example, you could find out if insurance claims are being paid to people on the company’s payroll,” says Charles Moon, CTO of Infoglide Software, “or look for non-obvious relationships with vendors.” At the back end, Bladeworks can plug into business intelligence solutions from MicroStrategy (microstrategy.com) or Cognos (cognos.com). But because Bladeworks functions on the fly, the data does not need to be moved into a single warehouse in order to conduct a search.
Into the infrastructure
Search vendors are providing numerous smart capabilities, according to Gartner analyst Whit Andrews. “Products are configured to call the right application logic at the right moment,” Andrews says, “or run third-party software that establishes another layer of value.” That functionality could not occur without a greater integration of search technology with enterprise infrastructure, a trend that is seen more and more frequently. New software solutions have emerged to facilitate this integration.
Agari Mediaware has developed a software platform that supports the development and management of digital content and automates the “content value chain.” Although it is not a search tool, Agari’s solution provides another model of how the search function can be applied to multiple data sources and integrated with other applications. The philosophy behind Agari’s products is that content applications such as search technology, digital asset management, document management and taxonomy generators should be integrated as services. Agari’s Media Bus software provides interoperability without requiring changes to the applications. Rather than integrating point-to-point or centrally, the Media Bus is a distributed system. It uses the Java Message Service API, which is a part of J2EE. Media companies that must locate and manage a variety of digital assets from many repositories are among those that can benefit most from Agari’s solution.
Digital Harbor focuses on discovering and bringing together disparate information and presenting it in a single, consistent view for the user. “Rather than providing a static Web page that contains predefined content,” says Austin Wells, VP of product management at Digital Harbor, “we create a dynamic, XML-based interface that helps users correlate information.” The Professional Interactive Information Environment (PiiE) platform creates “composite applications” from existing enterprise applications.
One example of a composite application is a dynamic, real-time interface that presents government analysts with terrorist events, geographical maps, satellite imagery and related media, all from different sources that are “fused” together and correlated so that one source drives another. The result is that an analyst doesn’t navigate from page to page to see how information is related, but sees a dynamic presentation of the information in a single interface.
The ability to correlate information is particularly valuable in the defense and intelligence communities, where “connecting the dots” has a high priority. “The contextual picture allows us to get answers that we would not see from stovepipe tools,” reports an analyst in the intelligence community. “Seeing results from multiple searches at one time gives us a better opportunity to construct the big picture.”
Wells emphasizes the flexibility of PiiE’s integration. “There is a lot of information in back-end repositories, but users can’t always wait until an application is built to accommodate a new database. With PiiE, the user can click on an item such as a customer name, and immediately be tied into other related data,” he says. Because PiiE provides a conceptual model of back-office business processes, rules, events and data, users can discover relationships between information from different sources without writing code or asking the IT department to add functionality.
Compared to other IT software sectors, search and retrieval software is showing robust growth. IDC reports that the market grew 7% in 2002 for content access tools, which include search, classification, text mining and other related software products. Assuming that customers continue the trend toward implementing more sophisticated, linguistics-based search tools, IDC predicts even higher growth over the next few years, with an annual increase of 17% forecast for 2003 and over 20% for 2004.
“We believe that a content infrastructure will emerge,” says Sue Feldman, VP of content management and retrieval software for IDC, “that unifies access to database and unstructured content, and combines content technologies such as search, text mining, format conversion, security and categorization with traditional database ETL tools. Web services, including XML, WSDL, UDDI and SOAP, will enable this convergence.” Users will be able to select the tools they need for each application in order to create integrated access to information throughout the enterprise.
More of what’s hot in search
FAST Data Search 3.2 from Fast Search & Transfer (FAST) has expanded from a single product into a suite with fuller functionality. FAST supports 200 file formats and database formats from Oracle and DB2. It provides the following key functions: XML conversion and data aggregation across disparate repositories; content preprocessing, including automatic concept extraction; real-time search and filter; and front-end query and result set processing. FAST’s Live Analytics provides on-the-fly analysis, and the Business Managers’ Control Panel lets business managers control the search function to optimize the results. Highly scalable, FAST Data Search analyzes content, interprets queries and presents results in meaningful clusters. It is being used by Reuters, TIBCO, IBM, CareerBuilders, and FirstGov to search enterprise and Web content as well as to deliver real-time alerts to individual users.
Verity Federator 1.0, recently launched by Verity, searches files in Verity K2E and Ultraseek (acquired by Verity through its purchase of Inktomi’s enterprise search business) collections with one query. Additionally, a newly introduced Federator Developer’s Kit can be configured to search any data source. It was used by a leading chemical company to create a “Chemist’s Workbench” application through which researchers can now access 20 proprietary databases, also through a single query. Verity can either create a single index from disparate sources, using its “gateway technology,” or issue a query in real time directly against indexes in other applications. For example, a federated search could get results from a Siebel (siebel.com) search dialog box and rank the results along with other hits. Working directly against a data source is an advantage when a federated index either has not been updated or is not cost-effective to develop.
Logik, the flagship product of Coredge Software, previews unstructured electronic information including documents, e-mail and Web content, to provide summaries and an index of key themes. Logik is designed to save time for users who must read and filter large amounts of information to seek out key points. One user reviewing large quantities of stock market information says that Logik lets him recapture at least an hour a day. EMSoftware Solutions, a professional services firm, is using the product to assist in proposal writing. Logik speeds the proposal development process by helping the company find and repurpose old proposals based on the key themes they contain. It is also being used to search for information on past performance, qualifications and candidate résumés to match with each proposal.
Judith Lamont is a research analyst with Zentek Corp.