Finding the right piece of information: Structure and meaning improve search results

Just about everyone who has searched for information in electronic form has run up against the limitations of keyword searching. You either get back thousands of “hits” or miss the target entirely because it used a synonym you didn’t think of. The target might be an item you want to purchase on the Web, or a document that contains vital business information. Either way you are out of luck. New approaches to search and retrieval are beginning to make searching a more rewarding experience. Two frequent themes are the use of automation and metadata to better process and leverage information. Leading-edge products are helping to solve business problems by locating information on Web sites and by retrieving business information from multiple databases.

The Certified General Accountants of Ontario (CGA Ontario), a professional association for accountants, implemented its Web site as a marketing tool but quickly realized its value in providing membership services, particularly to students. With 800 pages on the site, however, CGA Ontario found that its constituents needed guidance in locating information. The association turned to Hummingbird to provide a search capability.

Index of files

“We were seeking a company with a well-established reputation, a customizable product and strong support,” says Boyd Dyer, CGA Ontario’s manager of Web technology. “We also expected to increase the size of our site up to 15,000 pages, and needed a product that could scale up.”

CGA Ontario has implemented the search capability in Hummingbird’s Enterprise Portal Suite, and plans to deploy a portal in the future. A good measure of CGA Ontario’s success is a reduction in the number of inquiries being handled by its call center about where to find information on the Web site, indicating that users are more easily locating what they need.

The search tool resides on Hummingbird’s Fulcrum KnowledgeServer and searches a database built from an index of files rather than the core data, which reduces response time. It creates a taxonomy of topics that the developer can refine, and provides automated summaries of documents built from existing profiles. Terms can be weighted to increase their ranking in relevance listings, and “more like this” searches can also be applied to result sets. Search results can be viewed either in native format summarized in text form, or using Fulcrum FullView, which displays 200 types of files.
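The idea of searching an index rather than the files themselves, with weighted terms influencing the ranking, can be sketched in a few lines of Python. This is a generic inverted index written for illustration, not a description of how Fulcrum KnowledgeServer works; the document names, the sample text and the `weights` parameter are all assumptions.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, query, weights=None):
    """Rank documents by the weighted sum of matching term counts."""
    weights = weights or {}
    scores = defaultdict(float)
    for term in query.lower().split():
        # Only the index is consulted; the original documents are never rescanned.
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count * weights.get(term, 1.0)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical pages standing in for a small Web site.
docs = {
    "exams": "exam schedule for students and exam fees",
    "fees": "membership fees and student fees",
    "news": "association news",
}
index = build_index(docs)

print(search(index, "exam fees"))                         # ['exams', 'fees']
print(search(index, "exam fees", weights={"fees": 3.0}))  # ['fees', 'exams']
```

Raising the weight of "fees" moves the fees page ahead of the exams page, which is the effect the weighting feature is meant to produce.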

Jason Weir, product marketing manager for Fulcrum KnowledgeServer and Hummingbird EIP, predicts that the fully customizable, Web-based environment of enterprise portals will be the de facto work space of the future, and that automating the indexing of such information sources as e-mail and even discussion boards will enhance the acquisition and retrieval of tacit knowledge.

Coldwater Creek, a retailer of women’s apparel, wanted to help customers locate purchase items on its Web site. Begun as a catalog company, Coldwater Creek launched its Web site in 1998, but the site did not include a search function at that time. Instead, customers clicked through a well-organized series of product categories. With about 3,000 items available, however, Coldwater Creek recognized that having search capability could benefit its customers and increase sales. Plans were made to add a text-based search function, but prior to implementation, Coldwater Creek felt uneasy with the function’s performance and canceled the deployment. Having built a reputation for outstanding customer service, the company did not want to frustrate customers with an application that did not produce meaningful results.

Categories and attributes

The project remained on hold until Coldwater Creek discovered EasyAsk, a company whose product of the same name takes advantage of data structure and attributes to locate information. EasyAsk can search on categories such as “dresses” and on attributes such as color and size. It also can default to a text search in the absence of results from either the categories or attributes. In that case, techniques such as stemming (matching on the root of a word), a thesaurus and alternative forms of words are included to optimize results. The natural language interface allows prospective customers to type in: “Find women’s dresses in a size 10, blue or green, for less than $75” and get the subset of items that fits that description. The customer could also type “under $75” and get the same results as “less than $75.”
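A rough sketch of that kind of query handling is shown below. It is not EasyAsk's implementation; the catalog, the field names and the helper functions `parse_query` and `find_items` are invented for illustration, and the text fallback is simplified to trigger only when no category or attribute is recognized in the request.

```python
import re

# Hypothetical catalog; field names are illustrative only.
CATALOG = [
    {"name": "Linen dress", "category": "dresses", "color": "blue", "size": 10, "price": 68.0},
    {"name": "Silk dress", "category": "dresses", "color": "green", "size": 10, "price": 120.0},
    {"name": "Wrap dress", "category": "dresses", "color": "red", "size": 10, "price": 59.0},
    {"name": "Denim jacket", "category": "jackets", "color": "blue", "size": 10, "price": 70.0},
]
CATEGORIES = {"dresses", "jackets"}
COLORS = {"blue", "green", "red"}

def parse_query(text):
    """Turn a free-text request into category and attribute filters."""
    text = text.lower()
    words = set(re.findall(r"[a-z]+", text))
    filters = {
        "categories": words & CATEGORIES,
        "colors": words & COLORS,
        "size": None,
        "max_price": None,
    }
    size = re.search(r"size (\d+)", text)
    if size:
        filters["size"] = int(size.group(1))
    # "less than $75" and "under $75" map to the same price ceiling.
    price = re.search(r"(?:less than|under|below) \$(\d+(?:\.\d+)?)", text)
    if price:
        filters["max_price"] = float(price.group(1))
    return filters

def find_items(text, catalog=CATALOG):
    """Return names of items matching the request."""
    f = parse_query(text)
    has_filters = (bool(f["categories"] or f["colors"])
                   or f["size"] is not None or f["max_price"] is not None)
    if not has_filters:
        # Fallback: plain text match against item names.
        words = re.findall(r"[a-z]+", text.lower())
        return [i["name"] for i in catalog
                if any(w in i["name"].lower() for w in words)]
    return [
        i["name"] for i in catalog
        if (not f["categories"] or i["category"] in f["categories"])
        and (not f["colors"] or i["color"] in f["colors"])
        and (f["size"] is None or i["size"] == f["size"])
        and (f["max_price"] is None or i["price"] < f["max_price"])
    ]

print(find_items("Find women's dresses in a size 10, blue or green, for less than $75"))
# ['Linen dress']
```

Because the price phrase is reduced to a single `max_price` filter, the two wordings of the price limit produce identical results.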

Significant work goes on behind the scenes to make life easy for the customer. For example, clothing suppliers may use a variety of marketing terms such as “sea mist” for the color green, and those must all be translated before the search is done. Phrases such as “perma-press” and “no-iron” must be normalized to provide a linguistically level playing field.
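The translation step can be pictured as a lookup applied to supplier data before it is indexed. The sketch below assumes small hand-built mapping tables and treats "no-iron" as the canonical form; the table contents beyond the examples above, and the function names, are hypothetical.

```python
# Hypothetical mapping tables; a real catalog would maintain far larger ones.
COLOR_SYNONYMS = {"sea mist": "green"}
FEATURE_SYNONYMS = {"perma-press": "no-iron"}

def normalize(value, table):
    """Lowercase a supplier term and replace it with its canonical form, if any."""
    key = value.strip().lower()
    return table.get(key, key)

def normalize_item(item):
    """Return a copy of the item with color and feature terms normalized."""
    return {
        **item,
        "color": normalize(item["color"], COLOR_SYNONYMS),
        # A set removes duplicates created when two terms share a canonical form.
        "features": sorted({normalize(f, FEATURE_SYNONYMS) for f in item["features"]}),
    }

raw = {"name": "Camp shirt", "color": "Sea Mist", "features": ["Perma-Press", "no-iron"]}
clean = normalize_item(raw)
print(clean["color"], clean["features"])  # green ['no-iron']
```

Once every item passes through a step like this, a search for "green" finds the "sea mist" shirt without the customer ever seeing the supplier's term.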

But all the work is worth it, claims Coldwater Creek’s director of communications, David Gunther, who says, “We have had many customers tell us that this is the first time they have ever purchased anything online. It’s far better for us to go through the iterations that make our system easy to use than for the customer to struggle.”

The numbers bear out Gunther’s opinion--in just two years, online sales have increased from less than 1% to about one-third of the company’s business. Gunther also believes that the multi-channel approach is the best way to go, which for Coldwater Creek involves its catalogs, Web site and an expanding chain of stores.

Another tool that can do the heavy lifting for users is Applied Semantics’ CIRCA.

“In understanding meaning,” says Gilad Eliaz, CIO and co-founder of Applied Semantics, “you can’t just look at frequency of words. Synonyms may be used in the text, for example, which would affect an analysis based on words alone.”

The components of CIRCA include an autocategorizer that classifies the page into a predefined taxonomy, a page summarizer that converts the text into a concise statement and a metadata creator that creates tags based on concepts in the page. One of CIRCA’s advantages is that it does not need to be trained on sample documents or taught rules. Its intelligence is embodied in the semantic analyses that have already been done. While a taxonomy deals with classification, CIRCA is based on an ontology, which provides a detailed representation of the concept underlying each word and how it is used in language.
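The difference between comparing words and comparing concepts can be shown with a toy example. This is not CIRCA's ontology or algorithm; the word-to-concept table below is invented, and a real ontology would also capture how each word is used in context.

```python
# Toy word-to-concept table; entries are invented for illustration.
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "auto": "vehicle",
    "buy": "purchase", "purchase": "purchase",
}

def tokens(text):
    """Split text into lowercase words, dropping trailing punctuation."""
    return [w.strip(".,").lower() for w in text.split()]

def word_set(text):
    """The literal words used in the text."""
    return set(tokens(text))

def concept_set(text):
    """The concepts behind the words, for words the table knows."""
    return {CONCEPTS[w] for w in tokens(text) if w in CONCEPTS}

a = "Buy a car"
b = "Purchase an automobile"

print(sorted(word_set(a) & word_set(b)))        # []
print(sorted(concept_set(a) & concept_set(b)))  # ['purchase', 'vehicle']
```

The two phrases share no words, so a word-based comparison sees no overlap, while the concept-based comparison treats them as equivalent.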

Rather than competing with established search engines, CIRCA is designed to augment them for improved search results. Since it is XML-based, it can also be integrated with content management, document management and other enterprise systems. The product has been available on an ASP basis and is scheduled to ship as an enterprise product in August.

In a keynote speech at the E-Gov 2001 conference, Oracle CEO Larry Ellison suggested that only one IT problem needs to be solved: information fragmentation. He claimed that fragmentation makes it impossible to retrieve essential information. Examples abound, but Ellison focused his comments on healthcare. Of $1.5 trillion spent annually on healthcare, $500 billion is consumed by record keeping, he said. Despite that investment, however, neither medical professionals nor consumers themselves have even a remote chance of seeing an integrated view of an individual’s medical history.

Ellison’s answer is a centralized medical database. That (Oracle) database would contain, for example, all prescriptions an individual has received, along with other medical records. Centralized databases could facilitate information retrieval, but in many cases may not be feasible.

An alternative to centralized data storage is to integrate multiple databases--either physically by creating a data warehouse, or logically by viewing the data as if it were in one repository. Noetix (noetix.com) provides a solution that simplifies retrieving information from disparate databases. As part of Noetix’s Enterprise Technology Suite (NETS), NoetixViews automatically maps the location of complex data structures and generates custom business views that reflect the unique configuration of Oracle databases. It comes with many typical business queries already set up, and can also be customized or used in conjunction with other business intelligence tools. Noetix QueryServer translates multiple data sources into meaningful business language, facilitating reporting against the databases.
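Logical integration, in which data stays in separate databases but is queried as if it were in one, can be illustrated with Python's built-in sqlite3 module. This is a small stand-in and does not represent NoetixViews or Oracle; the table names, the sample rows and the `sales_by_customer` view are assumptions made for the example.

```python
import sqlite3

# Two separate in-memory databases stand in for two independent systems.
conn = sqlite3.connect(":memory:")                  # "main": sales data
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # "crm": customer data

conn.execute("CREATE TABLE main.sales (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO main.sales VALUES (?, ?)",
                 [(1, 120.0), (1, 30.0), (2, 75.0)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

# A temporary view presents both sources as a single business-level table.
# (In SQLite, only TEMP views may reference tables in more than one database.)
conn.execute("""
    CREATE TEMP VIEW sales_by_customer AS
    SELECT c.name AS customer, SUM(s.amount) AS total_sales
    FROM main.sales AS s
    JOIN crm.customers AS c ON c.id = s.customer_id
    GROUP BY c.name
""")

report = conn.execute(
    "SELECT customer, total_sales FROM sales_by_customer ORDER BY customer"
).fetchall()
print(report)  # [('Acme', 150.0), ('Globex', 75.0)]
```

The person writing the report queries one view with business-friendly column names and does not need to know which database holds which table.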

“NETS allows an enterprise to take sales data, and then combine it with information from another database to provide usable business information,” says Ann Markley, VP of product marketing. By the end of this year, Noetix hopes to announce a new software tool to be used with Siebel’s (siebel.com) CRM system that automatically creates metadata and maps a database. The software should allow users to easily and cost-effectively extract data from Oracle and Siebel databases, and combine that information into a single report.

Judith Lamont is a research analyst with Zentek Corp., e-mail jlamont@sprintmail.com.
