Automating Perception, Part 4: Analyzing the search solution market
This series of articles has focused on the critical functions that today's information retrieval (IR) vendors are addressing. Eventually, one smart idea for cutting through the information deluge will create a darling in the expanding IR software market. For now, evaluating next-generation tools means coming to grips with confusing nomenclature such as categorization, classification and taxonomy. But all vendors strive for the same goal: making search cost- and time-efficient for everyday KM users. In this article, we address the final variable: the emphasis on easy implementation.
Google Search Appliance
In an interview about the Google Search Appliance, Nate Tyler and Debbie Jaffe discuss the product, which is delivered as a hot yellow Sun Solaris box. "Our value is that we are an appliance solution: fast to deploy, easy to manage, no customization required, and the Google search interface requires no training," says Tyler, a public relations manager for Google.
That is, of course, the biggest lure of Google: any Web user is most likely already a user of the search interface. Tyler explains, "Google has the high-level position of quality in the search market. And the experience of over 200 million users on google.com every day is inherited in the Google Search Appliance."
As product manager, Jaffe defines the offering: "For $28,000, we deliver hardware, software and support for the first two years. There are no surprises; there's only one vendor to call."
"We can crawl any content that has a URL, and we have a direct adapter to Lotus Notes," Tyler says. "In the environment of shrinking CIO budgets, there is a demand for simple solutions that don't require tons of complex setup. We really try to automate many admin functions."
Google's corporate mission, according to Tyler, is "to organize the world's information and make it universally accessible and useful, and to give multiple viewpoints on one story. We emphasize a concept of freshness, and that carries over to the Search Appliance ... We are trying to give 90% of the functionality at a fraction of the price (of competing products)."
Google favors sites that other Web users have linked to, a democratic way of determining "popular" sites. The Google Search Appliance focuses that same functionality on company-specific information, letting users enjoy all the familiar powers of google.com inside the enterprise.
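The link-counting idea can be sketched with a PageRank-style power iteration. This is a generic illustration of link-based popularity under stated assumptions, not the Search Appliance's ranking code, which the article does not describe; the `link_popularity` function, the 0.85 damping factor and the intranet page names are all hypothetical.

```python
# Sketch: link-based popularity via PageRank-style power iteration.
# Assumption: a generic illustration, not the appliance's internals.


def link_popularity(links, damping=0.85, iterations=50):
    """Return a popularity score per page; the scores sum to 1.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share of the total.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link is a "vote": the page splits its score among its targets.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * score[page] / n
        score = new
    return score


# Hypothetical intranet: two pages link to "benefits", so it ranks highest.
intranet = {
    "wiki-home": ["benefits", "hr-policy"],
    "hr-policy": ["benefits"],
    "benefits": ["wiki-home"],
}
scores = link_popularity(intranet)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['benefits', 'wiki-home', 'hr-policy']
```

A page earns score from the pages that link to it, weighted by those pages' own scores, so a link from a popular page counts for more than a link from an obscure one.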
IMR Alchemy Search
Dan Lucarini, marketing VP for Information Management Research (IMR), makers of Alchemy Search, says, "We aim for a balance between what the user needs compared to what they have to learn to use. We have always focused on the transactional side of information retrieval."
IMR Alchemy benefits from lessons learned from demanding clients. "Why did you choose Alchemy?" Lucarini asks his clients. "Because we have too many young newbies!" they reply.
Asked to comment on the combined use of metadata and full-text searching, Lucarini says, "We are more like Yahoo than Google, because we rely on human categorizing. SMEs (subject matter experts) decide on appropriate metadata," which is applied to files via document profile fields.
Expanding on the idea of ease of use, Lucarini describes the way an experienced knowledge consumer like Oak Ridge National Laboratory uses Alchemy. "We use drop-down tables to normalize the data going into the profile fields, letting individual humans intelligently classify documents, but make sure they all use the same set of index terms," he says.
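A controlled vocabulary of this kind can be sketched as a lookup that accepts only approved index terms, much as a drop-down table does. The field names, term lists and `normalize_profile` function below are hypothetical, not Alchemy's actual profile-field API.

```python
# Sketch: a controlled vocabulary for document profile fields.
# Hypothetical fields and terms; not Alchemy's actual schema.

CONTROLLED_TERMS = {
    "subject": {"Nuclear Physics", "Materials Science", "Environmental Safety"},
    "doc_type": {"Report", "Memo", "Drawing"},
}


def normalize_profile(profile):
    """Return a profile whose values all come from the approved term lists.

    Matching ignores case and surrounding spaces, so "materials science "
    is stored as the canonical "Materials Science". Unknown fields or
    values raise ValueError instead of silently adding a new term.
    """
    normalized = {}
    for field, value in profile.items():
        allowed = CONTROLLED_TERMS.get(field)
        if allowed is None:
            raise ValueError(f"unknown profile field: {field!r}")
        lookup = {term.lower(): term for term in allowed}
        key = value.strip().lower()
        if key not in lookup:
            raise ValueError(f"{value!r} is not an approved {field} term")
        normalized[field] = lookup[key]
    return normalized


print(normalize_profile({"subject": " materials science ", "doc_type": "report"}))
```

Rejecting unknown values, rather than accepting them, is what keeps every indexer on the same set of terms.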
"A 'knowledge collector' was the logical development of this SME classification system," Lucarini continues. "Department of Energy (DOE) labs created an XML taxonomy, used by SMEs to categorize information, and continuously built over years of practical use. We have been successful because we haven't tried to do it all. We've always been open to accessing someone else's knowledge (via SDK and API tools)."
Lucarini explains another advantage of solutions that have proven themselves over time against customer demands for data security. "In Alchemy, all docs go into a container file, data that has been 'munged' (compressed). Blowfish encryption is available, so that files are secure not only inside, but outside the enterprise. Alchemy allows secure transport of databases," he says.
"We have over 400 IMR service bureaus who incorporate our search products with their conversion projects," Lucarini adds. "We can separate the indexing steps, so that service bureau operators can enter data during conversion, data can be automatically captured from other applications and applied to documents, and end-user subject matter experts can enter the data that only they would know."
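Separating the indexing steps can be sketched as stages that each own a fixed set of profile fields. The stage names, field names and `apply_stage` helper are hypothetical; the article does not say how Alchemy divides fields among service bureau operators, automatic capture and SMEs.

```python
# Sketch: staged indexing, where each stage may fill only its own fields.
# Hypothetical stage and field names; not Alchemy's actual schema.

STAGE_FIELDS = {
    "conversion": {"box_number", "page_count"},  # service bureau operator
    "capture": {"author", "created"},            # pulled from other applications
    "sme": {"subject", "project"},               # only the expert knows these
}


def apply_stage(profile, stage, values):
    """Return a copy of profile updated with the values from one stage."""
    allowed = STAGE_FIELDS[stage]
    extra = set(values) - allowed
    if extra:
        raise ValueError(f"stage {stage!r} may not set {sorted(extra)}")
    updated = dict(profile)
    updated.update(values)
    return updated


profile = apply_stage({}, "conversion", {"box_number": "B-12", "page_count": 40})
profile = apply_stage(profile, "capture", {"author": "J. Smith"})
profile = apply_stage(profile, "sme", {"subject": "Groundwater"})
print(profile)
```

Because each stage writes only its own fields, the three groups can work independently and in any order without overwriting each other.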
ISYS
Dave Haucke, US marketing manager for ISYS/Odyssey Development, shares the motto of company founder Ian Davies: "We help people find things fast." He adds, "Our goal is to allow the structure of the KM solution to form itself."
A short white paper by Davies states: "How does the unmanageability of knowledge make it easier to manage? It's like jujitsu, the ancient martial art of self-defense, which uses the attackers' own weight and strength against them. Over-engineering and over-design are your enemy. So the key to success is to have a virtually 'designless' solution. The immediate benefit is that you can't get the design wrong. Second, you don't need many people to make it. And third, it doesn't take long to do."
Commenting further on implementation, Haucke says, "Our number one advantage is our ease of deployment. In many cases the RFP takes six months, and deployment is done in days ... Our number two advantage is cost; price is less of an impediment because we are not based on number of documents. We have a lot of A-list customers at the workgroup and departmental level."
Derek Murphy, a senior software engineer at ISYS, says, "Computer-generated taxonomies will be 20% to 50% accurate, if you're lucky. We will aid you in constructing your taxonomy. If you define categories, we'll fit your document into those categories."
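Fitting documents into user-defined categories can be sketched as keyword matching against category definitions. This is an assumed, deliberately basic technique (counting keyword hits), not ISYS's actual categorizer; `assign_categories`, the category names and the `min_hits` threshold are hypothetical.

```python
# Sketch: assign a document to user-defined categories by keyword hits.
# Assumption: a basic stand-in technique, not ISYS's categorizer.
import re
from collections import Counter


def assign_categories(text, categories, min_hits=1):
    """Return the user-defined categories a document fits, best match first.

    categories maps a category name to a list of single-word keywords.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    scored = []
    for name, keywords in categories.items():
        hits = sum(counts[k.lower()] for k in keywords)
        if hits >= min_hits:
            scored.append((name, hits))
    # Most keyword hits first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in scored]


categories = {
    "Finance": ["budget", "invoice"],
    "Legal": ["contract", "liability"],
    "HR": ["hiring"],
}
doc = "The contract sets the budget and the liability cap for the contract term."
print(assign_categories(doc, categories))  # ['Legal', 'Finance']
```

Because the categories come from the user, the software only has to decide fit, which sidesteps the accuracy problem Murphy attributes to computer-generated taxonomies.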
Describing ISYS as an over-arching tool that works with standard industry architectures, Murphy notes, "We provide index building tools for Lotus Notes, Outlook and SQL," which brings effective search capability to those broadly deployed applications. He continues, "We can extract metadata from PDF, Word and other applications, and then use our search syntax to address the metadata. Our data sources can be documents on a file server, e-mail and attachments, Exchange servers, SQL and Oracle databases, PDF and Flash content."
"ISYS provides robust search features, with a query syntax that includes Boolean and positional operators," Murphy continues. "Natural language is also available, so you don't have to train the users to create productive queries. You can define your own natural language to match your common usage. For example, "car" = "automobile" and so on. We use conflation so that we match query terms to various tenses and forms of words. We provide intelligent date and number searching, so that terms are normalized ... Every search will bring results in a second."
Other features of ISYS include a spidering tool for use on both internal and external sources of data, as well as intelligent agents that notify users when new data is found that matches specific search criteria.
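An intelligent agent of the kind described can be sketched as a set of saved searches checked against each newly found document. `SavedSearch`, `alerts_for` and the all-terms (AND) matching rule are hypothetical choices, not ISYS's implementation.

```python
# Sketch: saved-search alerts for newly indexed documents.
# Hypothetical names and matching rule; not ISYS's implementation.
import re
from dataclasses import dataclass


@dataclass
class SavedSearch:
    user: str
    terms: frozenset  # every term must appear in a new document (AND match)

    def __post_init__(self):
        # Normalize to lowercase so matching is case-insensitive.
        self.terms = frozenset(t.lower() for t in self.terms)


def alerts_for(doc_id, text, saved_searches):
    """Return (user, doc_id) pairs for each saved search the document satisfies."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [(s.user, doc_id) for s in saved_searches if s.terms <= words]


watchers = [
    SavedSearch("ana", {"Acme", "merger"}),
    SavedSearch("bo", {"patent"}),
]
print(alerts_for("doc-7", "Acme announces merger talks", watchers))  # [('ana', 'doc-7')]
```

Running the check once per new document, rather than re-running every saved query against the whole index, keeps the cost proportional to the new data the spider finds.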
TripleHop
Matt Turck, CEO of TripleHop, describes the company's high-level value: "We are an enterprise search company, like Autonomy and Verity. We all offer a single point of access to corporate information, internal and external," Turck says. "Leadership in this market is still open. At some point, this market is going to explode, because there is a high level of frustration with the industry pioneers. Raised expectations have not been met ... Large-scale implementations of the major vendors require expensive PSG." Turck's goal is to overcome that limitation by bridging the gap between automated and manual tasks.
Joaquin Delgado, CTO of TripleHop, says, "We mine the use of terms in context." Referring to automating the indexing process, he explains, "There are three points. One is universal access, a single entry point, what is often called federated search. Our second point is independent search, where the relationship of terms is specific to the corpus. Our third is personalization, where we add a layer of user feedback. What did a user do with these search results? We can help future users find information."
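Federated search, a single entry point over several sources, can be sketched as querying each source and merging the hit lists. Because sources score on different scales, this sketch divides each score by that source's top score before merging; that normalization rule, `federated_search` and the stand-in sources are assumptions, since the article does not say how TripleHop merges results.

```python
# Sketch: federated search with per-source score normalization.
# Assumption: the normalization rule is illustrative, not TripleHop's.


def federated_search(query, sources):
    """Query every source and merge the hits into one ranked list.

    sources maps a source name to a function returning [(doc_id, score), ...].
    Scores are divided by each source's top score so different scales compare.
    """
    merged = {}  # doc_id -> (normalized score, source name)
    for name, search in sources.items():
        hits = search(query)
        if not hits:
            continue
        top = max(score for _, score in hits)
        for doc_id, score in hits:
            normalized = score / top if top > 0 else 0.0
            # A document found by several sources keeps its best score.
            if doc_id not in merged or normalized > merged[doc_id][0]:
                merged[doc_id] = (normalized, name)
    ranked = sorted(merged.items(), key=lambda item: (-item[1][0], item[0]))
    return [(doc_id, source, score) for doc_id, (score, source) in ranked]


# Stand-in sources with different score scales (hypothetical data).
sources = {
    "intranet": lambda q: [("a", 10.0), ("b", 5.0)],
    "web": lambda q: [("c", 8.0), ("b", 4.0)],
}
print(federated_search("quarterly report", sources))
# [('a', 'intranet', 1.0), ('c', 'web', 1.0), ('b', 'intranet', 0.5)]
```

Without some normalization, a source that happens to use larger raw scores would crowd every other source out of the merged list.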
"Auto-categorization on the fly is the difference," Delgado continues. "We can broker search results, such as Google, so you don't need to index external data. It's more meaningful if the user can bring categorization to the results, to the hit list. There are many ways to displaying the results. You can visualize by author or any other parameter. We can provide a taxonomy of terms to help users expand their queries. The user can see and modify expanded terms. In this way, we can personalize concept searching."
Delgado describes a less common feature: "When users are reviewing results, we have a 'Rate It' button that lets the system 'learn' user preferences. Other approaches use an implicit rating, by watching users and their results. Our Rate It button allows users to provide explicit feedback, which delivers organic relevance ranking, rather than just statistical relevance of search results."
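Explicit feedback of the "Rate It" kind can be sketched as blending each document's statistical score with the average of users' ratings. The 0-to-1 rating scale, the 50/50 `weight` and the `FeedbackRanker` class are assumptions, not TripleHop's method.

```python
# Sketch: re-rank results by blending statistical scores with explicit ratings.
# Hypothetical scale, weight and class; not TripleHop's method.
from collections import defaultdict


class FeedbackRanker:
    """Blend statistical relevance with explicit user ratings."""

    def __init__(self, weight=0.5):
        self.weight = weight              # share of the final score given to ratings
        self.ratings = defaultdict(list)  # doc_id -> ratings in [0, 1]

    def rate(self, doc_id, rating):
        """Record one explicit rating, e.g. from a 'Rate It' button."""
        if not 0.0 <= rating <= 1.0:
            raise ValueError("rating must be between 0 and 1")
        self.ratings[doc_id].append(rating)

    def _score(self, doc_id, statistical):
        votes = self.ratings.get(doc_id)
        if not votes:
            return statistical  # unrated documents keep their statistical score
        average = sum(votes) / len(votes)
        return (1 - self.weight) * statistical + self.weight * average

    def rank(self, hits):
        """hits: [(doc_id, statistical score in [0, 1])]; returns doc_ids best first."""
        ordered = sorted(hits, key=lambda h: self._score(*h), reverse=True)
        return [doc_id for doc_id, _ in ordered]


ranker = FeedbackRanker()
hits = [("a", 0.9), ("b", 0.8)]
print(ranker.rank(hits))  # ['a', 'b']
ranker.rate("b", 1.0)
ranker.rate("a", 0.0)
print(ranker.rank(hits))  # ['b', 'a']
```

Documents nobody has rated keep their statistical score, so feedback only reorders results users have actually judged.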
Tony McKinley is with Input Solutions and is the author of "From Paper to Web" (imagebiz.com).