
Search: an interesting muddle

This article appears in the February 2007 issue [Volume 16, Issue 2]

In December, IBM and Yahoo released their joint search engine, OmniFind Yahoo Edition, adding confusion to the chaos already reigning in the search engine software market. This search engine is a free download from http://omnifind.ibm.yahoo.com. Yet, despite the price, or lack of it, this is no weak sister.

Although the software has a small footprint and can run on a laptop, IBM has added incremental indexing, support for 200+ document types, support for 30 languages and linguistic features such as synonym detection, spelling correction, lemmatization, stemming and a "did you mean" feature that suggests alternative queries. The relevance ranking is adjustable. It does not rely on link analysis, which often fails inside the enterprise. Instead it uses OmniFind relevance ranking algorithms.
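To make the "did you mean" idea concrete, here is a minimal sketch of how a suggestion feature can work in principle. It is not OmniFind's algorithm, which the article does not describe; it simply compares each query term against a known vocabulary using string similarity from Python's standard library. The vocabulary list, the function name and the similarity cutoff are all assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical vocabulary; a real engine would draw its terms from the index.
VOCABULARY = ["enterprise", "search", "engine", "relevance", "ranking", "index"]


def did_you_mean(query, vocabulary=VOCABULARY, cutoff=0.75):
    """Return a suggested rewrite of the query, or None if nothing changes."""
    suggestion = []
    changed = False
    for term in query.lower().split():
        if term in vocabulary:
            # Known term: keep it as typed.
            suggestion.append(term)
            continue
        # Unknown term: look for the single closest vocabulary entry.
        matches = get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        if matches:
            suggestion.append(matches[0])
            changed = True
        else:
            # Nothing similar enough; leave the term untouched.
            suggestion.append(term)
    return " ".join(suggestion) if changed else None


print(did_you_mean("enterprize serch"))
print(did_you_mean("search engine"))
```

A misspelled query such as "enterprize serch" yields a suggestion, while a query made up entirely of known terms yields none. Production engines typically also weigh term frequency and past query logs, which this sketch leaves out.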

Based on the Lucene open source search engine, the OmniFind Yahoo Edition goes beyond commodity search. It is certainly quick to install: Download it, configure it in three clicks and point it at a URL to crawl. However, it is also configurable and customizable. Administrators can change the look and feel of the search page, create shortcuts to other Web pages or best answers to a top query. Reporting tools monitor usage to determine null or frequent searches, and to gauge the effectiveness of the results being returned.
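The kind of usage reporting described above can be pictured with a short sketch. It assumes a hypothetical query log of (query, result count) pairs; the log format and the function name are illustrative and are not taken from the OmniFind product.

```python
from collections import Counter


def summarize_queries(log):
    """Tally frequent queries and null (zero-result) queries.

    `log` is an iterable of (query, result_count) pairs, a hypothetical format.
    """
    frequent = Counter()
    null = Counter()
    for query, hits in log:
        normalized = query.strip().lower()
        frequent[normalized] += 1
        if hits == 0:
            # A null search: the engine returned nothing for this query.
            null[normalized] += 1
    return frequent, null


# Made-up sample data for illustration only.
sample_log = [
    ("vacation policy", 12),
    ("Vacation Policy", 12),
    ("expense forms", 0),
    ("org chart", 3),
]
frequent, null = summarize_queries(sample_log)
print(frequent.most_common(1))
print(sorted(null))
```

Frequent queries are candidates for shortcuts or best answers, while null queries point to missing content or vocabulary mismatches that an administrator might address with synonyms.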

Not surprisingly, this free search engine is meant to be an entry point to IBM's suite of OmniFind search software. Although it is free, customers can purchase phone support for $1,999 per server per year. Businesses that outgrow it can move up to the next tier, which includes security, authentication and a larger document capacity. Customers may eventually want to migrate to an expanded OmniFind information access platform that includes additional capabilities such as security, analytics and semantic search (OmniFind Enterprise Edition); customer service features such as navigation, natural language queries and more reporting tools (Discovery Edition); better integration of data and content, with metadata management and information as a service (Information Server); or a variety of problem-targeting custom applications that include more advanced features (Master Information Solutions).

Now that IBM has entered the list, the search market will grow even more contentious and confused. Do you need an extended search platform or a basic search engine? What about security? And should it be at the document level or the subdocument level? How about multiple languages? Can the central search engine be tuned for multiple and very different collections of content and data? Will it be used by employees or customers or both? How do you balance relevance ranking of customer addresses vs. lengthy analyses or transaction records? Can you accommodate manufacturing, sales, marketing, research and competitive intelligence workers with the same system? How about integrating multiple formats and different software applications that feed into a single point of access? And how many documents do you have, anyway?

IT departments are rarely well versed in content technologies, and often persist in expecting them to behave like databases. Yet, with their emphasis on words and their meaning, search, categorization and text analytics applications cannot be put in the database bucket. Their basic architecture, purpose and expectations differ: They are built to accommodate the richness and fuzziness of language. Just as language is subtle, so are the differences among search engines. Can you integrate desktop, intranet and Web search and do it securely?

OmniFind Yahoo Edition is positioned squarely against Google's search appliance (GSA). But we expect that it may rock the lower end of the search software market as well.

Vendors like ZyLAB, Coveo, Ultraseek (from Autonomy), dtSearch, ISYS and X1 have been moving up-market slowly as they add more features and scale to greater numbers of documents. This is classic disruption: high-end platforms from FAST, Autonomy, IBM, Endeca and Google, with vendors like Vivisimo and ZyLAB joining their ranks. Free software from X1 and now IBM. Open source software. Desktop search applications, and search embedded in more and more applications like CRM, e-discovery or office suites. And what about task-specific applications from Inmagic, LexisNexis, Factiva, EasyAsk, Mercado or Endeca? It makes for an interesting muddle.

