The evolution of desktop search: Good news for the knowledge worker

This article appears in the issue February 2005 [Volume 14, Issue 2]



After years of undeserved obscurity, the category of desktop search has finally tipped into its own little Cambrian explosion of product announcements and major upgrades from big and small players, signaling an important evolutionary event in the history of KM tools.

The overwhelming majority of investment in KM technology development and implementation has been about economies of scale. The prospect of being smarter together is very appealing, but enterprise repositories have consistently disappointed individual users and frustrated enterprise CKOs when they failed to yield major productivity improvements.

On the other hand, if you've been reading my "Personal Toolkit" column in KMWorld, you know that I advocate taking an egocentric approach to knowledge ecologies, in terms of both information skills and tools, and social practices and structures. Each of us can only work our networks from the starting point of our own node. Because of that, context is always greatest when we start with what and whom we already know—and need to know—and spiral out from there.

Nowhere is that egocentric view more important than in understanding how to support a knowledge worker's needs and behaviors for both learning from others and searching for explicit knowledge. When statistics claim that the average knowledge worker spends one-third to half of his or her time looking for information, they mean you. And to a great extent, you are looking for your own stuff: your notes, messages and documents, your research clippings, documents and other attachments sent specifically to you, etc.

Making decisions frequently requires new information or ideas, but consider two assumptions:

  • First, that personal documents have a higher, more immediate relevance for decision-making than those retrieved from the Internet or corporate intranet.

  • Second, and perhaps more important, personal documents (and existing tacit knowledge) help to contextualize impersonal material collected externally.

Search comes home

So I was glad to see that Google's announcement last October of a new desktop search tool received a lot of attention and began a frenzy of similar announcements. By the end of 2004:

  • Google unveiled its long-rumored Google Desktop Search (GDS), which indexes a user's hard drive and presents links to local documents in the same browser window as hits from its famous Web crawler.

  • Not to be outdone, Microsoft released a beta version of its own desktop search application in December as part of the MSN Toolbar Suite, integrated with its new Web search technologies. The company had purchased e-mail search vendor Lookout in July.

  • Yahoo got into the game as well, starting with the acquisition of Stata Labs. Then in December, Yahoo announced a partnership to distribute a customized version of X1.

  • Copernic Technologies introduced Copernic Desktop Search to complement its Web search tools. It also launched Coveo Solutions, a separate company to focus on the enterprise market for integrated searching of local drives, intranets and the Web. In November, Copernic itself was acquired by metasearch portal Mamma.com.

  • Also in November, Autonomy announced IDOL Enterprise Desktop Search, which combines office documents, e-mail, Web sites, news and multimedia content from network, Internet and local data sources in a single query generated automatically from the context of a user's open documents.

  • Several companies, notably Blinkx, introduced technologies to automatically interpret a user's active document and offer relevant material from local and Internet sources, using an approach sometimes called "implicit query." (Several years ago, Autonomy's teaser product, Kenjin, performed a similar desktop-based service and even suggested other Kenjin users working on similar topics.)

  • But wait, there's more. America Online (using Copernic), Apple Computer (upgrading Sherlock) and even Ask Jeeves (picking up Tukaroo) are all entering the fray with their own versions of desktop search.

  • That's on top of current and new releases from about two dozen established desktop search specialists such as ISYS, 80-20, dtSearch and Enfish.

All of those developments should add up to great news for the individual knowledge worker.
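To make the "implicit query" idea mentioned above more concrete, here is a minimal sketch in Python. It is not how Blinkx, Autonomy or any other vendor actually does it. It simply assumes that the most frequent non-trivial words in the active document make a reasonable query. The function name, the stopword list and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real tools use far richer linguistics.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "for", "on",
    "is", "are", "with", "that", "this", "it", "as", "by", "be",
}

def implicit_query(active_text, max_terms=3):
    """Guess a search query from the document the user is working on.

    Assumption: the most frequent words that are not stopwords
    (and are longer than two letters) describe the user's current topic.
    """
    words = re.findall(r"[a-z]+", active_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(max_terms)]

# Hypothetical text of the document currently open on screen.
active_text = (
    "Quarterly budget review. The budget covers travel, "
    "and the travel budget is under review."
)
query_terms = implicit_query(active_text)
```

The resulting terms could then be handed to whatever local or Web search the user already has, without the user typing anything.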

Knowing what you know

Desktop search tools work by reading your information in advance. While standard "Find" commands have to crawl through each document in real time to spot the desired keywords, more sophisticated applications parse those documents ahead of time, building an index of searchable terms and other identifying information. That way, a search query can be executed against the index alone, yielding results in a fraction of the time.
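The difference between scanning and indexing can be sketched in a few lines of Python. This is only an illustration of the general technique (an inverted index mapping each term to the documents that contain it), not any vendor's implementation. The function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Read every document once, in advance, mapping each term to the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in tokenize(text):
            index[term].add(name)
    return index

def search(index, query):
    """Return the documents that contain every query term, consulting only the index."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents standing in for files on a hard drive.
documents = {
    "notes.txt": "Meeting notes on the desktop search pilot",
    "budget.txt": "Budget forecast for the search project",
    "memo.txt": "Travel memo",
}
index = build_index(documents)
hits = search(index, "desktop search")
```

Once the index exists, a query never has to open the documents themselves. It only looks up each term and intersects the resulting sets, which is why results come back so quickly.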

The local index—and what an application can do with it—is the key to desktop search tools. There is a lot of variety in the types of queries that can be executed against a desktop index, from simple keywords to sophisticated linguistic analysis. Likewise, there is a wide variety in terms of how the results are ranked and displayed, including impressive graphical formats, and what you can do with those results within the context of the search application, such as forwarding a document by e-mail or copying text from the preview window to paste into a document open in another PC application.

It's the large size and low prices of today's hard disks that make desktop search tools (a) necessary and (b) possible. To give you a personal example of how that works, my 2.5-lb laptop has a 40-GB hard drive with about 18 years of documents, research, messages and PIM data. The folders that Enfish is configured to index contain more than 80,000 items, or about 6 GB of personal stuff. The index alone seems to be about 400 MB, or about 6.7% of the size of the files indexed. Other people may have significantly more material to index; other desktop search applications build significantly larger indexes.

The value of a desktop search tool demonstrates one of the many direct connections between the efficiency of knowledge work at the individual level (sometimes dismissed as personal information management) and a knowledge worker's ability to effectively share tacit knowledge in communities. Those who are inefficient, unreliable or disorganized quickly become a disappointment to others. Their reputation suffers and, therefore, their social capital in the network is diminished. They will be less able to tap into the community when they need answers or other favors. On the other hand, when we are able to quickly anticipate or respond to information requests from others, we build our personal social capital. Our reputation grows in terms of being recognized for particular areas of expertise, even if we are only finding and forwarding articles we have clipped from the Web.

The descent of desktop search

There are three lessons from evolution that might easily apply to desktop search species as an emerging genus of KM tool: divergence, convergence and interdependence.

First, "adaptive radiation" means that a single species evolves into many different species to fill other available niches in the ecosystem. At the same time, "developmental conformity" says that plants and animals from different species in different parts of the world will still resemble each other if they occupy similar niches.

So these desktop search tools, developing independently in their different corners of the IT world, have all adapted to fill a particular niche in the knowledge worker's toolkit, based on the abundance of personal documents, messages and data. But now, largely thanks to Google, they are being seen for the first time as competitors for the same categorical niche. Inevitably, that will spur innovation, so their feature sets will diverge and differentiate as they try to capture segments of the market.

Moreover, "coevolution," the way species adapt to each other as well as adapting to their environment, will also be at play on two levels. First, as mentioned, the competition between desktop search tools will drive rivals to innovation as each new feature is announced. And second, as desktop search becomes a serious driver in the enterprise, coevolution will emphasize the interdependence between desktop search and other KM technologies, shaping and being shaped by other tools.

Niches to fill

Initial reports treated the launch of Google Desktop Search in a vacuum, comparing GDS' index/search model only to the basic functionality of Windows' medieval Find utility. While some of the other tools making news these days are barely out of beta themselves, many companies have been making desktop search tools for years or even decades. Despite their utility, however, desktop search tools have had a hard time catching on either with individual knowledge workers or with enterprise IT departments. GDS is likely to be remembered as the signal event that caused people to take the category seriously.

Nevertheless, instead of making a convincing case for the critical utility of personal searching, many of the articles covering GDS worried about the security and privacy vulnerabilities of centralized access (in some ways, an inevitable liability with any KM system), warning that hackers could leverage desktop search tools to make viruses more effective. Indeed, Google had to issue its first security patch fairly quickly.

You can look at some of the tools that have been around for a few years now (and those that are extinct) to get an idea of how they adapt. For example, one of the earliest entries was AltaVista Discovery, an extremely sophisticated desktop search tool that featured the ability to crawl Web sites, find similar pages, and highlight, summarize or even display results in a hyperbolic tree. But it was never really marketed and, despite a loyal following of users, it was dropped after AltaVista was acquired by Lycos.

Differentiation by price. Because many of the offerings will be free, vendors that charge will be forced to justify their price tags with advanced functionality. On one hand, giving the tools away for free hasn't so far proven to be any guarantee of success for the companies that have tried it. Some companies are essentially competing with themselves as they sell one version while they (or someone else) give another away. On the other hand, dtSearch and ISYS have stayed in the game despite rather high price tags.

Enterprise acceptance. You can expect continued resistance from many corporate IT departments, which fear a loss of control over explicit knowledge. So desktop search probably will be like instant messaging, pulled down by frustrated knowledge workers who have to break the rules and hack the system just to get their jobs done. Some companies, like 80-20 and Coveo, will devote themselves to winning IT acceptance with promises of security and central administration.

Integrated search. Just because most searches may start on the desktop doesn't mean they have to end up there. Though it doesn't offer the features of many competing products, Google Desktop Search understands something about egocentric search that other tools don't: combined results. While other desktop search companies are just figuring out that most knowledge workers don't want to search separately for e-mails, spreadsheets and word processor documents, Google, Microsoft and a few others are already offering integrated desktop, intranet and Internet searching. (It would be even better if local results could inform the wider query.)

Integration with other PKM tools. Watch for desktop search tools to self-organize into multifunction toolkits as they get built into or bundled with other personal knowledge management (PKM) tools and leverage other KM technologies. For example, ScanSoft includes an index and search tool with its personal document imaging system, PaperPort. Open Road Technologies' Watson will create implicit queries for Web, enterprise and desktop search tools, with an existing plug-in for X1. X1, meanwhile, also integrates with RSS aggregator and reader NewsGator, so that items from blogs and newsfeeds subscribed to by the user will show up in results.

Integration with enterprise KM tools. Companies devoted to enterprise KM are recognizing that bottom-up search is an essential part of the top-down package. Autonomy's announcement of IDOL Enterprise Desktop Search, mentioned above, is one example. Another is the Entopia K-Bus for Desktop Search, accessible as an embedded toolbar in most Microsoft Office applications or as a right-click function that allows users to automatically find locally stored information relevant to data currently displayed on the screen. The K-Bus crawling process not only indexes the content and conceptual meaning of the original file but also takes into account the organizational and social context around it.

Indexing of non-textual material. One way desktop search companies can differentiate themselves is by specializing in specific types of explicit knowledge. Developed by a company that specializes in serving graphic artists, Creo's Six Degrees maps the relationships between messages, files and contacts, and presents them in an implicit query mode. Because graphic files don't have text content to index, Six Degrees works from the context, deriving relationships between items based on file names, folder locations, e-mail threads and contacts.

Visualization of results. As the volume of material a knowledge worker can store on a local drive continues to grow, there will be an increasing need for non-linear approaches to handling search results. Some desktop search companies are already offering clustering and categorization of results. Earlier in 2004, two companies released updated desktop search tools that emphasize visual presentation of results. Groxis' Grokker, which presents results as concentric clusters of shaded spheres, released a Google plug-in.

Supporting alternative platforms. Desktop search companies are already announcing support for non-Microsoft browsers and operating systems.

The chart on Page 13, Volume 14, Issue 2, February 2005 lists a sampling of desktop search vendors for Windows-based PCs. Compiled from reviews, company Web sites and information provided by the vendors, it presents some basic criteria for comparison. (An updated version of the chart is available at Global Insight.) The chart includes the number and release date of the current version of each company's desktop search tool, as well as some features that users are likely to value, although every knowledge worker has slightly different needs and preferences. Some tools index only a limited set of file types, such as Microsoft Office files. Others will index and retrieve additional document formats such as Adobe PDF and metadata from audio, video and image files. Many will index e-mail databases, and a few will add contacts, calendars and other PIM data. A few can also handle some types of instant message history and RSS feeds.

Desktop search vendors

Here is a listing of desktop search vendors, which includes the company name followed by the product name:

  • 80-20 Software: 80-20 Retriever
  • Ask Jeeves: Ask Jeeves Desktop Search
  • Autonomy: IDOL Enterprise Desktop Search
  • Blinkx: Blinkx
  • Copernic: Copernic Desktop Search
  • Creo: Six Degrees
  • dtSearch: dtSearch Desktop
  • Entopia: K-Bus for Desktop Search
  • Enfish: Enfish Find, Enfish Professional
  • Faico: AFSearch
  • Filehand: Filehand Search
  • Google: Google Desktop Search
  • Groxis: Grokker My Files plug-in
  • ISYS: ISYS:desktop
  • Lycos: Hotbot Desktop Search
  • Management Information Technologies: Readware ConSearch
  • META: diskMETA
  • Microsoft: MSN Toolbar Suite
  • Redtree Development: Wilbur
  • ScanSoft: PaperPort All-in-One-Search
  • SER: SERglobalBrain Personal Edition
  • Wizetech Software: Archivarius 3000
  • X1: X1
  • x-dot: x-friend
  • Yadu Digital: Finders Keepers

Pay attention to the minimum recommended processor power and RAM specifications. The indexing phase is one of the most power-hungry tasks your personal computer will ever be asked to perform. Depending on the application and the size of your personal repositories, the initial indexing can take anywhere from a few hours to a few days. Fortunately, that is only done upfront, and the best tools will update the index only when the computer is idle.
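One reason later updates are so much cheaper than the initial pass is that a tool only has to revisit what has changed. The Python sketch below illustrates that idea under a stated assumption: that the tool remembers each file's modification time from the last indexing run. The article does not say how any particular product detects changes, and the function name and sample data are hypothetical.

```python
def plan_index_update(indexed_mtimes, current_mtimes):
    """Compare last-indexed modification times with the current ones.

    Returns (to_index, to_remove): files that are new or changed since
    the last run, and files that were indexed earlier but no longer exist.
    """
    to_index = sorted(
        path for path, mtime in current_mtimes.items()
        if indexed_mtimes.get(path) != mtime
    )
    to_remove = sorted(
        path for path in indexed_mtimes if path not in current_mtimes
    )
    return to_index, to_remove

# Hypothetical snapshots: path -> modification time in seconds.
indexed = {"a.doc": 100.0, "b.xls": 200.0, "old.txt": 50.0}
current = {"a.doc": 100.0, "b.xls": 250.0, "new.pdf": 300.0}
to_index, to_remove = plan_index_update(indexed, current)
```

In this example only two of the three current files need to be read again, and one stale entry is dropped. A tool could run such a plan whenever the computer is idle.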

Some desktop search tools index only a limited set of file types, such as Microsoft Office files. Others will also index and retrieve additional document formats such as Adobe PDF and metadata from audio, video and image files. Many will index e-mail databases, and a few will add contacts, calendars and other PIM data. A few can also handle some types of instant message history and RSS feeds. Sometimes you can specify a file type important to you and the tool will index any text available, such as embedded text in a voice recognition session.


Steve Barth (global-insight.com) writes, teaches and consults on personal knowledge management and knowledge worker productivity. In the interest of disclosure, some of his consulting engagements cover the issue of desktop search.

