The evolving federation of search
Search is one of those technologies customers take for granted. After all, search and retrieval was the first application to debut in the nascent document management industry of the early ’90s. Text was indexed, manually or automatically, and could then be searched within the search and retrieval application. Search and retrieval later became a component of more complex applications such as imaging and workflow, and after that of knowledge management and portals. With the emergence of Google, however, search once again became a dominant standalone application. But for all the cachet that powerful product has, it’s not the only game in town.
Google may own the Web, but other search products have staked out federated search and other disciplines as their own domain. (Federated search lets users search the repositories of different content management vendors within the enterprise by means of adapters.) Each has its own strengths; many specialize in particular applications and vertical markets, and can find files as different as audio, video and text.
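The adapter idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's design: the `RepositoryAdapter` interface, the `Hit` record and both adapter classes are hypothetical, and the "repositories" are in-memory stand-ins for what would really be calls into each vendor's own API. Each adapter translates one query into its repository's terms and maps whatever comes back into a common result shape, so the caller never needs to know how each system stores its content.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Hit:
    """One search result in a common shape, whichever repository it came from."""
    source: str
    doc_id: str
    title: str


class RepositoryAdapter(Protocol):
    """Hypothetical interface that every repository-specific adapter implements."""
    name: str

    def search(self, query: str) -> List[Hit]: ...


class WikiAdapter:
    """Hypothetical adapter for a repository that stores pages as {page_id: title}."""
    name = "wiki"

    def __init__(self, pages: dict):
        self.pages = pages

    def search(self, query: str) -> List[Hit]:
        q = query.lower()
        return [Hit(self.name, pid, title)
                for pid, title in self.pages.items() if q in title.lower()]


class FileShareAdapter:
    """Hypothetical adapter for a shared drive exposed as a list of file paths."""
    name = "fileshare"

    def __init__(self, paths: list):
        self.paths = paths

    def search(self, query: str) -> List[Hit]:
        q = query.lower()
        # Match on the file name; keep the full path as the document id.
        return [Hit(self.name, path, path.rsplit("/", 1)[-1])
                for path in self.paths if q in path.rsplit("/", 1)[-1].lower()]


def federated_search(adapters: List[RepositoryAdapter], query: str) -> List[Hit]:
    """Pass the same query through every adapter and collect the normalized hits."""
    hits: List[Hit] = []
    for adapter in adapters:
        hits.extend(adapter.search(query))
    return hits


adapters = [
    WikiAdapter({"p1": "Contract renewal process", "p2": "Holiday schedule"}),
    FileShareAdapter(["/shared/legal/contract_2008.doc", "/shared/hr/benefits.xls"]),
]
for hit in federated_search(adapters, "contract"):
    print(hit.source, hit.doc_id, hit.title)
# wiki p1 Contract renewal process
# fileshare /shared/legal/contract_2008.doc contract_2008.doc
```

In this design, bringing another repository into the search means writing one more adapter; nothing in the existing repositories has to change.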
But given the rapid growth of multipurpose search products, what market forces are driving vendors to keep sharpening their focus and developing distinctive features? The short answer is the phenomenal proliferation of information across the global enterprise; the disparate locations of that data, from common applications like SharePoint, IBM Lotus Notes and SAP to applications unique to the organization and shared and network drives; and the types of data, both structured and unstructured. What’s key about the new techniques, though, is that their potential applications and vertical markets are almost limitless, and they don’t require users to change legacy applications and formats to get at legacy information. Adapters to the different applications solve that problem.
So search has made great strides in the last decade. But given the nature of technology change and the explosion of information, look for substantive alternative methods of search, as well as new and increasingly numerous applications, to keep pace with the rapidly changing world of information.
Some factors driving search
According to Bill Chambers, VP of consulting at Doculabs, a critical application driving adoption of complex search products is compliance, along with the need for e-discovery strategies. The Federal Rules of Civil Procedure require that organizations be able to search across a number of disparate repositories for documents requested in legal discovery. When an organization is sued, it often must produce material as varied as Microsoft Word and Excel files, video, audio, images and other document types.
Finding those documents, however, is difficult. They could be in any number of places outside typical content management applications like SharePoint or shared drives. So, says Chambers, a comprehensive federated search that retrieves a consolidated set of information for discovery requires a powerful, multipurpose search product like those from Autonomy, Google and FAST (a Microsoft subsidiary). Those tasks are beyond the scope of the search engines typically built into popular content management systems, or even into other applications common in many enterprises, such as SAP ERP, SharePoint portals or Lotus Notes groupware.
Chambers also says that as content management technologies mature, he is seeing organizations in shared-service environments implement enterprisewide applications instead of just departmental ones. That calls for the ability to search across disparate repositories that different departments have populated with various types of documents. Users also have a knack for asking IT for documents that lie outside their departmental enterprise content management (ECM) repository but are relevant to the business process they’re involved in. Most ECM vendors, says Chambers, offer some federated search capability, so their products can search, to a certain extent, across other repositories in addition to their own. But federated search is still immature and limited in functional scope, although customers are not as gun-shy about deploying it as they were even a year ago.
New search architectures
Because relevant data is dispersed all over the enterprise, vendors have been trying to address the problem with new search architectures. According to Sue Feldman, research VP, search and digital marketplace technologies, at IDC, "Search architectures are not databases—that’s a very important thing to understand. The second thing you need to understand is that they are exquisitely appropriate for putting all kinds of information into a single bucket called an ‘inverted index.’" An inverted index is a data structure that maps content, such as words or numbers, to its locations in a database file, a document or a set of documents, so that the content can be searched. Inverted indexes are most commonly used in document retrieval systems.
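A toy inverted index makes the definition concrete. In this sketch, the `tokenize`, `build_inverted_index` and `lookup` helpers are illustrative names, and tokenization is a deliberately naive lowercase-and-split. Each term maps to the documents that contain it plus the word positions where it occurs, and a query is answered by intersecting the document sets of its terms rather than by scanning every document.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for every document that contains it."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


def lookup(index, query):
    """Return ids of documents containing every query term (a Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(set(index.get(term, {})) for term in terms))


docs = {
    "d1": "Quarterly sales report",
    "d2": "Sales contract for Q3",
    "d3": "Board meeting minutes",
}
index = build_inverted_index(docs)
print(index["sales"])                   # {'d1': [1], 'd2': [0]}
print(lookup(index, "sales contract"))  # {'d2'}
```

Storing positions is unnecessary for plain Boolean lookups, but it is what makes phrase and proximity queries possible.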
With that method, users don’t have to sort or structure data ahead of time, explains Feldman. "All the user has to do," she says, "is extract as many meaningful pieces out of unstructured or structured information [as possible] and put them into the same architecture so that the user can match them against queries of different kinds. And those could be structured or unstructured queries that they’re able to do simultaneously … So, a single architecture lets the user start to tap both the content and the database information that’s in the organization."
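One way to picture the single architecture Feldman describes is to index structured values as `field:value` terms alongside ordinary words, so one query can combine a field constraint with free text. This sketch assumes records arrive as flat dictionaries in which a `text` key holds free text and every other key is a structured field; the `field:value` encoding and all function names are illustrative assumptions, not how Attivio, Exalead, FAST or Endeca actually store data.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set


def words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_terms(record: Dict[str, str]) -> Set[str]:
    """Index free text as plain words and every other field as a 'field:value' term."""
    terms = set()
    for field, value in record.items():
        if field == "text":  # assumption: 'text' is the only free-text field
            terms.update(words(value))
        else:
            terms.add(f"{field}:{value.lower()}")
    return terms


def build_index(records: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Put terms from all records, structured or not, into one inverted index."""
    index = defaultdict(set)
    for rec_id, record in records.items():
        for term in extract_terms(record):
            index[term].add(rec_id)
    return dict(index)


def search(index, fields: Optional[Dict[str, str]] = None, query_text: str = "") -> Set[str]:
    """Match field constraints and free-text words together against the same index."""
    wanted = [f"{k}:{v.lower()}" for k, v in (fields or {}).items()] + words(query_text)
    if not wanted:
        return set()
    return set.intersection(*(index.get(term, set()) for term in wanted))


records = {
    "erp-1001": {"type": "invoice", "customer": "Acme", "text": "Late payment penalty waived"},
    "ecm-17": {"type": "memo", "customer": "Acme", "text": "Notes on the payment dispute"},
    "erp-1002": {"type": "invoice", "customer": "Globex", "text": "Payment received"},
}
index = build_index(records)
print(sorted(search(index, {"customer": "Acme"}, "payment")))  # ['ecm-17', 'erp-1001']
print(sorted(search(index, {"type": "invoice"}, "penalty")))   # ['erp-1001']
```

The ERP-style and document-style records never had to share a schema; they only had to yield terms for the same index.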
Companies prominent in that approach, says Feldman, include Attivio, Exalead, FAST and Endeca, among others.
"If you understand you can get at this information without having to sort it or structure it in any way ahead of time—although you certainly can—and that it doesn’t require that you have the same structure across multiple information sources—that’s really the beauty of these technologies," Feldman says. "It’s not as if you have to take the information out of the silo; it’s not as if you have to change your business intelligence or content management system or your ERP system. This is a layer that sits on top of all of those and sends appropriate queries to all of them, brings back the information, matches it, ranks it and presents it."
Feldman offers one caveat, however: "The companies mentioned aren’t the only ones that can solve these problems in the search world—you could also use a regular search engine that can pull the information in but just can’t do as many of the operations within the search architecture. So, you certainly can see any of the companies that are enterprise search today being able to get at both the data and content, just not as elegantly."
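The query layer Feldman describes, one that sends queries to each system, brings back the information, matches it, ranks it and presents it, can be sketched as a fan-out and merge. This is a minimal sketch under several assumptions: the sources are hypothetical stub functions returning `(doc_id, raw_score)` pairs, queries run in parallel threads, and scores are merged with per-source min-max normalization. That normalization is a simple stand-in chosen for illustration; real products may use other fusion methods, such as reciprocal rank fusion, or their own relevance models.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hits = List[Tuple[str, float]]   # (doc_id, raw_score) on the source's own scale
Source = Callable[[str], Hits]


def normalize(hits: Hits) -> Hits:
    """Min-max scale one source's scores to [0, 1] so engines with different scales can be merged."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(doc_id, (score - lo) / span if span else 1.0) for doc_id, score in hits]


def federated_query(sources: Dict[str, Source], query: str, top_k: int = 10):
    """Send the query to every source in parallel, then merge, rank and truncate the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        results = {name: future.result() for name, future in futures.items()}

    merged = [(score, name, doc_id)
              for name, hits in results.items()
              for doc_id, score in normalize(hits)]
    # Stable sort: equal scores keep source order, then each source's own order.
    merged.sort(key=lambda item: item[0], reverse=True)
    return merged[:top_k]


# Hypothetical stubs standing in for an ECM system and an ERP system; they ignore the query.
def ecm(query: str) -> Hits:
    return [("contract.doc", 12.0), ("memo.doc", 3.0)]


def erp(query: str) -> Hits:
    return [("PO-77", 0.9), ("PO-12", 0.4), ("PO-5", 0.1)]


for score, source, doc_id in federated_query({"ecm": ecm, "erp": erp}, "acme contract"):
    print(f"{score:.3f} {source} {doc_id}")
```

Normalizing before merging matters because each engine scores on its own scale; without it, the source with the largest raw numbers would dominate the combined list.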
The hierarchy of search complexity
Chambers further distinguishes types of search along a range from relatively simple to complex. At the simple end of the continuum, he says, "you’re doing primarily field search and a combination of different types of indexes or values that you’re searching for." A product like Verity (bought by Autonomy), which you’d find inside content management products such as FileNet (now part of IBM), would be adequate for that type of searching.
"It searches across one repository, doesn’t do any particularly sophisticated searching, will do some relevance ranking but not in a comprehensive way," explains Chambers. "This type is beginning to go away now because people are demanding more functionality." It might be appropriate, though, for an SAP user who is working in a financial application and searching for information in a single SAP repository.
More complex search, such as Google’s, offers relevancy ranking, can bring up related topics and can combine many search attributes into one main search, according to Chambers. "You can also search both across your desktop and across the Web at the same time," he says. That would suit, for example, an analyst doing competitive research who uses SAP but also researches on the Web and has created documents about that research.
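The two ends of Chambers’ continuum can be contrasted in one small sketch. `field_search` is the simple end: exact matches on metadata in a single repository, with no ranking. `ranked_search` applies the same field filters and then ranks the matches by TF-IDF, so field attributes and free text combine in one search. TF-IDF is a classic relevance measure swapped in for illustration; Google’s and other vendors’ ranking methods are proprietary and weigh many more signals. The `Doc` record, its fields and the sample data are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Doc:
    doc_id: str
    fields: Dict[str, str]  # structured metadata, e.g. {"type": "report"}
    body: str = ""          # free text


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def field_search(repo: List[Doc], **criteria: str) -> List[Doc]:
    """Simple end: exact matches on metadata fields in one repository, no ranking."""
    return [d for d in repo
            if all(d.fields.get(name) == value for name, value in criteria.items())]


def ranked_search(repo: List[Doc], query: str, **criteria: str) -> List[Tuple[str, float]]:
    """Complex end: apply the same field filters, then rank matches by summed TF-IDF of query terms."""
    bodies = {d.doc_id: tokenize(d.body) for d in repo}
    n_docs = len(bodies)
    doc_freq = Counter()  # number of documents in the repository containing each term
    for tokens in bodies.values():
        doc_freq.update(set(tokens))

    ranked = []
    for doc in field_search(repo, **criteria):
        tokens = bodies[doc.doc_id]
        counts = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            if counts[term]:
                tf = counts[term] / len(tokens)              # how prominent the term is in this document
                idf = math.log(n_docs / doc_freq[term]) + 1  # rarer terms weigh more; +1 keeps common ones positive
                score += tf * idf
        if score > 0:
            ranked.append((doc.doc_id, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


repo = [
    Doc("a", {"type": "report", "dept": "marketing"}, "competitor pricing analysis pricing"),
    Doc("b", {"type": "memo", "dept": "marketing"}, "pricing memo for sales team"),
    Doc("c", {"type": "report", "dept": "finance"}, "quarterly pricing review"),
    Doc("d", {"type": "memo", "dept": "hr"}, "holiday schedule"),
]
print([d.doc_id for d in field_search(repo, type="report")])  # ['a', 'c']
for doc_id, score in ranked_search(repo, "competitor pricing", dept="marketing"):
    print(doc_id, round(score, 3))
```

Document `a` outranks `b` because it mentions "pricing" more densely and is the only one containing the rarer term "competitor".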