Three amazing reshapes for 2013: A new spin on classic search
Big data could emerge as the go-to buzzword in 2013. Dozens of companies are jockeying for mindshare in what appears to be the next digital gold rush. I have analyzed a number of companies in this sector. What strikes me as interesting is that companies like Cybertap, Digital Reasoning, IKANOW, Rosoka, Visual Analytics and others include a search function with their systems. However, the technological plumbing is built to handle large flows of data and information. Search functionality, when present, is a subordinate function.
Search versus big data
Can companies built on traditional information retrieval foundations compete against specialist firms with purpose-built big data systems? We know that HP believed that Autonomy's technology, some of which dates from the mid-1990s, could pivot into big data. Vivisimo remains a leader in on-the-fly clustering and deduplication, but its big data positioning strikes me as somewhat out of bounds. The Coveo positioning is, in my opinion, essentially a marketing play. The company has been in business as a vendor of enterprise search for almost a decade, and the new positioning may require some acrobatics to achieve.
Not long ago I delivered a two-day seminar to individuals interested in big data. I provided a list of about 50 companies that, based on my research, are vendors of note in the big data business. I did not list any enterprise search vendor. When asked why I hadn't included any on my list, I tersely replied, "Search and retrieval is not the core of systems designed to handle such big data processing chores as discerning meaning in Twitter messages, making sense of terabytes of e-mail and text messaging traffic, or figuring out what the words mean when attached as metadata to mobile phone tracking data."
Enterprise search systems focus on processing well-known file types. When an organization wants to locate information on a known topic via key words, banging words into a search box works. Big data, by definition, may contain information about which the user has no prior knowledge. Traditional indexing methods can add metadata, but different processing architectures are required to handle large data flows without delay, identify potentially interesting items and present them through different types of user interface. Examples range from relationship maps that show how items or entities connect to graphical interfaces that flag anomalous data or information.
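To make the known-item limitation concrete, here is a toy sketch (my own illustration, not any vendor's implementation) of the inverted-index model that underpins classic keyword search. Retrieval succeeds only when the user can supply the right words in advance:

```python
from collections import defaultdict

# Toy inverted index: the model behind classic enterprise search.
# Each word maps to the set of documents containing it.
def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Hypothetical documents, for illustration only.
docs = {
    "memo1": "quarterly sales report for Europe",
    "memo2": "sales forecast and budget notes",
    "memo3": "travel policy update",
}

index = build_index(docs)

# A known-item query succeeds...
print(sorted(index["sales"]))  # -> ['memo1', 'memo2']

# ...but the index reveals nothing about documents whose
# vocabulary the user cannot anticipate in advance.
print(sorted(index.get("anomaly", set())))  # -> []
```

The point of the sketch: the index answers only the questions a user already knows how to ask, which is exactly the gap big data systems are built to close.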
Can a traditional search vendor play in the trend-setting big data world? The answer is, "Maybe." As George Bernard Shaw observed: "The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man."
How long will Hewlett-Packard remain reasonable? How much patience will IBM have with Vivisimo's technology? How long will the firms providing more than $30 million to Coveo wait to get a return? For organizations looking for new tools to integrate big data into their knowledge management systems, reasonableness may be a scarce commodity in today's uncertain financial environment.
Three differences between information retrieval and big data systems
In a recent seminar I presented, attendees from different government units circled back to the differences between an information retrieval system built for common office file types and a system that must cope with facts, text and compound objects. I identified three differences:
- A next-generation system designed to process big data does not rely exclusively on the methods found in mainstream search systems. Numerical recipes that draw on multiple algorithmic methods are part of the foundation of a big data system. Most vendors pivoting into big data add on or "wrap" the traditional information retrieval approach with new functions.
- Big data demands different computing architectures. The cloud is only part of the technical toolbox. The systems and methods to acquire, process and analyze the massive data flows force next-generation companies to move into uncomfortable, often less familiar engineering approaches.
- The interfaces for most big data tools fall into two categories: the graphically rich user interface and the unyielding command line. Enterprise search systems depend on familiar conventions such as a search box and a list of hyperlinks. Useful but tired in my opinion.
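The first two differences can be illustrated with a small, hedged sketch. The streaming z-score check below is my own example, not drawn from any vendor's system; it flags an anomalous value as data flows past, with no user-supplied query at all:

```python
import math

# Sketch: surfacing items of interest from a data flow without a
# query. Uses Welford's online algorithm to track mean and variance
# in a single pass, then flags values far from the running mean.
def flag_anomalies(stream, threshold=3.0):
    count, mean, m2 = 0, 0.0, 0.0
    flagged = []
    for value in stream:
        if count >= 2:
            std = math.sqrt(m2 / (count - 1))
            if std > 0 and abs(value - mean) / std > threshold:
                flagged.append(value)
        # Welford's update: incorporate the value into the running stats.
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    return flagged

# Steady traffic with one spike: the spike is found, not searched for.
readings = [10, 11, 9, 10, 12, 10, 11, 500, 10, 9]
print(flag_anomalies(readings))  # -> [500]
```

Welford's one-pass formulation matters here: a big data pipeline cannot afford to store and re-scan the flow the way a batch indexer re-reads a document collection.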