Taming knowledge with open source

This article appears in the February 2012 issue [Volume 21, Issue 2].
Before the holidays, I had a chance to speak with Raphael Perez, one of the founders of OpenSearchServer. Based in Paris, OpenSearchServer originated as a project for a French company that needed business-to-business software. After looking at the commercial products, the firm's technical team decided to create an in-house solution as an open source project. Like IBM and SearchBlox, OpenSearchServer used Lucene as a foundation.

Lucene is a free search system available from the Apache Software Foundation. You can snag a copy of the code yourself from the project's website. With some poking around, you will quickly discover that a number of organizations are offering Lucene-centric search solutions. When you expand your query to "open source search software," you will find a dozen or more solutions. Some, such as Lucid Imagination, have the backing of large investment firms, and others, such as Egothor and Oxyus, are baffling.

Open source software in the enterprise is gaining traction. But the data I have reviewed is often misleading. For example, one major commercial open source vendor provided me with a list of its customers. My job was to contact those firms, many of which were high profile, and find out how open source search was being used. We contacted the companies, worked through the voicemail barbed wire and avoided e-mail land mines. We spoke with more than 40 companies, and we learned three things.

First, the senior managers whom we were able to reach did not know exactly how open source was being used. The open source work was handled by engineers working on specific projects.

Second, we discovered that the senior managers delegated the open source work to engineers already familiar with the technology. The mechanics of downloading, installing and integrating the technology were a nuts-and-bolts function. One of the people with whom we spoke made this point: "The open source software is working. I would hate to lose the two people working on this project. Some of the tweaking those guys are doing is over my head."

We formed the hypothesis that young engineers, often fresh from the university, were among the pool of "champions." At a major open source conference, I was able to speak with one of the engineers working on the use of open source at LinkedIn, the professionally oriented social network company. My conversation took place before LinkedIn's initial public offering, and the company may have shifted its technology focus. The comment I recorded from the engineer familiar with the LinkedIn search system was: "Most of my team consists of pretty young engineers. Our use of Lucene was a logical one. We were familiar with the system, and we were confident we could set it up and write the code required to meet our needs."

No handcuff

Third, several characteristics of open source software surfaced as touch points; for example, open source software did not incur license fees. But the one benefit that I found particularly significant was that changes made to the open source software did not require going back to a vendor and getting "permission" to fiddle with the code. One engineer with whom I spoke told me, "I think commercial software puts handcuffs on people like me. I just want to make the changes to get the commercial software to solve my problem. Open source does not come with license agreements, which may force me to use the commercial firm's engineers to make a trivial change."

OpenSearchServer, therefore, was of interest to me. The company offers a solution that is open source, focused on search and wrapped in a commercial endeavor. OpenSearchServer works with a company called Jaeksoft that handles the development and distribution of the product.

Raphael Perez worked on the original project with Emmanuel Keller at Infopro and is now the chief executive officer of OpenSearchServer. He described OpenSearchServer as an open source enterprise search tool that allows users to develop search-based applications using indexing and full-text algorithms. The firm delivers support and services for OpenSearchServer developers as well as customized developments.

I was skeptical about the robustness of open source software. Perez said, "Clients want to index what's on their servers and content accessible via the Internet. Access to content from different sources and systems is a key point. I can tell you that we are very active in this direction. We are partnering with the Manifold Connectors Framework project, to which we are a contributor, and we already have some customers doing beta testing on an OpenSearchServer able to access data within FileNet (IBM), SharePoint (Microsoft), Meridio (Autonomy) or OpenText Livelink."

Perez continued, "Also, OpenSearchServer has a powerful crawler module to index content from the Web (Internet and intranet), file systems and databases. This new method is really a strong trend, we believe. For many needs, search platforms bring a great complement to applications and open a completely new sky to users. This is a good example of what users want and what CIOs want to do for their companies: software that goes beyond users' expectations and helps them to focus on their job. Productivity improves with this approach."

OpenSearchServer is not watching innovation rush past the company. Perez said, "I believe the search platforms are going to reach a new level. No longer will the problem be to answer a query launched by a user. Services are coming that provide users with a lot more than a correct answer. A key direction for me is how search systems will help navigate and narrow searches within answers; another direction is to have a collaborative view on the search. The search tools will learn from a query to better answer the next query and will adapt their 'memory' to the user profile. On the other hand, big data experience and index size operations will be highly improved and will be more and more paired and synchronized with large database deployments. And last but not least, we will adapt our platform to match expectations of today's database developers who will be on search platforms tomorrow."

A big step

Consider the role of open source technology in knowledge management applications. What is difficult to ignore is the fact that Microsoft has taken an important step toward open source. The company has shifted resources from its Dryad data management project to support Apache's open source Hadoop data management platform. To put that in perspective, IBM uses Lucene in its search and retrieval systems and has tucked it inside Watson, IBM's next-generation smart search system. One wonders if Microsoft will discontinue support for the FAST Search information retrieval system, eliminating the expense of maintaining software that is delivering what amounts to commoditized functionality.

