-->


Keeping an Open Mind with Intelligent Search

With the accelerating rates of change in content and data, we just don't know the size or shape of what we'll have to search next month. Will there be another social media channel for us to monitor? A new competitor whose website we want to keep an eye on? Or a whole new set of products we have to release to keep up with competition? In any case, we need a search strategy that is adaptable, scalable, economical and "intelligent."

To be clear, we don't mean the "intelligence" that some commercial search engine companies claim their software exhibits inherently. Rather, it's that applying search in an intelligent fashion can bring real, palpable benefits.

More and more often, the critical enabler for development of intelligent search applications is open source search. Technologies such as Apache Solr/Lucene have helped transform "enterprise" search from a weak corporate analog of the consumer Internet (as in "find the lunch menu in the corporate cafeteria") into a critical competitive weapon.

Open source search is ready for prime time: many of the world's largest organizations use it to tackle search problems central to their business. Zappos, the shoe and clothing retailer, delivers its search using Solr. Solr is also being used in the publishing sector to provide entirely new ways of delivering news content; the Guardian in the UK now allows its content to be freely accessed through its Solr-powered Open Platform API, driving the emerging field of "data journalism." Solr/Lucene is also in production at companies as diverse as Yelp, LinkedIn, HP, IBM, Raytheon and Verizon, across a broad range of use cases.

The popularity of open source search, however, may have more to do with the nature of search than the particulars of open source licenses.

The Value of Open Source

Historically, search has been thought of as something one "adds" to existing data sources, bolted in between users and various repositories to help "find stuff." Where data scope was stable and user expectations were predictable—now rarely the case—this might have worked.

Ironically, promoters of this "bolt-on" approach often deride open source as no more than a "toolkit." Yet the toolkit is the point: because learning and "intelligence" go hand-in-hand, open source search lets you quickly prototype an application, expose it to users, learn how they use the results it gives them, and feed those observations back to improve search performance.

Search is one of those cases in which quality and perfection are often at odds. For example, if you need an application that shows the best-performing salespeople in your company, you can use Solr to search for positive comments from customers and show this combined with sales figures. Code this in Ruby in a few days and it's on your intranet next week.
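As a rough illustration of the salesperson example, the Python sketch below (Python rather than Ruby, purely for illustration) builds the parameters for a Solr facet query that counts matching customer comments per salesperson, then joins those counts with sales figures. The field names `comment_text` and `salesperson`, the list of "positive" terms and the shape of the sales data are all assumptions, and the response parsing assumes Solr's default flat JSON layout for field facets.

```python
def positive_comment_params(terms, facet_field="salesperson"):
    """Build Solr query parameters that count matching comments per salesperson.

    `comment_text` and `salesperson` are hypothetical field names.
    """
    return {
        "q": "comment_text:(" + " OR ".join(terms) + ")",
        "rows": 0,                 # only the facet counts are needed
        "facet": "true",
        "facet.field": facet_field,
        "facet.mincount": 1,
        "wt": "json",
    }


def facet_counts(response, facet_field="salesperson"):
    """Turn a flat facet list [value, count, value, count, ...] into a dict."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))


def rank_salespeople(praise, sales):
    """Combine praise counts with sales totals, best sellers first.

    Ties on sales are broken by praise count, then by name.
    """
    names = set(praise) | set(sales)
    rows = [(name, sales.get(name, 0.0), praise.get(name, 0)) for name in names]
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))
```

The parameters would be sent to a Solr `/select` handler, and the parsed response passed to `facet_counts`. Ranking by sales first and praise second is only one possible choice; a real application would tune that rule once users start working with the results.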

Once you begin to see how users choose results (which queries are popular, which data sources are most heavily relied upon), you can, with the same speed and facility, enhance the structure and organization of the application and the data: boost some documents over others, restructure the data, and so on. Agility is not just for the first prototype, but a continuous virtue of intelligent search. Because open source relevancy mechanisms are transparent by definition, they can be adapted more readily to new variations in data.
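One way to picture this feedback loop is a small Python sketch that turns observed clicks into boost clauses for Solr's Extended DisMax query parser. The `source`, `title` and `body` field names and the weighting rule (1 plus the source's share of clicks) are assumptions chosen for illustration, not a recommended formula.

```python
from collections import Counter
from urllib.parse import urlencode


def source_boosts(clicked_sources):
    """Derive one boost clause per data source from a list of clicked sources.

    The weight is 1 + (share of all clicks), an arbitrary illustrative rule.
    """
    counts = Counter(clicked_sources)
    total = sum(counts.values())
    return [
        f'source:"{name}"^{1 + n / total:.2f}'
        for name, n in counts.most_common()
    ]


def boosted_query(user_query, clicked_sources):
    """Build a URL-encoded edismax query string with click-derived boosts."""
    params = [
        ("q", user_query),
        ("defType", "edismax"),
        ("qf", "title^2 body"),   # hypothetical fields; title weighted higher
    ]
    params += [("bq", clause) for clause in source_boosts(clicked_sources)]
    return urlencode(params)
```

Because the boosts are ordinary query parameters, they can be recomputed from fresh click logs as often as needed without reindexing.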

Along with the ability to deliver useful results quickly, Solr/Lucene can offer tremendous scalability and support growth. The Hathi Trust digital library collection uses Solr to search 8 million volumes containing about 1.2 billion pages, with about 1.7 trillion words in a 4.5 terabyte index. Twitter runs 1 billion queries per day with Lucene. Such headroom in the infrastructure provides the reassurance most organizations need in committing to the technology.

Enterprise-grade support is another essential enabler; the emergence of commercially supported distributions (such as LucidWorks Enterprise) provides a vital measure of risk management. The suitability of Solr/Lucene for flexible cloud deployment bolsters this assurance further. The business model of open source search also helps unlock scalability, because growth is decoupled from per-document license fees.

Open Source Flexibility

What makes "big" into "intelligent" is the ability to enrich the data and tackle much more complex queries than are possible with conventional data stores. For example, by creating an index that spans different databases and content repositories, each with its own schema and data layout, search applications built with Solr/Lucene can establish a universal map of data resources that can handle complex queries. A phrase query with multiple terms can retrieve a relevant, precise set of documents from across multiple repositories. Users are not burdened with programming complex queries, each constrained by the particulars of various data stores. In effect, search is made smarter by flattening query complexity.
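A minimal Python sketch of that idea follows, assuming two hypothetical repositories (a CRM and a wiki) with different layouts. Each record is mapped into one shared document shape before indexing, so a single phrase query can cover both. The record keys, the shared field names (`id`, `source`, `title`, `body`) and the helper names are all invented for illustration.

```python
def from_crm(row):
    """Map a hypothetical CRM ticket row into the shared document shape."""
    return {
        "id": f"crm-{row['ticket_id']}",
        "source": "crm",
        "title": row["subject"],
        "body": row["notes"],
    }


def from_wiki(page):
    """Map a hypothetical wiki page into the same shared document shape."""
    return {
        "id": f"wiki-{page['slug']}",
        "source": "wiki",
        "title": page["heading"],
        "body": page["content"],
    }


def phrase_params(phrase, fields=("title", "body"), sources=None):
    """Build Solr parameters for one phrase query across the shared fields."""
    # Escape backslashes and double quotes so the phrase stays a single phrase.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    query = " OR ".join(f'{field}:"{escaped}"' for field in fields)
    params = [("q", query)]
    if sources:
        # Optional filter restricting the search to some repositories.
        params.append(("fq", "source:(" + " OR ".join(sources) + ")"))
    return params
```

The user types one phrase; the per-repository differences are absorbed once, at indexing time, rather than in every query.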

The open-minded "toolkit" mentality inherent to open source Solr/Lucene turns out to be a great fit for solving the problems of continuously changing data and user expectations, allowing for a wide variety of different techniques to be applied. Modern search engine technology is more adaptable and better performing in many cases than traditional databases.

Owing to the low cost of growing and distributing indexes, Solr/Lucene also makes it easier to enrich searches with more resilient data. For example, rather than indexing only street names, you can add latitude/longitude data as index entries and write function queries that turn that location information into search results. Yelp.com, the local advertising and ratings site, takes just such an approach.
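The location example can be sketched in Python as follows. The code stores a "lat,lon" value alongside the street name and builds parameters that use Solr's `geofilt` filter and `geodist()` function to return nearby results, nearest first. The `location` field name and the helper names are assumptions, and the haversine function is included only to show the kind of great-circle calculation such a distance function performs; it is not Solr's own code.

```python
from math import asin, cos, radians, sin, sqrt


def to_indexed_doc(doc_id, street, lat, lon):
    """Add a 'lat,lon' location entry next to the street name."""
    return {"id": doc_id, "street": street, "location": f"{lat},{lon}"}


def nearby_params(lat, lon, radius_km, field="location"):
    """Build Solr parameters for results within radius_km, nearest first."""
    return {
        "q": "*:*",
        "fq": "{!geofilt}",          # keep only documents inside the radius
        "sfield": field,             # hypothetical spatial field name
        "pt": f"{lat},{lon}",        # the searcher's position
        "d": radius_km,
        "sort": "geodist() asc",     # function query used for ordering
        "fl": "*,dist:geodist()",    # also return the computed distance
    }


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))
```

Sorting on a function of the stored coordinates means "closest first" is computed at query time, so the same index can serve any searcher's location.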

We've seen how open source search technology built on Solr/Lucene is best suited for the size and shape of the data and content your organization can take advantage of.

But the most interesting use cases are yet to come, simply because none of us can predict the future. With a flexible, adaptable and intelligent search strategy, we can confidently future-proof ourselves and our business against tomorrow's floods of new data.  
