

Searching the Long Tail

The "long tail" theory says there's latent demand for each piece of information you create—not just the most popular ones. And with the reach of corporate intranets, portals and the Internet, it's now possible to satisfy the long tail of demand out to the very end. But it takes more than simple search to help a person find the content that matters to him and no one else. So how can anybody find anything in there?

Mass Market Goes Niche
The long tail was first described by Chris Anderson, editor-in-chief of Wired.¹ The simplest way to illustrate the long tail is with a retail example, but as we'll discuss below, the long tail equally well describes search in intranets, portals, knowledge bases and nearly any other place that we aggregate information. Anderson points out that the economics of the Internet breaks offline barriers and turns scarcity-based business decisions upside down. Online, it costs practically nothing to add another product to inventory. It consumes no additional shelf space because online shelf space is infinite. It doesn't have to be held in on-hand inventory because it can be sourced from a supplier at the time of the order. Similarly, serving one more customer costs practically nothing. Even if taking on 10,000 additional customers incurs additional costs in hardware and bandwidth, this pales in comparison to the outlay for construction, staffing and supply infrastructure for new buildings. And the potential benefits are enormous. Anderson found that representative online retailers in books, film and music derive 20% to 25% of their revenue from sales in the long tail.²

Not Just Long Re-tail
The long tail is simply a power law, the distribution behind the famous 80/20 rule. Power laws crop up all over. Take words in the English language: a few get used all the time (e.g., the, of, and), some occasionally (e.g., today, mother, probably) and an enormous number almost never (e.g., conquistador). This is a power law. Page visits on a Web site also form a power law.³ In fact, every search or navigation log we've looked at shows a power law, whether for internet, intranet or extranet sites or even analytics apps, regardless of industry. (See figure.) The common elements in all these cases are freedom of choice combined with a large number of options. Inevitably, inequality arises as more people, influenced by the choices already made by others, converge on a few selections. But the convergence is never complete. Each individual has goals that most others don't. A few customers want to read James Joyce's one play. A handful of employees need the dental plan exemption form. And only one manager needs to know the performance of the last direct-mail campaign in Nevada. Historically, companies couldn't support all these one-off goals. They had to cut the long tail short. Now, they don't have to.
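To check a search or navigation log for this shape, a minimal Python sketch along the following lines may help. It ranks items by how often they were requested and measures how much of the traffic falls outside the most popular ones. The log here is hypothetical and idealized (item k is requested roughly 1000/k times), and the function names and the cutoff of 100 "head" items are our own assumptions for illustration, not figures from any real site.

```python
from collections import Counter


def rank_frequency(events):
    """Return (item, count) pairs sorted from most to least requested."""
    return Counter(events).most_common()


def tail_share(events, head_size):
    """Fraction of all requests that fall outside the head_size most popular items."""
    ranked = rank_frequency(events)
    total = sum(count for _, count in ranked)
    tail = sum(count for _, count in ranked[head_size:])
    return tail / total


# Hypothetical, idealized log: item k appears 1000 // k times (a Zipf-like power law).
log = [f"item-{k}" for k in range(1, 1001) for _ in range(1000 // k)]

# The head of the ranking: a few items requested very often.
print(rank_frequency(log)[:3])

# Share of requests that land beyond the 100 most popular items.
print(round(tail_share(log, head_size=100), 2))
```

With a real log, replace the synthetic list with one entry per query or page view. If the ranked counts fall off steeply and then flatten into a long run of rarely requested items, the log has the power-law shape described above.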

More Needles, Bigger Haystack
But, of course, there's a price. Capturing the long-tail opportunity means offering an abundance of information, which becomes overwhelming when a person can't find what he's looking for. Searching—the human activity of looking for something, whether with keyword search or other means—unfolds in a scenario of small but difficult choices. The sheer number of scenarios in the long tail makes optimizing each one impossible. To provide the best experience for each user, focus on the four principles that drive all search scenarios. When people search, they:

Predict: With each step, the user tries to predict the likely effects of his action. But the inherent uncertainty of seeking something not yet found makes these predictions difficult. For example, to find the dental plan exemption form, should an employee click "Dental Plan," which could have all the plan information, or "Forms," which could have all forms, regardless of type? Or should he put in a keyword search? And, if so, what if he calls it an "exemption form" and they call it a "benefits declination" form? If he makes the wrong prediction, he's wasting time.
Adapt: As people learn something more about the thing they're looking for, the functionality of the system they're using or the content in it, they modify their approach.⁴ If a user were at an online job-posting site looking for a direct marketing position near her home town, she might start with a search for "direct marketing." When she looks at the thousands of results, she sees an option to refine the list to just her town. She clicks it. These are two completely different methods of searching. But people see them as perfectly complementary.
Iterate: The notion that people won't click more than three times is one of the great myths of Web site design. Research from User Interface Engineering dispels this myth by showing that users will take as many as 25 steps to reach their goals.⁵ The quantity of clicks doesn't matter as much as the quality. As long as people make progress with each step, they feel that their efforts are well spent.⁶ But if they don't see consistent progress, they're likely to give up.
Revise: Serendipity will take its course. People are likely to learn something completely new along the way that causes them to revise their goals. When this happens, they form new goals and repeat the searching process.

Better Tools Unlock Searching Skill
Human searching behavior is not a problem; it's an untapped resource. But search tools to date often restrict these talents instead of enhancing them. To manage choice in the long tail, people need a solution where:
1. Discrete attributes aid prediction. Compared with keyword search, navigation is a sure bet. But, as we've seen, ambiguous categories still pose a problem. The answer is navigation based on attributes, or facets. Refining a list of items by their attributes allows users to accurately predict that the items on the next page will share the selected characteristics. Two things happen as a result: (a) a user creates an ad hoc category that's perfect for him, which couldn't have been predicted by a taxonomist; and (b) the user sees his predictions repeatedly come true, making him feel that he understands how to use the tools at his disposal. This sense of control is the greatest indicator of likely success in a searching scenario.⁷
2. Integrated search and navigation support adaptation. Neither navigation nor search alone is enough. One or the other may be the better place to start, depending on the user and her goal. Once she takes that first step, her second could require searching within the results or refining them by an attribute, as could the third step, and so on. Searching at a news site for "interest rates" and refining the resulting articles by date is one thing. Refining them by an attribute the user never would have thought to ask for, like "effect on bond markets," is another thing entirely. Exposing the attributes of the search results next to the listings themselves lets users modify their search strategies in ways they couldn't have planned.
3. Consistent context encourages iteration. Moving back and forth between search and navigation is valuable only if each step builds on the last and each available option is valid given the choices already made. For example, that job-posting site should only invite our job-seeker to narrow the list to her home town if direct marketing positions actually exist there. In short, users should be able to refine all their searches by navigating and all their navigation by searching, without ever hitting a dead end.
4. Spotlighting drives users further down the long tail. Better search tools provide the support people need to find what they're looking for. But even then they can't find what they don't know they want. Anderson points out that recommendations are a necessity in the long tail world. Site owners should be able to throw a spotlight on items they want to feature, using the context of the user's past choices to fine-tune the promotion.
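The four requirements above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular product: the job listings, the facet names ("town", "type") and the functions search, refinements and spotlight are all hypothetical. The point is that keyword search and attribute refinement run against the same result set, and that only refinements with at least one matching item are ever offered.

```python
from collections import Counter

# Hypothetical catalog: each item has free text plus discrete attributes (facets).
JOBS = [
    {"title": "Direct marketing manager", "town": "Reno", "type": "Full-time"},
    {"title": "Direct marketing analyst", "town": "Boston", "type": "Contract"},
    {"title": "Direct marketing intern", "town": "Boston", "type": "Part-time"},
    {"title": "Software engineer", "town": "Reno", "type": "Full-time"},
    {"title": "Brand marketing lead", "town": "Austin", "type": "Full-time"},
]
FACETS = ("town", "type")


def search(items, keywords="", selected=None):
    """Keep items that contain every keyword and match every selected facet value."""
    words = keywords.lower().split()
    selected = selected or {}
    return [
        item for item in items
        if all(word in item["title"].lower() for word in words)
        and all(item[facet] == value for facet, value in selected.items())
    ]


def refinements(results, selected=None):
    """Count the facet values present in the current results.

    Only values that occur in the results are returned, so every
    refinement offered to the user leads to at least one item.
    """
    selected = selected or {}
    return {
        facet: Counter(item[facet] for item in results)
        for facet in FACETS
        if facet not in selected
    }


def spotlight(results, featured_titles):
    """Promote featured items only when they fit the user's current context."""
    return [item for item in results if item["title"] in featured_titles]


# Step 1: start with a keyword search.
results = search(JOBS, "direct marketing")

# Step 2: offer refinements drawn from those results.
options = refinements(results)
print(dict(options["town"]))  # {'Reno': 1, 'Boston': 2}

# Step 3: refine by an attribute without losing the keyword context.
narrowed = search(JOBS, "direct marketing", {"town": "Boston"})
print([job["title"] for job in narrowed])
```

Austin never appears among the town options because no direct marketing position exists there, which is the dead end the job-posting example warns against. A production system would compute these counts from an index rather than by scanning a list, but the contract is the same.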

The counterintuitive reality of the long tail is that its potential is based on aggregating supply and demand, but its realization is based on helping individuals find just the right thing, one scenario at a time.


Endeca (www.endeca.com), headquartered in Cambridge, MA, was founded in 1999 to transform the online search and navigation experience so that people can easily access the full breadth and depth of large data sets. Today, Endeca solutions for enterprise search and commerce are already helping businesses across a variety of sectors including financial services, manufacturing, retail, information providers and business-to-business with applications that address the information overload problems associated with enterprise information access and retrieval and content and catalog management.

1 "The Long Tail," Wired, Issue 12/10, 2004.
2 From the revised version of Anderson's article, which can be found at changethis.com (http://www.changethis.com/10.LongTail).
3 Jakob Nielsen demonstrated this in 1997 (http://www.useit.com/alertbox/9704b.HTML).
4 N. Belkin, "Interaction with texts: Information retrieval as information-seeking behavior," 1993.
5 J. Porter, "Testing the Three-Click Rule," 2003.
6 P. Pirolli, S. Card, "Information Foraging Theory," 1999.
7 N. Belkin, "An overview of results from Rutgers' investigations of interactive information retrieval," 1998.
