
Will SEO manage information access?


The phrase search engine optimization (SEO) refers to practices that make information findable. In this era of big data, SEO, not traditional search, may be the key to delivering information to employees. Relevance measures like precision and recall are irrelevant in the Web search world. Popularity, advertising purchases and search engine optimization may be the new yardsticks that stand in for precision and recall. Key "rental car" into the Google search box, and the results page displays about a dozen ads, with the top listing pointing to Enterprise Rent-A-Car or a service like Kayak.com.
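For readers who have not met the two measures, precision is the share of returned results that are relevant, and recall is the share of all relevant documents that were returned. The short Python sketch below computes both for a single query. The document IDs and the function name are made up for illustration; they do not come from any particular search product.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs the search system returned.
    relevant:  IDs a human judged to answer the query.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set

    # Guard against empty sets so the function never divides by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical judgments for one query.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Neither number says anything about who paid for placement or which page is most popular, which is the gap between the classic yardsticks and the Web search world.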

The notion of "search" suggests that a person wants information. After 50 years of effort, search still requires a user to know what words or phrases to enter into a search box. If the user crafts the "right" query, the search system delivers the needed information. If the information is not in the search system's index, the user can continue a trial-and-error process until the information strongbox is unlocked.

SEO is different. SEO means putting certain information in front of a user one way or another. I learned about an early interest in search engine optimization for "objective" search more than a decade ago. The details are fuzzy; the story I pieced together may be apocryphal, yet it is suggestive.

Rising to the top of the results list

In the early days of a new administration in 2001, the Fast Search & Transfer SA system fueled the U.S. government's public search service. At that time, a query for "White House" would return results that displayed content about the Office of the President at the top of a results list. Other occurrences of "White House" from government websites were deeper in the results list. The vice president's website was in the "White House" results list, but it was not on the first page of hits. I recall learning that the website for the new vice president had to be boosted. The vice president wanted his office's Web page to appear on certain search results pages at or near the top of the list.

The version of Fast Search in use did not include an administrative control to "force" a particular Web page to appear at a certain place in the results list. The Fast Search system offered a bewildering array of controls to tweak the relevance ranking system. Fiddling with those controls could generate some unexpected relevance issues. To respond to the request to boost the vice president's Web page, we hacked a workaround. When a user entered the query "White House," results from the vice president's Web page were displayed on the first page of the results list.
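The details of that workaround are not recorded here, so the Python sketch below is not what my team built. It only illustrates the general idea of forcing a page into a results list after the engine has done its ranking. The pin table, the URLs and the function name are all hypothetical, and nothing in it reflects the Fast Search product.

```python
# Hypothetical pin table: normalized query -> list of (url, zero-based position).
PINNED = {
    "white house": [("https://example.gov/vice-president", 1)],
}


def apply_pins(query, ranked_urls, pins=PINNED):
    """Return the ranked list with any pinned URLs forced into position."""
    # Normalize case and whitespace so "White  House" matches "white house".
    key = " ".join(query.lower().split())
    results = list(ranked_urls)  # copy; leave the engine's list untouched
    for url, position in pins.get(key, []):
        if url in results:
            results.remove(url)  # avoid listing the page twice
        # list.insert appends when position is past the end of the list.
        results.insert(position, url)
    return results


# Made-up engine output for the query, best match first.
engine_results = [
    "https://example.gov/president",
    "https://example.gov/history",
    "https://example.gov/tours",
    "https://example.gov/vice-president",
]

boosted = apply_pins("White  House", engine_results)
print(boosted)
```

An override like this sits outside the relevance model, which is why it avoids the side effects of fiddling with ranking controls. It also means the results list is no longer purely "objective."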

Making content appear high in a results list was an urgent priority. The difference between my team's clever workaround and the problem of content not appearing in a results list is significant. The vice president of the United States had content. The fact that the content was not appearing in a results list was a problem resolved with methods widely used by script kiddies. The problem of a customer support person not finding an answer to a question strikes at the core of information retrieval, findability and knowledge management.

Long way to go

If you are a member of LinkedIn, you may have followed a thread labeled "How to manage queries having no relevant answers but still matching some terms." The series of posts contained some useful marketing and semi-technical points. The appeal of the thread was that it made clear how much work remains to be done to provide employees with the information needed to complete work.

On the surface, the problem is one many organizations have: A person working in customer support enters terms into a search box. The results list does not contain information that answers the user's question. The LinkedIn thread wandered from Boolean logic to overt sales pitches for a better system. A technical lead at Sematext (sematext.com) provided a link to the person struggling with the problem of displaying "hits" that do not answer the user query. Other members of the group tossed in buzzwords like "facets," "floors," "predictive techniques" and "quorum-level ranking." After several thousand words from dozens of search experts, a programmer at Thunderstone (thunderstone.com) wrote: "Just seems that the thread is wandering far afield from the original problem."

My thought was that the "original problem" is a bit like "original sin." There is no fix. If the needed information is not in the index, no search system with which I am familiar can provide an answer.
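To make the thread's problem concrete, here is a minimal Python sketch of one reading of a "quorum" rule: keep a document only when it contains at least a set fraction of the query terms. This is my own simplified interpretation, not a method from the thread or from any vendor, and the knowledge base entries, the threshold and the function name are invented.

```python
def quorum_filter(query, documents, quorum=0.75):
    """Keep documents that contain at least `quorum` of the query terms.

    documents: mapping of document ID -> text.
    Returns a list of (doc_id, matched_fraction), best match first.
    """
    terms = set(query.lower().split())
    if not terms:
        return []

    kept = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        matched = len(terms & words) / len(terms)
        if matched >= quorum:
            kept.append((doc_id, matched))

    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


# Hypothetical customer support knowledge base.
DOCS = {
    "kb1": "reset the router password from the admin console",
    "kb2": "router warranty terms and shipping",
    "kb3": "password policy for new employees",
}

answered = quorum_filter("reset router password", DOCS)
unanswered = quorum_filter("reset printer password", DOCS)
print(answered)
print(unanswered)
```

The second query matches scattered terms in two documents, but none of them clears the threshold, so the list comes back empty. A rule like this can hide misleading hits. It cannot conjure an answer that was never indexed.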

Searching for the answer

The disconnect for those trying to locate information within the context of work is that knowledge is difficult to access. Information is often hard to find. Verifying that data actually answers a user's question is a task that has changed little despite the proliferation of technology.

If the "answer" is not in the system's index, how can the system display what the user requires? Should a search engine administrator or the senior executive responsible for knowledge management "just put" the information in the system so that for a person in customer support, the system will display the desired answer? On one hand, placing information is what marketing and other business professionals do. On the other hand, if the search system does not find the information, should the user kick into a lower gear and find the answer? Another option is to build a system so that a customer's query can pass to the larger world of people outside the company. In effect, the search system crowd-sources the answer.

In each of those scenarios, issues about quality, response time, the "rightness" of the "answer" and responsibility for the information bubble to the surface. Search and knowledge management vendors talk around those issues. Many procurement teams and engineers lack the background, authority and experience to figure out how "findability" is implemented in an organization. Raising search questions to senior management often triggers glassy eyes and a 1,000-yard stare.

Longtime problem

Most modern systems cannot reliably "answer" questions the way a human does—at least not yet. Software struggles with hunches, intuition and non-linear reasoning. "Smart" systems perform when the information is in an index or a database. IBM Watson required a database of facts to defeat humans on the Jeopardy game show. HP's Autonomy system must process information before its neurolinguistic engine can "understand" the meaning of a document and, hence, answer a user's query. Even Google's predictive search technology requires data about user behavior, content known to the Google system and inputs that pull cues from geographic coordinates and factors known only to Google. Brute force works for finding pizza, but it may crush the needed nuance when looking for business insight about a competitor.
