

People Judge Relevance. Machines Calculate Evidence

If we are to make humans the determinant of relevance in an information access application, it is context that helps them judge information. But how do we get from content to context?

Relationships in the Content Inform Context in the Head

In the man/machine exchange of information access, only one party is reasoning with the flexibility and inventiveness of a human being. To make the machine play a helpful role in the dialogue, there are two basic approaches:

1. Mimic human judgment with software rules
2. Use software rules to inform human judgment

The first works well in constrained, static user scenarios. For example, a digital camera sensing low light will leave the "shutter" open longer, just as a photographer would. But there's no setting for capturing, say, a child's joy of accomplishment. And if there were such a setting, would you trust it on the day of your child's big game? Search engines attempting to calculate relevance are overreaching just as much. These algorithms are really just counting—whether it's the number of times the query term appears in the target documents, the number of links using that term which point to a given Web page, or something based on probabilistic or other statistical calculations. In each case, the raw calculations fail to capture what the indexed information is about. Worse, the mechanism is a black box. When the user gets results that appear irrelevant to him, it's hard to tell why his best-attempt query failed.
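To make the "just counting" point concrete, here is a minimal sketch of the simplest such calculation, a raw count of query-term occurrences. The function name `term_count_scores` and the sample documents are hypothetical, invented for illustration; this is not any particular engine's ranking method.

```python
from collections import Counter


def term_count_scores(query_term, documents):
    """Score each document by how often the query term appears in it."""
    term = query_term.lower()
    scores = {}
    for doc_id, text in documents.items():
        tokens = text.lower().split()
        scores[doc_id] = Counter(tokens)[term]
    return scores


# Hypothetical documents: one about a car brand, one about an animal.
docs = {
    "memo": "jaguar sales rose as jaguar dealers expanded",
    "field_guide": "the jaguar is the largest cat in the americas",
}

scores = term_count_scores("jaguar", docs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

The memo ranks first simply because the term appears twice. Nothing in the count reflects whether the user wanted the car or the cat, and nothing tells the user why the ranking came out that way.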

The second approach works well in dynamic, unpredictable scenarios. Think of a doctor in an emergency room. A patient arrives complaining of weakness, shortness of breath and numbness in his extremities. A blood test shows low oxygen saturation and a chest x-ray shows fluid in the lungs. Neither of the machines that produce this data can draw the conclusion that the patient was poisoned. But the doctor can. The tools provide the doctor with facts that better inform her judgment. But it's the doctor who puts the whole picture together. In information-access technologies, databases and BI tools are much closer to this approach. They store data in specific structures, allowing them to maintain the factual relationships inside. The key is to point the machine at the smallest indivisible unit of information and then give the user the tools to see those pieces as a whole picture.

That unit is the facet—an explicit characteristic of a thing. Facets are everywhere—metadata on documents, user-contributed tags and fields in a database record are all facets. And in a strange twist, the full-text index of a document can be thought of as just a dynamic facet of that file.
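One way to picture this is to flatten every asset into the same shape, a set of (facet, value) pairs, whatever its source. The sketch below assumes a hypothetical item layout with `fields`, `tags` and `text` keys; those names and the sample data are illustrative only.

```python
def facets_of(item):
    """Flatten an item into a set of (facet_name, value) pairs."""
    pairs = set()
    # Database fields and document metadata become facets directly.
    for name, value in item.get("fields", {}).items():
        pairs.add((name, value))
    # User-contributed tags are facets too.
    for tag in item.get("tags", []):
        pairs.add(("tag", tag))
    # The full text is treated as one more facet, one value per word.
    for word in item.get("text", "").lower().split():
        pairs.add(("text", word))
    return pairs


# Hypothetical assets: a document and a database record.
report = {
    "fields": {"author": "QA team", "year": 2006},
    "tags": ["fasteners"],
    "text": "Titanium bolts passed",
}
record = {"fields": {"material": "titanium", "year": 2006}}

shared = facets_of(report) & facets_of(record)
```

Once a document and a record are both sets of pairs, the facets they share (here only the year) fall out of a plain set intersection, which is the kind of connection across different assets the rest of the article relies on.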

But don't relational databases work this way today? Not quite. People easily conceive of dimensions that cut across a "rectangular" database. Take a customer database: it's straightforward to find out which customers bought a particular product in the past year. But it's much more difficult to find out which combinations of products were bought in the last year by customers in a particular region. That is a sensible question, but it is based on dimensions that cut across the original database structure. That's what BI is for. Unfortunately, BI can only support the questions the developers knew ahead of time would be asked. But people are much more flexible than that. They come up with unforeseen questions and unpredictable ways to express otherwise mundane requests. And they want to take into account information in different forms, from records to documents. That's what search is for. But of course, traditional search engines don't maintain the relationships from the systems they index. And we're back where we started.
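The product-combination question can be sketched in a few lines once each order is treated as a bundle of facets rather than a row in one table. The order layout, names and data below are hypothetical, and the date filter ("in the last year") is left out for brevity.

```python
from collections import Counter
from itertools import combinations


def product_pairs_by_region(orders, region):
    """Count product pairs bought together by customers in one region."""
    baskets = {}
    for customer, order_region, product in orders:
        if order_region == region:
            baskets.setdefault(customer, set()).add(product)
    pair_counts = Counter()
    for products in baskets.values():
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(products), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical orders: (customer, region, product).
orders = [
    ("acme", "northeast", "bolts"),
    ("acme", "northeast", "washers"),
    ("birch", "northeast", "bolts"),
    ("birch", "northeast", "washers"),
    ("birch", "northeast", "nuts"),
    ("cobalt", "west", "bolts"),
    ("cobalt", "west", "nuts"),
]

pairs = product_pairs_by_region(orders, "northeast")
```

Here two northeast customers bought bolts and washers together. The calculation itself is simple; the difficulty described above is that someone has to anticipate the question and build it, which is exactly what users with unforeseen questions cannot wait for.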

A New Architecture Calculates Relationships Among Relationships

Just as modern planes, medical technologies and cameras give pilots, doctors and photographers more flexibility and control, a modern information-access platform must do the same for employees, customers and partners. Such a machine must:

  • Accept messy, real-world enterprise information. The platform must be able to index enterprise data and content in all their different formats, sizes and levels of quality.
  • Preserve all the relationships in the original systems. The relationships in the databases, enterprise apps and documents are pre-existing investments the platform must exploit.
  • Calculate all the dimensional relationships. The facets in each record and document are the basis for connections across otherwise completely different assets.
  • Guide users through constantly shifting contexts. Each time the user takes a step through the app, whether a keyword query, a navigation selection or a circled region on a map, the platform must show him the results along with all of the possible next steps, but only the valid ones.

These requirements call for an architectural capability called adaptivity. Adaptivity is the dynamic calculation of relationships among relationships in the current result set, based on the possibilities in the data, the user's actions and any business rules. Information access applications built on such a platform show a user all the results to a query plus all the information about those results, creating greater context for the user's judgments. For example, a product engineer looking for titanium bolts for a lightweight chassis design searches for "titanium bolts." The results include all the titanium bolts, of course, as well as all the dimensions by which this list of bolts could be refined (length, weight, finish, thread pitch and so on), plus the quality-assurance reports that include data on titanium bolts, with the appropriate refinement dimensions. And as the engineer selects refinements or searches within the results, or both, all the information on the screen dynamically updates, driven entirely by the possibilities that exist in the data. The result is the richest possible presentation of the evidence, guiding the engineer to his goal.
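The titanium-bolt scenario can be approximated with a minimal sketch of facet refinement. This is an illustration under simple assumptions (exact-match facets, an in-memory list, no business rules), not a description of how any specific platform works. The function names `refine` and `valid_refinements` and the catalog data are hypothetical.

```python
from collections import defaultdict


def refine(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def valid_refinements(results, selections):
    """Count the facet values still present, skipping facets already chosen."""
    counts = defaultdict(lambda: defaultdict(int))
    for item in results:
        for facet, value in item.items():
            if facet not in selections:
                counts[facet][value] += 1
    return {facet: dict(values) for facet, values in counts.items()}


# Hypothetical catalog of bolts.
bolts = [
    {"material": "titanium", "length_mm": 20, "finish": "anodized"},
    {"material": "titanium", "length_mm": 20, "finish": "plain"},
    {"material": "titanium", "length_mm": 35, "finish": "plain"},
    {"material": "steel", "length_mm": 35, "finish": "zinc"},
]

# Step 1: the engineer asks for titanium bolts.
selections = {"material": "titanium"}
results = refine(bolts, selections)
options = valid_refinements(results, selections)

# Step 2: the engineer narrows to 35 mm; the options are recalculated.
selections["length_mm"] = 35
narrowed = refine(bolts, selections)
narrowed_options = valid_refinements(narrowed, selections)
```

After the first step the engineer is offered two lengths and two finishes, and a zinc finish never appears because no titanium bolt has one. After the second step only the plain finish remains. The next steps on offer are always derived from the current results, so a refinement that would lead to an empty list is never shown.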

Machines should not pretend to do things they can't. And they can't judge relevance for a particular user with a specific goal. Instead, the machine should calculate the relationships among relationships in the data and content, fueling the context for decisions, making experts of us all.


Endeca (www.endeca.com), headquartered in Cambridge, MA, is a next-generation information access company uniting the ease of search with the analytical power of business intelligence. The Endeca Information Access Platform combines patented intellectual property, breakthrough science and a deep focus on user experience to help people find, analyze and understand information in ways never before possible. Leading global organizations like ABN AMRO, Boeing and Cox Newspapers rely on Endeca to increase revenue, reduce costs and streamline operations through better information access.

Special Advertising Section
