Dumbing Down Enterprise Search To Make It More Powerful

Are we over-thinking the concepts of enterprise search—over-engineering what it’s about and how it works? Isn’t it about finding the most relevant documents or pieces of information quickly, so that you can focus on getting your job done? The promise of enterprise search and "knowledge" management has always been an increase in knowledge worker productivity—enabling your employees to focus on making business decisions and not duplicating efforts already put forth by someone else.

Sounds straightforward—and yet so many companies get wrapped around the axle over text analytics, specialized vocabularies, taxonomies and ontologies. Can we possibly frustrate our knowledge workers even more than they already are? They can search the Internet using Google all day and find out critical information about the competition, your salary, what the vice president had for lunch... and yet they can’t find the proposal response they labored over and perfected for a specific question, the contract language for a particular clause, or the presentation their former co-worker gave at a conference last year.

This is not to say text analytics, taxonomies and the like don’t have a place in the world of search. They do. But as everyone is tasked with doing more with less, speed of information access is more important than ever. Our businesses are changing rapidly as we downsize, right-size, merge and acquire, and as more and more key workers leave the enterprise, those who remain are expected to jump right in and take over. At the end of the day, it should not take a rocket scientist with an advanced degree in mathematics to find a .PPT or a legal brief.

That’s where the associative access approach to search can be quite powerful. It’s simple enough to run searches on your personal laptop and yet powerful enough to execute queries against disparate, multi-terabyte repositories of data located around the globe in a matter of seconds.

It’s About Math
The associative access, or "n-gram," approach to search takes out the human element, simply turning words into mathematical representations. When tri-grams (three-letter components) are used to index text, words are parsed into overlapping three-letter parts from which a hyper-dimensional vector is created. The word "sample," for example, would be parsed as "sam," "amp," "mpl" and so on. Associative access then breaks up the search query in the same manner, compares its three-letter snippets to those in the index using a matrix of ones and zeros, and rates the possible hits based on matches to those three-letter groups.
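
To make the mechanics concrete, here is a minimal Python sketch of how tri-gram parsing and a hashed binary vector might look. The vector size, the hashing scheme and the function names are illustrative assumptions, not any vendor's actual encoding:

import hashlib

VECTOR_BITS = 4096  # hypothetical dimensionality of the hyper-dimensional binary vector

def trigrams(text):
    """Parse text into overlapping three-letter parts: 'sample' -> sam, amp, mpl, ple."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def to_vector(text):
    """Map each tri-gram to a bit position, giving the set of 'one' bits in the vector."""
    bits = set()
    for gram in trigrams(text):
        digest = hashlib.md5(gram.encode("utf-8")).hexdigest()
        bits.add(int(digest, 16) % VECTOR_BITS)
    return bits

print(sorted(trigrams("sample")))   # ['amp', 'mpl', 'ple', 'sam']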

These solutions may also be referred to as "fuzzy search" or "fuzzy logic" because they are not a rifle shot to an exact match but instead deliver results that allow users to choose which ones are most relevant to them. In the end, associative search technologies don’t try to derive meaning, and they don’t care who, what or why you are searching... they just help you find what you are looking for as quickly as possible.

This type of search is extremely powerful at handling misspellings, uncommon terms and very large amounts of text in a search query. Associative searches are lightning fast: in the case of the tri-gram, the knowledge base you are searching against has already been indexed into a matrix, so your query string simply needs to be turned into ones and zeros and quickly matched against the existing data.
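
As a rough illustration of that matching step, the sketch below scores each indexed document by the fraction of the query's tri-grams it contains. The file names, index contents and scoring formula are simplified assumptions, not a production engine's bit-matrix implementation:

def trigrams(text):
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def score(query, doc):
    """Fraction of the query's tri-grams that also appear in the document."""
    q, d = trigrams(query), trigrams(doc)
    return len(q & d) / len(q) if q else 0.0

index = {
    "contract.docx": "termination for convenience clause",
    "proposal.pptx": "pricing summary for the annual RFP response",
}
query = "termination for conveniance"    # note the misspelling
hits = sorted(index, key=lambda name: score(query, index[name]), reverse=True)
print(hits[0])    # contract.docx still ranks first despite the typo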

Balancing Precision and Recall
Because associative access search determines both how close your request is to the indexed data and how far away it is, you have the option of refining the relevance of what gets returned. If you are looking for as near a match as possible to the query you entered, keep the relevancy high, maybe 90%. If you need to find as much information as possible about the subject matter, lower the relevancy so that similar, though not exact, matches are also returned (as in the case of e-discovery, for example).
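
The snippet below illustrates that trade-off with invented relevancy scores: a high threshold keeps only near-exact matches, while a lower one also sweeps in similar material, as you might want in e-discovery:

results = {
    "master_services_agreement.doc": 0.97,
    "termination_clause_memo.doc": 0.91,
    "old_vendor_contract.pdf": 0.74,
    "meeting_notes.txt": 0.38,
}

def filter_hits(results, threshold):
    """Return document names whose relevancy meets the user-chosen threshold, best first."""
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    return [name for name, relevancy in ranked if relevancy >= threshold]

print(filter_hits(results, 0.90))   # precise: only the two near-exact matches
print(filter_hits(results, 0.50))   # broad: also returns similar documents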

Unlike traditional keyword search, associative access search performs better when it’s presented with more text in the query string rather than less. A query string composed of a paragraph or even a full page of text will paralyze the typical search solution, and yet an associative access solution will handle it with great speed and accuracy. In fact, the longer the query string, the better the results. Because the search query is converted to n-grams, the search engine is language independent. There is no need to filter for language at any level.
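
Because matching happens on character n-grams rather than on words, no stemmer, dictionary or language detector is involved. The toy example below reuses the same tri-gram scoring on German text (invented for illustration) to show that nothing language-specific is needed:

def trigrams(text):
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def score(query, doc):
    q, d = trigrams(query), trigrams(doc)
    return len(q & d) / len(q) if q else 0.0

doc = "Kündigungsfrist und Vertragslaufzeit des Liefervertrags"
print(round(score("Kündigungsfrist Liefervertrag", doc), 2))   # high overlap, no language filter needed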

Large repositories—small indexes. Associative access search needs to index information in order to search it—whether it’s corporate file servers, email servers or websites. But the good news is that because what gets indexed is no longer whole words but a compact binary representation, the index footprint is very small—usually only about 10% of the original document repository. An indexed 100GB repository results in a knowledge base of about 10GB of searchable text.

Search-enabling other applications. Associative access search can easily add fuzzy validation and extraction capability to other software applications. For example, Brainware uses associative search in its Distiller enterprise data capture solution to recognize text on scanned documents when OCR quality is marginal, when words are misspelled, and when text on the document and text in the source system are not identical but are close enough to pass human inspection. When an "8" gets recognized as a "B," for instance, Distiller can automatically resolve the discrepancy based on the broader context. That kind of error would normally kick out to a data entry clerk, who would have to manually check a database or, worse, find the PO and/or goods-receipt documents to verify the information on the invoice.
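
As a rough sketch of how that kind of fuzzy validation can work (this is not Brainware's actual implementation; the PO numbers, threshold and similarity measure are invented for illustration), an OCR'd value can be compared against known source-system values and auto-matched when the similarity clears a threshold:

def trigrams(text):
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def similarity(ocr_value, candidate):
    """Fraction of the OCR'd value's tri-grams found in the candidate value."""
    a, b = trigrams(ocr_value), trigrams(candidate)
    return len(a & b) / len(a) if a else 0.0

known_po_numbers = ["PO-478812", "PO-991030", "PO-512246"]   # hypothetical source-system data
ocr_value = "PO-47B812"                                      # the "8" misread as a "B"

best = max(known_po_numbers, key=lambda po: similarity(ocr_value, po))
if similarity(ocr_value, best) >= 0.5:
    print("auto-matched to", best)    # no data entry clerk needed
else:
    print("route to manual review")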

KISS
Why bother with dictionaries, keywords and Boolean operators? Those things just slow you down. It’s time to deliver on the promise of enterprise search and to increase knowledge worker productivity—enabling you to focus on making business decisions and not learning fancy search rules. It’s that simple.

Case Study: Law Firm is Smarter, More Accurate
Fulbright & Jaworski is a full-service law firm with more than 800 attorneys in eight offices throughout the United States, with additional offices in London, Munich and Hong Kong. While information is critical to most businesses, it is particularly important to a law firm. Fulbright’s large, distributed practice has amassed more than 7.5 million documents in the firm’s document management system. Fulbright’s attorneys also make extensive use of research, litigation support and knowledge management tools. All told, the firm is searching multiple terabytes of data.


 
