October 29, 2013
By Stephen E. Arnold Managing Director - ArnoldIT.COM
News Analysis

Where Is ConceptNet, Watson?

The focus is on making changes in how organizations "think, act and operate." There is also the introduction of a phrase that I first saw used in relation to determining what medical procedures are the ones that work. The phrase is "evidence-based responses." The value of the content processing system is to deliver "better outcomes."

More each day

The reality of search and content processing today is that the volume of information in digital form is large. The other fact is that more digital data is produced each day than the day before. Automated data collection systems work on weekends and holidays. There is no respite from the exponential growth of digital information.

Semantic systems, advanced analytics, Google-style self-referential systems and sophisticated optimization techniques—all of those are struggling to help users perform some basic tasks.

First, where is the specific fact I need and how do I obtain that particular item? That is the classic needle in a haystack problem. Many years ago, Matt Koll, founder of Personal Library Software (PLS), talked about finding a single needle in one haystack and finding a needle in several haystacks. Today users are struggling to find facets of needles in millions of farmers' haystacks.

Second, when there is too much data, no human or team of humans can look at the information. When Pointcast and Backweb introduced desktop feeds to "push" information to users, the volume of data was becoming too much for a person working in a company with slow network connections and virtually no online knowhow. Today apps and newsreaders provide one-click access to important information. But those innovations are, in my opinion, more graphically pleasing versions of a 1970s-style SDI (selective dissemination of information) service. Filters are useful. The problem for me and for many other professionals is that important information is left on the cutting room floor. Becoming too dependent on one filtering service means that I see only the information the filtering service processes. The fix is to look at more and more information. In effect, filtering (smart or stupid) does not address the problem.

Third, a person who needs information about a complicated problem does not know what he or she needs to know, and will not know without old-school research. Each day I receive e-mails about new systems that use advanced technology to extract meaning from big data. When I meet with those firms, I can see brilliant visual presentations and learn about solutions that deliver actionable business information. The problem is that for certain types of business questions, I am not sure what I need. When a system outputs a report based on my preferences or behavior, I have no idea what data, assumptions and methods were used to create the outputs.

Sidestepping the problem

To fill in that gap, I have to learn quickly and practice the techniques explained by Ricki Linksman in the 2001 book How to Learn Anything Quickly. Spoiler: Among the most important methods are asking people, traditional library research and for-fee databases. The "fast" comes from setting aside time and focusing. Without effort, the results are likely to be little better than taking the first Google hit and running with it.

What prevents search, content processing and analytics firms from making a major breakthrough in information retrieval? To answer that question, I did some old-fashioned research, which led me to what is called "Big O." That refers to a notation for describing how the time or memory a mathematical process requires grows as the amount of input grows.

Algorithms can be too complex to compute. Even with a supercomputer at one's disposal, certain methods cannot be used. The article "Know Thy Complexities" collects examples of mathematical procedures of considerable utility. The problem is that many of those methods choke today's computers. Instead of having unlimited computational capability, we are living in a world with too little horsepower to pull the load up the hill.
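To make the point concrete, here is a minimal sketch of how quickly the work grows for three common complexity classes. The machine speed of one billion operations per second and the names `OPS_PER_SECOND`, `estimated_seconds` and `growth_table` are assumptions chosen for illustration; they do not come from "Know Thy Complexities" or from any vendor's system.

```python
import math

# Assumed machine speed, for illustration only: one billion simple operations per second.
OPS_PER_SECOND = 1e9


def estimated_seconds(operations: float) -> float:
    """Convert an operation count into seconds at the assumed speed."""
    return operations / OPS_PER_SECOND


def growth_table(sizes):
    """Return estimated run times for three complexity classes at each input size."""
    rows = []
    for n in sizes:
        rows.append({
            "n": n,
            "n log n": estimated_seconds(n * math.log2(n)),  # e.g. efficient sorting
            "n^2": estimated_seconds(n ** 2),                # e.g. comparing every pair
            "2^n": estimated_seconds(2.0 ** n),              # e.g. testing every subset
        })
    return rows


for row in growth_table([10, 50, 100]):
    print(row)
```

Under these assumptions, the quadratic method still finishes in a fraction of a second for 100 items, while the exponential method would need far longer than the age of the universe. That gap is why designers reach for the methods that can actually be computed.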

I was surprised to learn that the most sophisticated systems like those of MIT and IBM operate so that the Big O problem is sidestepped. There are, I learned, several significant consequences of computational boundaries. The first is that modern systems for search, content processing and analytics are increasingly similar. The reason is that the designers of these systems have to avoid methods that are too complex to compute on available resources. The algorithms that can be calculated or processed are the ones developers use, so systems are similar because they use similar approaches.

Good enough?

The shift to predictive analytics, facets, search suggestions, personalization and sampling-based relationship analysis provides shortcuts to the user. The user can click. Crafting a query and nuts-and-bolts search preparation are no longer needed, it seems. The eye candy of visualizations makes it easy to spot big trends and outliers. Armed with basic information, the user presumably can then do additional research to verify the outputs of a system that displays pictures. Text is available with a click, but once the user "drills down" into the underlying information, traditional research is needed.

In today's fast-moving world, the information displayed by the system is "good enough." At a conference this year in Germany, I was stunned to hear search experts emphasize that users today are more than satisfied with "good enough" results. My point was that a user may be happy, but what about those trapped in a bad decision based on stale, flawed or incomplete information?

The use of predictive methods to "trim" data is becoming more prevalent. The statistical procedures used widely today are indeed important methods. Modern systems are set up so that the user and even the system administrator are kept well away from the underlying plumbing. The result is that the data and their exceptions, along with the threshold values of specific algorithms, are locked away. The users of those systems assume that the developer has made the correct decisions. Predictive systems work well in certain situations. In others, the methods are not particularly useful. How many predictive analytics experts quit their jobs and make money picking horses at the race track? Not too many.

The fact that MIT's system is as smart as a four-year-old is a somewhat negative comment about the state of search and retrieval in the enterprise. Semantic methods are moving into the mainstream. But with a four-year-old's intellect, is the pace of search innovation slow, maybe tortoise speed?

If we define innovation as the process of turning ideas into marketable products, search and content processing vendors are struggling. The marketers are ascendant. Finding and analyzing information still requires hard work.
