Beyond text analytics

This article appears in the April 2009 issue [Volume 18, Issue 4]

Content analytics refers to an emerging set of products and capabilities that augment search to provide a variety of derivative results, according to Rita Knox, a research VP at Gartner who focuses on information access technologies, particularly content analytics. It supports such goals as identifying high-priority clients, interpreting competitors’ activities, understanding consumer responses to a new product, detecting fraud and more.

Like text analytics (of which it is a superset), content analytics can convert much unstructured information into structured text, which can then be searched and mined as needed to derive meaning from what appear to be unrelated entities and concepts. Unlike text analytics, content analytics can also handle audio, video and graphics (non-textual analytics) for similar purposes. It can also take the place of manual searches through large infobases. That reduces the time needed, for example, to determine what customers think about a new product by searching blogs where consumers post opinions on the subject.

Knox says, "Content analytics is an application that processes or mines content to give answers to specific questions. It can consist of a single function such as text analysis, but more often it will be a series of functions that pass results in a sequence from one operation to the next." (See the chart on page 13 of KMWorld, Vol 18 #4, or download the chart at http://www.kmworld.com/downloads/53100/ContentAnalytics.pdf.) For instance, a long audio recording of a focus group discussing a product would be easier to analyze by first converting the speech to text and then summarizing the resulting text.
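The chained approach Knox describes can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's design: `run_pipeline`, `transcribe` and `summarize` are hypothetical names, and both stages are stubs standing in for real speech-to-text and summarization engines.

```python
from typing import Any, Callable, Iterable

def run_pipeline(data: Any, stages: Iterable[Callable[[Any], Any]]) -> Any:
    """Feed data through each stage in order; each stage receives the previous result."""
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical stub: a real system would call a speech recognizer here.
def transcribe(recording: dict) -> str:
    # Assumption: the recording already carries its transcript.
    return recording["transcript"]

# Hypothetical stub: a real summarizer would rank sentences; this keeps the first one.
def summarize(text: str) -> str:
    return text.split(". ")[0].rstrip(".") + "."

focus_group = {"transcript": "The battery lasts all day. The screen scratches easily. Price is fair."}
print(run_pipeline(focus_group, [transcribe, summarize]))  # The battery lasts all day.
```

Because every stage shares one simple contract (take the previous result, return a new one), stages can be added, swapped or reordered without changing the driver.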

Sue Feldman, research VP, search and digital marketplace technologies, at IDC, refers to content analytics as information intelligence and emphasizes another aspect of the discipline’s efficacy. She describes content analytics as "a set of language analysis processes that mimic our ability to find out the what, who, when and where of a document."

"Determining ‘aboutness’ is what these analytics technologies are designed to do," Feldman says. "They mirror the human process of finding underlying meaning by using clues in the grammar, syntax and superficial meaning. Once we have determined this underlying meaning, we can pull together similar ideas, even if they are expressed differently. For instance, we know that ‘hypertension’ and ‘high blood pressure’ mean the same thing."

"Based on our understanding of how language works," she continues, "we can distill the human processes of understanding meaning into rules that are understandable by a computer."
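Feldman's point about turning language understanding into computer-readable rules can be illustrated, in a greatly simplified form, with a synonym rule table. This sketch is an assumption for illustration only: the systems she describes infer equivalence from grammar and context rather than from a hand-built list, and `SYNONYMS` and `normalize` are hypothetical names.

```python
import re

# Hypothetical rule table mapping variant phrasings to one canonical concept.
SYNONYMS = {
    "high blood pressure": "hypertension",
    "elevated blood pressure": "hypertension",
    "heart attack": "myocardial infarction",
}

def normalize(text: str) -> str:
    """Lowercase the text and replace known variant phrases with their canonical term."""
    result = text.lower()
    # Apply longer phrases first so a short phrase never rewrites part of a longer one.
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        result = re.sub(r"\b" + re.escape(phrase) + r"\b", SYNONYMS[phrase], result)
    return result

print(normalize("Patient has high blood pressure") == normalize("Patient has hypertension"))  # True
```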

Because the discipline is abstract and complex, it might be best approached through a few examples of its real-world application.

Aspects of content analytics

Affinity is an aspect of content analytics that can be used to predict what a customer will be inclined to purchase, based on what he or she has bought in the past and on what other customers with similar buying behavior go on to buy. A store can then send notices to that buying group whenever those types of products go on sale.
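One simple way to compute this kind of affinity is to score products by how often they were bought by customers whose purchase histories overlap the target customer's. This is a basic user-overlap form of collaborative filtering, offered as a sketch rather than as how any particular product works; the customers, products and `recommend` function are hypothetical.

```python
from __future__ import annotations

from collections import Counter

def recommend(target: str, purchases: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Suggest items bought by customers whose purchases overlap the target's."""
    mine = purchases[target]
    scores: Counter[str] = Counter()
    for customer, items in purchases.items():
        if customer == target:
            continue
        overlap = len(mine & items)  # number of shared purchases = similarity
        if overlap == 0:
            continue
        for item in items - mine:  # only items the target does not already own
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]

purchases = {
    "ana": {"tent", "stove", "lantern"},
    "ben": {"tent", "stove", "sleeping bag"},
    "cara": {"tent", "kayak"},
    "dev": {"novel", "bookmark"},
}
print(recommend("ana", purchases))  # ['sleeping bag', 'kayak']
```

Weighting each suggestion by the size of the overlap means customers who resemble the target more closely have more influence on what the store promotes to that group.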

When used to help assess reputation, content analytics might be deployed by a corporation that just went through a major product recall to plumb relevant blogs and customer service call records and determine how badly the recall may have damaged its brand. If the feedback turns out to be overwhelmingly critical, the company can respond with appropriate advertising to help offset the recall’s damage.

For sentiment analysis, content analytics might focus on blogs about the automobile industry to gauge how the voting public, as well as associations, analysts, newspapers and other organizations associated with the industry, feel about congressional funding for GM and Chrysler. More specifically, members of Congress might use such information to determine how their constituents feel about the action and vote on the funding accordingly.
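A crude form of sentiment analysis is lexicon scoring: count the positive and negative words in each post. The word lists, posts and `sentiment` function below are hypothetical, and this is only a sketch; production tools typically use much larger, domain-tuned vocabularies or trained models that also take context such as negation into account.

```python
import re

# Tiny hypothetical lexicons; real ones contain thousands of weighted terms.
POSITIVE = {"good", "great", "support", "saved", "smart"}
NEGATIVE = {"bad", "waste", "oppose", "terrible", "wrong"}

def sentiment(post: str) -> int:
    """Return (#positive words - #negative words); above 0 leans positive, below 0 negative."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Funding GM was smart and saved jobs.",
    "Another handout is a terrible waste of money.",
    "I oppose this. Wrong move.",
]
scores = [sentiment(p) for p in posts]
print(scores)  # [2, -2, -2]

# Share of posts leaning negative: a rough gauge of overall opinion.
negative_share = sum(s < 0 for s in scores) / len(scores)
```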

Speech analytics may be the most technologically arcane of the components of content analytics, but its usefulness should not be underestimated. Customer call centers use it to convert phone conversations into text in order to gauge a product’s success. The resulting text can be loaded into a data warehouse, where executives can decide how to respond to the feedback. If, for example, the word "screen" appeared many times in call transcripts three months after a computer company released a laptop, the laptop’s screen would likely be the problem, and a closer look at the text could confirm it. The company could then take steps to fix the screen.
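Once calls have been transcribed, spotting a term like "screen" that suddenly becomes common comes down to comparing word counts across periods. The sketch assumes the transcripts already exist as plain strings; `term_counts`, `rising_terms`, the sample transcripts and the `min_increase` threshold are hypothetical, and a real deployment would also filter out common words and adjust for changes in call volume.

```python
from __future__ import annotations

import re
from collections import Counter

def term_counts(transcripts: list[str]) -> Counter:
    """Count how often each word appears across a batch of call transcripts."""
    counts: Counter = Counter()
    for text in transcripts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

def rising_terms(before: list[str], after: list[str], min_increase: int = 2) -> list[str]:
    """Words whose count grew by at least min_increase between two periods."""
    old, new = term_counts(before), term_counts(after)
    return sorted(w for w in new if new[w] - old[w] >= min_increase)

launch_month = ["battery is fine", "love the keyboard"]
three_months_later = [
    "the screen flickers",
    "screen went black",
    "my screen has dead pixels",
]
print(rising_terms(launch_month, three_months_later))  # ['screen']
```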

Summarization has obvious value in criminal investigations because it can reduce long reports to a couple of paragraphs. Suppose a recent string of car thefts in one area of Chicago shares a modus operandi (MO): smashing a window and hot-wiring the car. Investigators can look through summarized reports spanning the last five years to see whether any former thief with a similar MO is now out of jail and possibly reverting to his or her past criminal behavior.
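The article does not say how summarization works. One common, simple approach is extractive summarization, which keeps the sentences containing the report's most frequent content words. The sketch below uses that technique as an illustration; the `summarize` function, the stop-word list and the sample report are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical stop-word list: very common words that carry little meaning.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "is", "on", "at", "it"}

def content_words(text: str) -> list[str]:
    """Lowercase words in the text, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def summarize(text: str, max_sentences: int = 2) -> str:
    """Keep the sentences whose words occur most often across the whole report."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))
    scores = [sum(freq[w] for w in content_words(s)) for s in sentences]
    # Take the highest-scoring sentences, then restore their original order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:max_sentences]
    return " ".join(sentences[i] for i in sorted(top))

report = (
    "Officers responded to a car theft on Oak Street. "
    "The thief smashed the driver window and hot-wired the car. "
    "The weather was cold. "
    "A witness saw the thief hot-wire a second car nearby."
)
print(summarize(report))
```

Here the sentences describing the MO score highest because they repeat the report's key words ("car", "thief", "hot"), while the incidental remark about the weather is dropped.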

With text analytics, unstructured text can be converted to structured text so investigators can analyze an entire corpus of data to detect fraud. For instance, it might flag two similar car insurance claims submitted two months apart, so investigators can check whether the same person filed both in an attempt to be reimbursed twice for one accident.
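A duplicate-claim check like this one can be approximated by comparing claims pairwise on claimant, filing date and wording. This sketch measures wording overlap with word-set (Jaccard) similarity; the `Claim` record, the 0.6 similarity threshold and the 90-day window are assumptions chosen for illustration, not values from any real fraud system.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from itertools import combinations

@dataclass
class Claim:
    claim_id: str
    claimant: str
    filed: date
    description: str

def jaccard(a: str, b: str) -> float:
    """Share of distinct words two descriptions have in common (1.0 = same word set)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def suspicious_pairs(claims: list[Claim], threshold: float = 0.6, max_days: int = 90) -> list[tuple[str, str]]:
    """Flag claim pairs from the same claimant, filed close together, with near-identical wording."""
    flagged = []
    for c1, c2 in combinations(claims, 2):
        same_claimant = c1.claimant == c2.claimant
        close_in_time = abs((c1.filed - c2.filed).days) <= max_days
        if same_claimant and close_in_time and jaccard(c1.description, c2.description) >= threshold:
            flagged.append((c1.claim_id, c2.claim_id))
    return flagged

claims = [
    Claim("C1", "J. Smith", date(2009, 1, 5), "rear bumper damaged in parking lot collision"),
    Claim("C2", "J. Smith", date(2009, 3, 6), "rear bumper damaged in parking lot collision at night"),
    Claim("C3", "A. Jones", date(2009, 2, 1), "windshield cracked by falling branch"),
]
print(suspicious_pairs(claims))  # [('C1', 'C2')]
```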

Additional functions of content analytics

In addition to the aspects of content analytics listed in the chart, Feldman points to a few other functions that are important to understand. Concept extraction, for instance, identifies the major topics of a document. Take the example of someone searching for documents about insurrections or wars.

Sports stories will turn up in those searches, she says, because the imagery used in sports is the same as that used to describe conflicts. Consider a search on "Wars in Angola." It will retrieve a story about a rugby match that contains this text: ‘After that incident, it was all-out war, but the Angolan team finally battled its way to victory.’ By determining the topic of a document from its full context, concept extraction resolves the ambiguity, so the search does not return sports documents when the query is about war.
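Topic-based filtering of this kind can be sketched by first matching keywords, then discarding hits whose overall vocabulary points to a different topic. The keyword-overlap scoring below is a deliberately simple stand-in for concept extraction, not Feldman's or any vendor's method; the topic word lists and the `words`, `topic` and `search` functions are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical topic vocabularies; a real system would learn these from many documents.
TOPIC_TERMS = {
    "sports": {"match", "team", "rugby", "score", "victory", "coach", "season"},
    "conflict": {"troops", "rebels", "ceasefire", "government", "insurgents", "casualties"},
}

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def topic(doc: str) -> str:
    """Label a document with the topic whose vocabulary it shares most (ties go to the first topic)."""
    w = words(doc)
    return max(TOPIC_TERMS, key=lambda t: len(w & TOPIC_TERMS[t]))

def search(query_terms: set[str], docs: list[str], query_topic: str) -> list[str]:
    """Keyword match first, then drop hits whose overall topic differs from the query's."""
    hits = [d for d in docs if query_terms & words(d)]
    return [d for d in hits if topic(d) == query_topic]

docs = [
    "After that incident, it was all-out war, but the Angolan team finally battled its way to victory.",
    "Rebels and government troops in Angola agreed to a ceasefire after years of war.",
]
print(search({"war", "wars", "angola", "angolan"}, docs, "conflict"))
```

Both documents match the keywords "war" and "Angola", but only the second is about armed conflict, so the rugby story is filtered out.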

Relationship extraction, on the other hand, determines how entities and concepts are related to each other. "A simple example," Feldman explains, "is cause and effect. Take the two sentences ‘Bill hit Fred’ and ‘Fred hit Bill.’" The words are the same, but the relationship between Fred and Bill couldn’t be more different. Relationship extraction uses the syntax of a sentence to determine how the entities are related and what kind of relationship it is.
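A toy version of relationship extraction shows why word order and syntax matter. The sketch handles only two hypothetical sentence patterns, active ("X hit Y") and passive ("Y was hit by X"); real systems rely on full syntactic parsers. `Relation`, `extract` and both patterns are assumptions for illustration.

```python
from __future__ import annotations

import re
from typing import NamedTuple

class Relation(NamedTuple):
    agent: str   # who performs the action
    action: str
    target: str  # who receives it

# Assumption: only simple sentences built from capitalized names and one lowercase verb.
ACTIVE = re.compile(r"^\s*([A-Z][a-z]+)\s+([a-z]+)\s+([A-Z][a-z]+)\s*[.!]?\s*$")
PASSIVE = re.compile(r"^\s*([A-Z][a-z]+)\s+was\s+([a-z]+)\s+by\s+([A-Z][a-z]+)\s*[.!]?\s*$")

def extract(sentence: str) -> Relation | None:
    """Return who did what to whom, based on sentence structure, or None if no pattern fits."""
    m = PASSIVE.match(sentence)
    if m:
        target, action, agent = m.groups()  # passive voice puts the target first
        return Relation(agent, action, target)
    m = ACTIVE.match(sentence)
    if m:
        return Relation(*m.groups())
    return None

print(extract("Bill hit Fred."))  # Relation(agent='Bill', action='hit', target='Fred')
print(extract("Fred hit Bill."))  # Relation(agent='Fred', action='hit', target='Bill')
```

The passive case shows the point in reverse: "Fred was hit by Bill" puts the names in the opposite order yet yields the same relation as "Bill hit Fred."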

Relevance in vertical industries

Content analytics is relevant in many industries. Beyond those already mentioned, Feldman lists several others. Service firms use content analytics to staff new projects by finding employees who have the skills needed to perform the required tasks. That saves the fees they would otherwise pay outside staffing contractors.
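Project staffing of this kind can be sketched as ranking employees by the share of a project's required skills that appear in their profiles. The sketch assumes skill profiles have already been extracted from documents such as resumes or project reports; the names, skills and `rank_candidates` function are hypothetical.

```python
from __future__ import annotations

def rank_candidates(required: set[str], employees: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank employees by the share of required skills they have (1.0 = full coverage)."""
    if not required:
        raise ValueError("required skills must not be empty")
    ranked = [(name, len(required & skills) / len(required)) for name, skills in employees.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical skill profiles, e.g. extracted from resumes and past project reports.
employees = {
    "Lee": {"java", "sql", "banking"},
    "Ortiz": {"python", "sql", "text mining"},
    "Patel": {"python", "text mining", "banking", "sql"},
}
print(rank_candidates({"python", "text mining", "banking"}, employees))
```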

Global enterprises use content analytics to determine whether the terms of each contract are fulfilled. With thousands of contracts to manage, they need to be able to match deliverables in the contracts with the deliverables actually received.
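Once deliverables have been extracted from contract text and normalized to common names (the hard part, which this sketch assumes is already done), matching them against what was actually received reduces to a set difference. The contract IDs, deliverables and `unfulfilled` function are hypothetical.

```python
from __future__ import annotations

def unfulfilled(contracts: dict[str, set[str]], received: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each contract, list deliverables promised but not yet received."""
    gaps = {}
    for contract_id, promised in contracts.items():
        missing = promised - received.get(contract_id, set())
        if missing:
            gaps[contract_id] = missing
    return gaps

contracts = {
    "K-101": {"site survey", "design spec", "prototype"},
    "K-102": {"training manual"},
}
received = {"K-101": {"site survey", "design spec"}, "K-102": {"training manual"}}
print(unfulfilled(contracts, received))  # {'K-101': {'prototype'}}
```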

Market intelligence professionals use content analytics to gather internal and external information more efficiently and with greater coverage and accuracy.

Benefits

In addition to the benefits of content analytics mentioned above, other major ones should not be overlooked. According to Knox, content analytics saves knowledge workers time in doing their jobs, and with that extra time they can analyze much more data to detect trends. The technology also lets workers look at data in a consistent way, regardless of its original format, location or process. And, importantly, it performs analysis from a neutral standpoint, free of human bias.

A moving target

Content analytics is bound to become as commonly used as text analytics once the market shakes out and dominant and niche players emerge. Buyers should not expect one product to perform most of the activities described here, however. Some players have concentrated, and will continue to concentrate, on one or two capabilities and on the vertical markets in which those capabilities are especially valuable. But expect consolidation, too: in a market with more than 50 players, the right solution for a given business problem might remain a moving target for some time yet.

