Text analytics is one of the more arcane but useful technologies under the big content management tent. According to Hurwitz and Associates, text analytics is "the process of analyzing unstructured text, extracting relevant information and then transforming that information into structured information that can be leveraged in different ways."
Fern Halper, partner at Hurwitz, says, "You’re analyzing unstructured text and extracting the relevant entities, concepts, sentiment, etc., and then utilizing it in ways you want to, like marrying it with structured information from data warehouses or other data stores."
The phrase "that can be leveraged in different ways" puts the meat on the bones of the text analytics definition. It points to the range of text analytics applications, without descriptions of which text analytics can be difficult to comprehend. Search engines, for instance, use text analytics to return more accurate results and to rank and present those results better.
The most popular application of text analytics is marrying it with business intelligence (BI) applications. Text analytics tools pull entities, concepts and even sentiment out of unstructured information and combine them with structured information; both are put in a data store, and BI queries are run against them. Halper says that some companies are planning to use a similar strategy in conjunction with their content management systems. "So if it’s claims, call center notes, e-mails, etc.," Halper says, "companies are using text analytics to help derive insight into what’s actually in those content management systems."
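To make the extract, combine and query pattern concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the claims records, the free-text notes, and the tiny word lists that stand in for a real extraction engine are hypothetical, and SQLite (Python's built-in `sqlite3` module) plays the role of the data store. Each note is turned into structured fields (a sentiment score and a mentioned component), loaded next to the structured claims table, and a BI-style query joins the two.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical structured data, standing in for claims records in a data warehouse.
CLAIMS = [(1, "WidgetA", 120.0), (2, "WidgetB", 80.0), (3, "WidgetA", 200.0)]

# Hypothetical unstructured data: free-text claim notes keyed by claim id.
NOTES = {
    1: "Customer says the WidgetA hinge cracked; very unhappy.",
    2: "WidgetB works fine, customer happy with replacement.",
    3: "WidgetA broke again, customer angry about the delay.",
}

# Toy word lists standing in for a real extraction engine (an assumption, not a vendor API).
NEGATIVE = {"cracked", "unhappy", "broke", "angry", "delay"}
POSITIVE = {"fine", "happy"}
COMPONENTS = {"hinge", "battery", "screen"}


def extract(note: str) -> Tuple[int, Optional[str]]:
    """Turn one free-text note into structured fields: a sentiment score and a component mention."""
    words = [w.strip(".,;!?").lower() for w in note.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    component = next((w for w in words if w in COMPONENTS), None)
    return score, component


def build_store() -> sqlite3.Connection:
    """Load the structured claims and the fields extracted from notes into one data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, product TEXT, amount REAL)")
    conn.execute("CREATE TABLE note_facts (claim_id INTEGER, sentiment INTEGER, component TEXT)")
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", CLAIMS)
    conn.executemany(
        "INSERT INTO note_facts VALUES (?, ?, ?)",
        [(claim_id, *extract(text)) for claim_id, text in NOTES.items()],
    )
    return conn


def sentiment_by_product(conn: sqlite3.Connection):
    """A BI-style query joining structured claim amounts with extracted sentiment."""
    return conn.execute(
        """
        SELECT c.product, SUM(c.amount), AVG(f.sentiment)
        FROM claims AS c JOIN note_facts AS f ON f.claim_id = c.id
        GROUP BY c.product
        ORDER BY c.product
        """
    ).fetchall()


print(sentiment_by_product(build_store()))
```

With this sample data the query returns `[('WidgetA', 320.0, -2.5), ('WidgetB', 80.0, 2.0)]`: the product with the larger claim totals is also the one customers write about most negatively, a link that neither table shows on its own. A production system would replace the word lists with a proper extraction engine and the in-memory database with a warehouse.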
Voice of the customer
According to Seth Grimes, president of the consulting firm Alta Plana, an application known as "voice of the customer" is growing in popularity.
"Organizations have been trying for years to figure out what their customers think," he says, "but they’ve been doing it indirectly by examining transactional data—what’s being sold, what’s being returned, what has warranty claims. With text analytics, they can actually listen to the customers the way they are really talking by going out to blogs, product forums and other places where customers discuss products—product problems and what they like."
Grimes continues, "What’s more, customers communicating in blogs and the like type in natural language. They’re not filling out a survey [which would yield structured information] when they’re posting something to a forum or blog; they’re typing the way they’d talk to someone else [which yields unstructured information]."
Voice of the customer is also applicable to unstructured information that results from call center conversations. "So someone calls into a contact center and has an issue with a product—you can analyze the notes that the contact center person has taken to really understand what people are saying about brands, products and features," says Grimes.
Sentiment analysis is another key application in text analytics. Grimes explains, "The idea here is that you need software that can discern, for example, what an article about presidential candidates is talking about, and that does not mean just Barack Obama and John McCain, the names in the article. [That would be entity extraction, which locates people, places and things in the text and is another capability of text analytics.] Rather, the software can automate the process of understanding what’s being said about each man. In other words, is it positive or is it negative? The application can also go down to a more granular level and explain each man’s Iraq plan from original documents from which the data was pulled."
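The distinction Grimes draws, between finding the names (entity extraction) and judging what is said about each name (sentiment), can be sketched in a few lines of Python. This is a deliberately simple illustration, not how any particular product works: entity extraction is done by looking up a hypothetical list of known names, sentiment is a count of words from a toy lexicon, and each sentence's polarity is credited to whichever entities that sentence mentions. The candidate names "Smith" and "Jones" and the sample article are invented.

```python
import re
from collections import defaultdict

# Toy sentiment lexicon (an assumption; real systems use far richer models).
POSITIVE = {"strong", "clear", "praised", "popular"}
NEGATIVE = {"weak", "vague", "criticized", "unpopular"}


def extract_entities(sentence, known_entities):
    """Entity extraction by gazetteer lookup: which known names occur in the sentence."""
    return [name for name in known_entities if re.search(rf"\b{re.escape(name)}\b", sentence)]


def sentence_polarity(sentence):
    """Net count of positive minus negative lexicon words in one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def entity_sentiment(text, known_entities):
    """Attribute each sentence's polarity to every entity mentioned in that sentence."""
    totals = defaultdict(int)
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        polarity = sentence_polarity(sentence)
        for name in extract_entities(sentence, known_entities):
            totals[name] += polarity
    return {
        name: "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for name, score in totals.items()
    }


# Hypothetical article text about two hypothetical candidates.
article = (
    "Smith gave a strong, clear speech. "
    "Critics called the Jones plan vague and weak. "
    "Both candidates spoke on Tuesday."
)
print(entity_sentiment(article, ["Smith", "Jones"]))
```

For the sample text this prints `{'Smith': 'positive', 'Jones': 'negative'}`. Crediting a whole sentence's polarity to every name in it is the main simplification here; a sentence that praises one candidate and criticizes the other would confuse it, which is why commercial tools analyze grammatical structure rather than bare word counts.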
Grimes notes that some important early uses of text analytics developed in intelligence and counterterrorism. "Intelligence analysts are continually analyzing news sources, reports that come in from the field, to try to identify terrorists, their targets and so on," he says. That requires sophisticated technology, particularly given the number of languages the information comes in, its volume and the wide variety of sources.
Sue Feldman, VP, search and discovery technologies, at IDC, says that using text analytics, intelligence agents have been able to find ties between terrorists and criminals, and determine what’s happening among a group of people suspected of being related to a terrorist cell.
Life sciences, another early application area, also generates text analytics revenue. Identifying potential drugs and therapies and conducting clinical trials is very expensive and time-consuming.
"Pharmaceutical companies have put a huge investment in trying to reduce the cycle time for identifying new drugs and bringing them to market," Grimes says. One way they do that is through lead generation. Many millions of scientific articles and papers have been written in unstructured text, and many of them are freely available in databases such as those maintained by the National Library of Medicine. Pharmaceutical companies mine those documents for interactions between entities such as proteins, to try to identify promising candidate drugs.
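One simple way to surface candidate interactions from literature is co-occurrence counting: if two proteins keep being mentioned in the same abstracts, the pair may be worth a closer look. The Python sketch below shows that idea only. It is a substitute technique chosen for illustration, not a description of what pharmaceutical companies actually run, and the protein identifiers (`PX1`, `PX2`, `PX3`) and abstracts are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurring_pairs(abstracts, terms, min_count=2):
    """Count how often each pair of terms is mentioned in the same abstract."""
    counts = Counter()
    for text in abstracts:
        # Which of the terms appear in this abstract, sorted so pairs have a stable order.
        found = sorted(t for t in terms if re.search(rf"\b{re.escape(t)}\b", text))
        counts.update(combinations(found, 2))
    # Keep only pairs seen together at least min_count times, most frequent first.
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Hypothetical abstracts and protein identifiers, for illustration only.
abstracts = [
    "PX1 binds PX2 in vitro.",
    "We observed PX1 and PX2 interacting in liver cells.",
    "PX3 expression was unaffected by PX1.",
]
print(cooccurring_pairs(abstracts, ["PX1", "PX2", "PX3"]))
```

Here only `('PX1', 'PX2')` clears the threshold, appearing together in two abstracts. Note that the third abstract explicitly reports no effect, yet it would still count as a co-mention at a lower threshold; distinguishing real interactions from such negative findings is exactly where more sophisticated text analytics earns its keep.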
According to Halper, some companies use text analytics for sentiment analysis in executive searches. One such service culls e-mail, news and blog articles, forum postings and other Internet sources for evaluations of executives in different fields, to see whether they would be good hires. She explains that the service extracts words and phrases it has written rules about and uses them to score executives. For instance, "poor leader" would count as negative sentiment, while a phrase like "very supportive" would count as positive sentiment.
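The rule-based scoring Halper describes can be sketched as a table of phrases with weights. In this minimal Python illustration, "poor leader" and "very supportive" come from her example; the other phrases, all of the weights, the executives and the text snippets are hypothetical, since the service's actual rules are not described.

```python
import re

# Hypothetical rule set: phrase -> sentiment weight (only the first two phrases come from the example).
RULES = {
    "poor leader": -2,
    "very supportive": 2,
    "missed targets": -1,
    "strong track record": 2,
}


def score_text(text, rules=RULES):
    """Sum the weights of every rule phrase found in the text, counting repeats."""
    lowered = text.lower()
    return sum(
        weight * len(re.findall(rf"\b{re.escape(phrase)}\b", lowered))
        for phrase, weight in rules.items()
    )


def score_executives(mentions):
    """mentions maps an executive's name to text snippets gathered about that person."""
    return {name: sum(score_text(t) for t in texts) for name, texts in mentions.items()}


# Hypothetical snippets about two hypothetical executives.
mentions = {
    "Exec A": ["Colleagues describe her as very supportive.", "A strong track record in retail."],
    "Exec B": ["Former staff called him a poor leader.", "He missed targets twice."],
}
print(score_executives(mentions))
```

This prints `{'Exec A': 4, 'Exec B': -3}`. Plain phrase matching is easy to audit, which suits a service whose scores must be explained, but it is also brittle: "not a poor leader" still scores as negative, so real rule sets typically add handling for negation and context.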