Text analytics is a process for extracting information from documents. It is particularly useful when the volume of information is too large to analyze manually. Linguistic and statistical techniques are used to classify and categorize the documents, and to discover concepts and relationships within them. Linguistic techniques include identifying synonyms, determining parts of speech, and disambiguation, in which context is used to determine which of several possible meanings a word has. Statistical techniques include calculations of word frequency and proximity, as well as pattern analysis.
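To make the statistical side concrete, here is a minimal sketch of word frequency and word proximity counting using only the Python standard library. The function names, the sample sentence and the window size are illustrative assumptions, not part of any product mentioned in this article.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequency(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)

def cooccurrence(tokens, window=2):
    """Count unordered pairs of distinct words that appear
    within `window` tokens of each other (a simple proximity measure)."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs

# Hypothetical sample text, for illustration only.
sample = "The contract was signed. The contract was later disputed by the vendor."
tokens = tokenize(sample)
freq = word_frequency(tokens)
near = cooccurrence(tokens, window=2)

print(freq.most_common(3))
print(near.most_common(3))
```

Pairs that score highly in `near` are words that tend to appear close together, which is one rough signal of a relationship between concepts.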
Although text analytics has long been used for drawing meaning out of large quantities of data in many fields, without a doubt the most dynamic areas right now are e-discovery and analysis of customer information from social media.
LeClairRyan is a law firm that offers corporate and litigation services, including e-discovery collection, review and production services. The firm uses a variety of e-discovery platforms but most often turns to Relativity from kCura, delivered on demand through kCura hosting partner Planet Data Discovery Management Solutions. Relativity is an e-discovery software solution for review and management of both electronic and paper-based documents. LeClairRyan recently added Relativity's text analytics to its platform.
"One of the capabilities we use regularly is document clustering," says William Belt, team leader for e-discovery practice at LeClairRyan. "In the past, we reviewed documents in a linear fashion-for example, in chronological sequence. Having the documents clustered by topic is more efficient because the reviewers do not have to shift gears as much."
Clustering can also help identify relevancy, so that groups of documents that are likely to be highly relevant are together. "This ability helps with early case assessment," Belt says. "Previously we were using key words to identify relevant documents, but text analytics gives us a much clearer picture of the data set." Another benefit of clustering is quickly identifying potential areas of risk. "We can prioritize documents according to their likely level of risk," he adds, "which puts us in a better strategic position in the review process."
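The idea of grouping documents by topic can be sketched in a few lines. The example below is a deliberately simple greedy clustering based on word overlap (Jaccard similarity); it is an assumption chosen for illustration and is not a description of how Relativity clusters documents. The stopword list, threshold and sample documents are all hypothetical.

```python
import re

# Small, hypothetical stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "on", "is", "was"}

def keywords(text):
    """Return the set of non-stopword tokens in a document."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster(documents, threshold=0.2):
    """Put each document in the first cluster whose seed document is
    similar enough; otherwise start a new cluster with it as the seed."""
    clusters = []  # list of (seed_keywords, [document indexes])
    for index, text in enumerate(documents):
        words = keywords(text)
        for seed, members in clusters:
            if jaccard(words, seed) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((words, [index]))
    return [members for _, members in clusters]

docs = [
    "Invoice for the shipment of pumps",
    "Quarterly safety inspection report",
    "Second invoice for the pumps shipment",
    "Safety inspection findings for the plant",
]
groups = cluster(docs)
print(groups)
```

With these four sample documents, the two invoices end up in one group and the two inspection documents in another, so a reviewer could work through one topic at a time.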
Quicker relevance review
The volume of information that must be searched is a primary driver in the quest for greater efficiency. "All the documents eventually get reviewed either by attorneys or paralegals," Belt says. "With Relativity, we are able to separate out tens of thousands of documents from the original group, which might have contained 300,000 documents originally, as not relevant, and then paralegals can quickly review them to verify that classification." With a large number of documents, even a small time saving on each one adds up to significant savings in both time and money.
kCura releases frequent updates to Relativity. "We release a new version every two months," says Nick Robertson, VP of sales and marketing. "We want Relativity to be a real law firm workhorse. Automated review workflows, integrated productions and visual data analysis have been incorporated into the product for some time. This year we added a lot of new features, such as OCR and search term reports to allow case teams to better understand and work with their documents." Newly introduced APIs allow third parties to more easily integrate additional functionality.
In particular, kCura wants to make text analytics ubiquitous in e-discovery. "We made our index building process faster and easier," says Robertson. The company also introduced a workflow that codes documents based on the decisions of experts: teams of experts sample the documents and code them for relevancy, and Relativity's text analytics codes the rest of the documents. Past versions of Relativity also allowed users to create relevant samples, but the review workflow is new. Based on kCura's recent look at usage across its customers, text analytics is being used more than 10 times as often in 2011 as it was in the last quarter of 2009.
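The general shape of a sample-based coding workflow can be illustrated with a toy example. The sketch below assumes a simple nearest-neighbor approach, in which each uncoded document receives the label of the most similar expert-coded document; this is an illustrative stand-in, not kCura's actual method, and every name and document in it is hypothetical.

```python
import re

def keywords(text):
    """Return the set of lowercase word tokens in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def similarity(a, b):
    """Jaccard similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def code_remaining(coded_sample, uncoded):
    """Give each uncoded document the label of its most similar
    expert-coded document (1-nearest-neighbor on word overlap)."""
    examples = [(keywords(text), label) for text, label in coded_sample]
    results = []
    for text in uncoded:
        words = keywords(text)
        _, best_label = max(examples, key=lambda ex: similarity(words, ex[0]))
        results.append((text, best_label))
    return results

# Hypothetical expert decisions on a small sample.
coded_sample = [
    ("Pricing agreement between the two distributors", "relevant"),
    ("Office holiday party schedule", "not relevant"),
]
uncoded = [
    "Draft pricing agreement for distributors",
    "Reminder about the holiday party",
]
predictions = code_remaining(coded_sample, uncoded)
for text, label in predictions:
    print(f"{label}: {text}")
```

A production system would use a much larger sample and a stronger model, but the division of labor is the same: people code a sample, and software extends those decisions to the remaining documents.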
Voice of the customer
The other booming sector for text analytics is customer feedback, gathered through social media as well as traditional channels such as customer relationship management (CRM) systems. Whirlpool manufactures consumer products under its own brand name and also under Amana, Jenn-Air, KitchenAid and others. Like many consumer products manufacturers, Whirlpool wants to keep a close watch on customer sentiment, and has used software products from Attensity for a number of years to provide an integrated view of its customers' experiences from its CRM system, e-mails and customer service records.
In 2010, Whirlpool began using Attensity360, which monitors social media, and extended its use of Attensity Respond, which allows Whirlpool to respond to and track customer comments online, into its social media monitoring. The company is now able to extract information from online customer conversations and other social media sources to gain insights into possible new products and take action in the case of customer dissatisfaction. Whirlpool also has developed a set of metrics that indicate the number of customers contacted, positive and negative comments about its own brands and those of competitors, and other information that assists in product development and customer service.
Companies are not always sure what they should be doing with social media, either in terms of their own participation in corporate blogs, Facebook and Twitter, or in terms of using such information productively. "We are seeing a new class of buyers of Attensity who have been asked by their CEOs, 'Why don't we know what's going on in social media?'" says Michelle de Haaff, CMO of Attensity. "There is definitely a drive to use text analytics in large companies to unleash the voice of the customer."
Interpreting Tweets poses some special challenges. Attensity360 offers Twitter's "firehose," a streaming API that provides access to Tweets for analysis, as a native service. To make the Tweets meaningful, however, Attensity must translate them into standard English. "We had linguists studying this 'slanguage' for the past two years," de Haaff says, "so that every word is parsed, making it possible to see relationships among words, even if the words are abbreviations, acronyms or emoticons."
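A very rough picture of what translating Tweets into standard English involves is a lookup table that maps abbreviations and emoticons to plain words. The sketch below is only that, a toy dictionary-based normalizer with a handful of made-up entries; it is an assumption for illustration and does not reflect Attensity's linguistic models.

```python
# Hypothetical, tiny slang dictionary for illustration only.
SLANG = {
    "u": "you",
    "gr8": "great",
    "thx": "thanks",
    "b4": "before",
    ":)": "[happy]",
    ":(": "[sad]",
}

def normalize(tweet):
    """Replace known abbreviations and emoticons with standard words.
    The output is lowercased; trailing punctuation is preserved."""
    words = []
    for token in tweet.split():
        lowered = token.lower()
        if lowered in SLANG:  # exact match, which also covers emoticons
            words.append(SLANG[lowered])
            continue
        core = lowered.rstrip(".,!?")      # word without trailing punctuation
        trailing = lowered[len(core):]     # the punctuation that was removed
        words.append(SLANG.get(core, core) + trailing)
    return " ".join(words)

print(normalize("Thx for the gr8 service! :)"))
print(normalize("My dryer broke b4 the warranty ended :("))
```

Once the text is normalized in this way, the same frequency, proximity and sentiment techniques used on ordinary documents can be applied to it.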
Social media's info avalanche
Attensity also provides in its response software a method for prioritizing responses to customers who ask questions in forums or on a company's Facebook page. "Attensity360 does the prioritizing based on predefined business rules," adds de Haaff. Therefore a telecom company can quickly respond to a customer who is contemplating switching services, one whose sentiment is strongly negative, or someone who counts as a "good" customer.
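Rule-based prioritization of this kind can be sketched as a list of weighted rules applied to each incoming post. The rules, weights, field names and sentiment scale below are assumptions invented for the example, loosely modeled on the telecom scenario above; they are not Attensity's rule format.

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    text: str
    sentiment: float          # assumed scale: -1.0 (very negative) to 1.0 (very positive)
    is_valued_customer: bool  # assumed flag supplied by a CRM lookup

# Each rule is (description, test, weight). All values are hypothetical.
RULES = [
    ("mentions switching", lambda p: "switch" in p.text.lower(), 3),
    ("strongly negative", lambda p: p.sentiment <= -0.6, 2),
    ("valued customer", lambda p: p.is_valued_customer, 1),
]

def priority(post):
    """Sum the weights of every rule the post matches."""
    return sum(weight for _, matches, weight in RULES if matches(post))

def prioritize(posts):
    """Return posts ordered from most to least urgent."""
    return sorted(posts, key=priority, reverse=True)

posts = [
    Post("carol", "Love the new plan", 0.9, False),
    Post("alice", "Thinking about switching carriers, coverage is awful", -0.8, False),
    Post("bob", "How do I set up voicemail?", 0.1, True),
]
queue = prioritize(posts)
for post in queue:
    print(priority(post), post.author, post.text)
```

Keeping the rules in a plain list means the business side can adjust weights or add conditions without changing the sorting logic.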
Social media provides the low-hanging fruit for automated gathering of customer intelligence, according to James Kobielus, senior analyst at Forrester Research. "In the past, companies did surveys or held focus groups," he says, "but now, customers are constantly telling companies what they think via social media such as Twitter and Facebook." The challenge is to capture that avalanche of information and make it actionable. The "listening engines" that tap into social media streams provide different types of feedback to companies, ranging from awareness to sentiment to the most appropriate offers, promotions and other responses.