Taming Unstructured Data with Text Analytics
It is now widely recognized that the vast majority of information in any business is unstructured data, typically in text form: reports, completed forms, emails, memos, log entries, transcripts and so on. Most of the time, this rich source of information remains untapped, sometimes because companies are not fully aware of its potential value, but more often because of the tremendous effort it takes to sift through such large volumes and dig out information manually.
Text mining provides a viable solution. By combining natural language processing with statistical and machine learning techniques, text mining can quickly extract useful information from large collections of documents. A text mining tool will typically process a million words in a few seconds, automatically extracting topics and discovering previously unknown relationships and patterns.
Companies see the real power behind text analytics when they combine text mining results with structured data. For example, a manufacturing company can mine hundreds of thousands of warranty claims, maintenance reports or incident reports to identify the most costly defects. A services company can analyze customer comments alongside satisfaction or NPS scores to develop strategies for improving product offerings. Numerical indices derived from text can also be fed into predictive models to improve forecasts and predictions.
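To make the idea of combining text with structured data concrete, here is a minimal Python sketch that tags each customer comment with topics and then averages the satisfaction score per topic. The topic names, keyword lists, sample records and function names are all hypothetical and chosen only for illustration; a production system would rely on a much richer dictionary than simple word matching.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical keyword lists per topic (illustration only).
TOPIC_KEYWORDS = {
    "billing": {"invoice", "charge", "refund"},
    "support": {"agent", "helpdesk", "response"},
}

# Hypothetical records: a free-text comment plus a structured 0-10 score.
RECORDS = [
    {"comment": "The refund took weeks and the invoice was wrong", "score": 2},
    {"comment": "Helpdesk agent solved my problem quickly", "score": 9},
    {"comment": "Slow response and an unexpected charge", "score": 3},
]


def topics_in(comment):
    """Return the set of topics whose keywords appear in the comment."""
    words = set(comment.lower().split())
    return {topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws}


def mean_score_by_topic(records):
    """Average the structured score over all comments mentioning each topic."""
    scores = defaultdict(list)
    for record in records:
        for topic in topics_in(record["comment"]):
            scores[topic].append(record["score"])
    return {topic: mean(values) for topic, values in scores.items()}


result = mean_score_by_topic(RECORDS)
print(result)
```

In this toy data set, comments that mention billing carry a lower average score than comments that mention support, which is the kind of signal that could point an analyst toward the most costly issues.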
Beyond Sentiment Analysis and Social Media
In recent years, there has been a growing interest in text analytics, along with an increase in new vendors offering text analytics solutions. Most of these focus almost exclusively on sentiment analysis and social media, to the extent that this application has become, in the minds of many, the very definition of text analytics. This is unfortunate, as it does not do justice to the wide range of business applications of text analytics. One could also raise serious questions about whether social media really represent the most useful source of information.
We suggest that the companies that will benefit the most from text analytics are those that recognize the opportunities offered by unstructured data already available, or easily accessible, within their own organization. The tremendous hype around sentiment analysis should also be tempered by a careful assessment of its usefulness and its accuracy (which is often much lower than what vendors would like you to believe).
More Detailed, More Accurate Text Analytics
Analyzing human language is a very complex task, and text mining is still, in many respects, in its infancy. Newcomers to text mining who expect their tools to readily provide comprehensive and precise answers may well be disappointed. Moving beyond the obvious to achieve greater detail and precision often requires some effort on the part of the text analyst: building a custom dictionary composed of keywords, key phrases and rules. This crucial task may take days, weeks or, in some cases, months, yet that is still a tiny fraction of the time it would take to do the analysis manually. Once developed and validated, such a taxonomy becomes invaluable, allowing one to fully automate the analysis of newly obtained text data or to process incoming streams of text data in real time.
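As a rough illustration of what a dictionary of keywords, key phrases and rules can look like, the Python sketch below categorizes a comment using all three. The categories, entries and the single negation rule are hypothetical, and the code is not a description of how any particular text mining product works internally.

```python
import re

# Hypothetical dictionary: each category has single keywords and multi-word phrases.
CATEGORIES = {
    "workload": {"keywords": {"homework", "assignments"}, "phrases": ["too much work"]},
    "clarity": {"keywords": {"clear", "organized"}, "phrases": ["easy to follow"]},
}

# Hypothetical rule: a negation word directly before a keyword moves the
# match into a separate "<category>_negated" category.
NEGATIONS = {"not", "never", "hardly"}


def categorize(text):
    """Return the set of categories matched by keywords, phrases and the rule."""
    tokens = re.findall(r"[a-z']+", text.lower())
    normalized = " ".join(tokens)
    found = set()
    for category, entry in CATEGORIES.items():
        # Keyword matches, with the negation rule applied.
        for i, token in enumerate(tokens):
            if token in entry["keywords"]:
                negated = i > 0 and tokens[i - 1] in NEGATIONS
                found.add(category + "_negated" if negated else category)
        # Phrase matches on the punctuation-free, lowercased text.
        for phrase in entry["phrases"]:
            if re.search(r"\b" + re.escape(phrase) + r"\b", normalized):
                found.add(category)
    return found


print(categorize("The lectures were not clear, and there was too much work."))
```

Once a dictionary like this has been validated, the same function can be applied unchanged to every new comment as it arrives, which is what makes the upfront effort worthwhile.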
Text mining regularly turns up previously hidden gems, and companies are quick to respond positively to them. Such insights give them the competitive advantage they are looking for, one that was hidden all along in their very own “backyard” data.
Software Company Uses Text Analytics to Analyze Written Comments
Montreal-headquartered software company eXplorance provides tools for authoring and distributing survey and evaluation forms, and for reporting on them. Their all-in-one assessment system helps organizations assess skills, knowledge, competencies, needs and expectations, and develop a culture of continuous improvement. Their assessment tools provide streamlined online forms, which can be completed on desktop computers or mobile devices. The system collects data from closed-ended questions, ratings and open-ended questions. When respondents started to receive the more user-friendly online forms, they soon responded with prolific open-ended feedback.
“As organizations focus more on meeting employee and customer expectations, they need a way to make sense of the qualitative feedback provided by these stakeholders,” said Samer Saab, CEO of eXplorance. “This kind of feedback is free form, and represents an unbiased, unguided expression of expectations.”
eXplorance decided to seek out a provider for an advanced text analytics tool. “In feedback culture, sentiment analysis is not enough. That information was more or less available in the ratings and numerical scores from our reports,” said Saab. “We needed something more sophisticated. What characterized unsatisfied from satisfied respondents? What are some aspects they like? What needs to be improved?”
eXplorance collaborated with Provalis Research to develop a text analytics solution. eXplorance used Provalis Research's WordStat desktop application to develop a custom categorization dictionary, and Provalis Research's SDK to integrate that dictionary into eXplorance's learning experience management system. “Provalis Research has a rigorous approach to text analytics, using theme-based interpretations,” said Saab. “We find their mixed-method approach to analysis has brought even greater insights to the data gathered by our system.”
eXplorance has customers on almost every continent. Their text analytics must handle not just typical misspellings, but also regional spellings and word meanings, as well as cultural context. To address this, they use a teaching and learning dictionary based on 1.8 million open-ended responses from diverse sources. The current dictionary comprises more than 10,300 entries, consisting of words, word patterns, phrases, rules and more than 4,000 misspellings, classified into 151 categories, including 22 positive and 23 negative attributes, as well as various topics and potential issues.
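One simple way to picture how misspellings and regional spellings can be handled is an explicit lookup table that maps each variant to a canonical form before categorization. The Python sketch below assumes such a table; the entries and the function name are hypothetical, and the article does not describe how the eXplorance dictionary stores or applies its misspelling entries.

```python
# Hypothetical variant table: misspellings and regional spellings mapped to
# one canonical form (illustration only).
VARIANTS = {
    "profesor": "professor",
    "proffessor": "professor",
    "organised": "organized",
    "programme": "program",
}


def normalize(text):
    """Lowercase the text and replace known variants with their canonical form."""
    return " ".join(VARIANTS.get(word, word) for word in text.lower().split())


print(normalize("The proffessor was well organised"))
```

An explicit table like this is predictable and easy to audit, at the cost of having to collect the variants up front, which is consistent with building the dictionary from a large body of real responses.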
“Several clients have already adopted the text analysis tool, and we continue to expand our analytics set,” Saab said. “We look forward to further enhancing our offering, to provide the data organizations need to foster strategic insights for future innovation. We believe text analytics will play an important role in this.”
Provalis Research is a world-leading developer of text analytics tools used by more than 3,000 institutions worldwide. Visit provalisresearch.com or call 855-355-5252 for more information.