Deploying text analytics and natural language processing for strategic advantage
Almost everything you know about text analytics, or thought you knew about it, is shifting. In fact, text analytics is changing for the better to become easier, quicker, and more applicable to enterprise use cases than it’s ever been before—largely due to advancements in natural language processing (NLP).
Consequently, this form of analytics is underpinning everything from traditional sentiment analysis to natural language search and speech recognition. It’s no longer just a means of applying knowledge management but also a way of creating action to effect strategic advantage in any number of domains.
Specific modifications to how text analytics is used throughout the enterprise include the following:
♦ Data structure: Text is no longer solely relegated to the murky uncertainties of unstructured or even semi-structured data but can regularly be converted into structured data that’s easily computable.
♦ Advanced machine learning: Several machine learning approaches pertaining to word embeddings and what Slater Victoroff, CTO of Indico Data, termed “modern AI techniques like representation learning and manifold layout” reduce the number of examples models need to learn while also expanding their enterprise utility for NLP.
♦ Taxonomies: As one of the mainstays of symbolic reasoning, taxonomies containing enterprise knowledge are increasingly coupled with machine learning processes in a calculated confluence of traditional and modern NLP approaches for more accessible text analytics.
♦ Speech recognition: Text analytics is rapidly expanding to encompass analysis of spoken conversations with far-reaching implications for customers, product development, marketing opportunities, and sales.
The impact of these changes to text analytics is twofold. First, they collectively produce a comprehensive transformation so that text analytics is more accessible to a broader scope of users than was previously possible. Second, they are credited with normalizing text analytics so firms can routinely exploit one of the most valued sources of enterprise knowledge for strategic advantage.
According to Marco Varone, CTO of expert.ai, “Language understanding is always the most important step for trying to advance a computer system that’s state of the art and to unlock, for companies, the huge amount of knowledge and actionable information that is available in any kind of linguistic communication, text, document, or message.”
The unstructured issue
Perhaps the biggest barrier to unlocking the knowledge found in text is the fact that it's widely considered to be unstructured data that doesn't conform to the traditional tabular data representation. According to SAS chief data scientist Wayne Thompson, however, organizations can easily transmute text into structured data by making documents “rows” and the terms in them “columns.” The point is to quantify how often each term appears in each row.
“For example, each article that someone wrote is a row in a table and each column, of which there are many, is a term,” Thompson said. “Like ‘semantic’ would be one column and then there’d be low information words that we would strip out like ‘into’ and ‘be’ and ‘but.’ By totaling the number of times terms appear in the rows of the document, organizations get numerical representations of text that fit into relational models so they can treat text as structured data to jumpstart certain NLP models—and their use cases.”
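The row-and-column approach Thompson describes can be sketched in a few lines of Python. This is a minimal illustration rather than how SAS or any other vendor implements it: the stop-word list, the sample documents, and the function names tokenize and document_term_matrix are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list of "low information" words; real lists are far longer.
STOP_WORDS = {"into", "be", "but", "the", "a", "is", "and", "of", "to"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def document_term_matrix(documents):
    """Return (vocabulary, rows): one row of term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))  # one column per distinct term
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows

# Illustrative documents, not taken from the article.
docs = [
    "Semantic search is semantic analysis of text",
    "Text analytics turns text into structured data",
]
vocabulary, rows = document_term_matrix(docs)
print(vocabulary)
for row in rows:
    print(row)
```

Each row is now a fixed-length list of numbers, so it can be loaded into a relational table or passed to a model like any other structured record.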
Classify, extract, and analyze
Regardless of which approach is used, the basics of text analytics remain the time-honored three-step process that Tom Wilde, CEO of Indico Data, called “classify, extract, and analyze.” These procedures support an overwhelming number of text analytics use cases, from intelligent process automation to conversational AI applications such as natural language queries of datasets. By employing NLP to understand the basic grammar and sentence structure of text, organizations can classify documents and specific entities in them, extract them as needed, and analyze them for their downstream application of choice.
“Outside of traditional analytics use cases like voice of customer insights and media analytics, some of the most compelling use cases for this approach are contract analytics and knowledge management,” Victoroff remarked. It’s critical to realize that almost any form of NLP follows the aforementioned three-step process, which might involve what Semantic Web Company CEO Andreas Blumauer characterized as “name entity recognition and concept-based tagging.” What differs is the approach used. Some NLP is predicated on advanced machine learning, while other forms of NLP rely heavily on taxonomies and rules.
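To make the three steps concrete, here is a rough sketch of the taxonomy-and-rules flavor of the process. It is a toy under stated assumptions, not any vendor's method: the TAXONOMY dictionary, the regular expressions, and the function names are hypothetical, and a production system would use far richer taxonomies or a trained model for each step.

```python
import re
from collections import Counter

# Hypothetical taxonomy: each concept maps to the terms that signal it.
TAXONOMY = {
    "contract": {"agreement", "termination", "party"},
    "support": {"refund", "complaint", "ticket"},
}

def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def classify(text):
    """Pick the concept whose terms occur most often; None if nothing matches."""
    toks = tokens(text)
    scores = {c: sum(t in terms for t in toks) for c, terms in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def extract(text):
    """Pull out simple entities with rules: dollar amounts and ISO-style dates."""
    return {
        "amounts": re.findall(r"\$\d+(?:\.\d{2})?", text),
        "dates": re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text),
    }

def analyze(documents):
    """Count how many documents fall under each concept."""
    return Counter(classify(d) for d in documents)

# Illustrative documents, not taken from the article.
contract_doc = "This agreement allows either party to request termination after 2024-01-31 for a fee of $500."
support_doc = "Customer filed a complaint and wants a refund of $19.99."

print(classify(contract_doc), extract(contract_doc))
print(classify(support_doc), extract(support_doc))
print(analyze([contract_doc, support_doc]))
```

Swapping the dictionary lookup in classify for a machine learning model would change the approach but not the shape of the pipeline, which is the distinction the section draws.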