Understanding What Matters With Text Analytics and NLP
The most important thing to understand about text analytics and natural language processing (NLP) is that these critical aspects of knowledge management are intrinsically related. They enable IT systems to analyze—and, to a certain extent, understand—human language as text so organizations can profit from the wealth of unstructured data inundating the enterprise. What’s less straightforward, perhaps, is the nature of the relationship between text analytics and NLP.
“It’s accurate to say that text analytics is a subcategory of NLP,” said Indico CEO Tom Wilde. Viewed this way, text analytics almost always involves some form of NLP—or, as Denodo CMO Ravi Shankar explained, it’s not possible to do text analytics without NLP.
The second most important thing to understand about these capabilities is that they’ve traditionally utilized two approaches: a rules-based one and a statistical one. These methods represent the two sides of AI as typified by expert systems and machine learning. The future of text analytics and NLP likely involves synthesizing the approaches. “We are using both of them and both of them are complementing each other,” said Vivek Mishra, product head of vPhrase, noting that each serves a different purpose.
By relying on these approaches, organizations can use NLP to perfect text analytics elements such as these:
♦ Parsing: According to Shankar, this can be thought of as similar to reading. Tokenization and tagging are aspects of parsing at which NLP excels.
♦ Entity extraction: Entity extraction is the ability to recognize entities (names, products, business concepts, etc.) and extract them for analysis, which may involve moving them between documents or between databases for document workflows, for example.
♦ Classification: Although sentiment analysis is the quintessential classification use case, classification also involves categorizing text according to predefined types, such as those pertaining to specific regulations.
♦ Contextual understanding: Also known as semantic understanding, this facet of text analytics is predicated on understanding nuances of a word’s meaning based on its use—such as distinguishing when Charlotte is someone’s name as opposed to a city.
♦ Pragmatic understanding: This form of understanding goes beyond mere context to comprehend the consequences, implications, and subtleties of language—and it is reinforced by a credible knowledgebase.
Mastering these domains is integral to structuring what’s otherwise unstructured text and deriving meaningful action from it.
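To make a couple of the elements above concrete, here is a minimal, illustrative sketch (not any vendor's implementation) of tokenization—a basic parsing step—and dictionary-based entity extraction. The entity dictionary and its labels are hypothetical examples.

```python
import re

def tokenize(text):
    """Split text into word tokens, a basic parsing step."""
    return re.findall(r"[A-Za-z0-9']+", text)

# Hypothetical entity dictionary mapping surface forms to entity types.
# Note "Charlotte": deciding person vs. city requires contextual understanding.
ENTITIES = {
    "charlotte": "PERSON_OR_CITY",
    "acme": "ORGANIZATION",
}

def extract_entities(text):
    """Return (token, type) pairs for tokens found in the dictionary."""
    return [(t, ENTITIES[t.lower()]) for t in tokenize(text)
            if t.lower() in ENTITIES]
```

For example, `extract_entities("Charlotte joined Acme in 2021.")` yields both the ambiguous name and the organization; resolving which sense of "Charlotte" applies is exactly the contextual-understanding problem described above.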
Whether employing traditional rules-based approaches to text analytics or leveraging more modern machine learning strategies, users must initially train the systems on relevant business domains. One way to do so is with comprehensive taxonomies of terms, their synonyms, and their meanings—which are traditionally associated with rules-based models. According to Franz CEO Jans Aasman, “There’s a part of NLP where people create taxonomies and ontologies. That is just a very acceptable way of doing NLP.” Historically, such defined hierarchies of vocabularies were paired with rules to find patterns in text and create actions such as classifications or entity extraction.
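A taxonomy-driven rule of the kind described above can be sketched in a few lines: terms and their synonyms are defined up front, then matched against text to produce a classification. The categories and vocabulary here are hypothetical examples, not a real product's taxonomy.

```python
# Hypothetical taxonomy: each category maps to its terms and synonyms.
TAXONOMY = {
    "invoice": {"invoice", "bill", "statement of charges"},
    "contract": {"contract", "agreement", "terms of service"},
}

def classify(text):
    """Return taxonomy categories whose terms appear in the text."""
    lowered = text.lower()
    return sorted(cat for cat, terms in TAXONOMY.items()
                  if any(term in lowered for term in terms))
```

Because every term must be enumerated in advance, the approach is highly domain-specific and accurate where the vocabulary is covered, which sets up the trade-offs discussed next.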
Advantages of this approach are that it is extremely domain-specific (and accurate), doesn’t require a surfeit of training data, and, when coupled with ontologies, supports a knowledgebase for pragmatic understanding of language. Disadvantages include the time-consuming nature of assembling taxonomies and the brittleness of rules. “Those are effective if you can adequately define ahead of time all the permutations of the particular text,” Wilde noted. “That’s great, except that in the real world, when you’re dealing with things like documents, text never behaves quite so specifically. People use different vocabularies, different formatting of the text. Imagine how many permutations there are for dates.”
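Wilde's point about date permutations is easy to demonstrate. The sketch below (an illustrative example, not a production date parser) hand-writes three date patterns; every additional real-world format demands another rule, which is exactly the brittleness described above.

```python
import re

# Each format needs its own hand-written pattern; these three cover only
# a fraction of real-world date permutations.
DATE_PATTERNS = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),           # 2024-01-31
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),     # 1/31/2024
    re.compile(r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b"),  # January 31, 2024
]

def find_dates(text):
    """Return all substrings matched by any known date pattern."""
    return [m.group(0) for p in DATE_PATTERNS for m in p.finditer(text)]
```

A string such as "31 Jan 2024" slips through all three rules untouched, illustrating why purely rules-based extraction tends to break as vocabularies and formats vary.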