-->


Deploying text analytics and natural language processing for strategic advantage


Modern machine learning 

By utilizing deep neural networks and other advanced machine learning techniques, organizations can automate aspects of the text analytics process. Wilde noted that with this approach, organizations simply provide examples of what they’re looking for in text (specific terms for media analysis of financial reports, for example), and the underlying models learn to identify them, classify them as necessary, and analyze them for desired business outcomes, such as determining how they affect hedge fund opportunities. Because representation learning, manifold learning, and transfer learning techniques reduce the number of labels and the overall quantity of training data required, this method automates much of the text analytics process.

“It empowers subject-matter experts to quickly express and modify their understanding of a domain,” Victoroff explained. “Rather than attempting to create an exhaustive list of all topics, users teach AI how to discover topics in new content.” The overarching semantic value of this methodology, however, has been questioned. “Machine learning can’t actually do hierarchical relationships of language,” said Ryan Welsh, CEO of Kyndi. Detractors also note that machine learning provides only a shallow understanding of language. Unless organizations expressly take measures to get around conventional training data requirements, the data demands are steep. “With supervised learning, you’ve got these systems that need tons of labeled training data,” Welsh said. “Ninety-nine percent of enterprises don’t have web-scale labeling.”

Taxonomies and rules 

Taxonomies, of course, support an innate understanding of the hierarchy of language. They’re a staple of conventional text analytics approaches that leverage them to establish rules for the classification, extraction, and analytics process. When defining entities in text, “the words come from purpose-built taxonomies,” said John Foderaro, chief scientist at Franz. The superior semantics of this method are useful for granular classifications—even for individual sentences. Depending on the specific use case, users might “classify sentences as either idioms, interjections, salutations, questions, objections, or voice prompts,” Foderaro said. As is the case with pure machine learning methods for text analytics, the results of this method can be queried to support cognitive search and question-answering. 
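To make the rules-based approach concrete, here is a minimal sketch of taxonomy-driven sentence classification. The taxonomy entries, the category names, and the `classify_sentence` helper are all hypothetical illustrations, not any vendor's product or API; a production taxonomy would be far larger and purpose-built for the domain.

```python
import re

# Hypothetical mini-taxonomy (an assumption for illustration):
# each category maps to the phrases that signal it.
TAXONOMY = {
    "salutation": ["hello", "good morning", "thanks for calling"],
    "objection": ["too expensive", "not interested"],
    "interjection": ["wow", "oh no"],
}


def classify_sentence(sentence: str) -> str:
    """Classify one sentence using taxonomy phrases, then a punctuation rule."""
    text = sentence.lower().strip()
    # Rule 1: the first taxonomy phrase found (as whole words) decides the category.
    for category, phrases in TAXONOMY.items():
        for phrase in phrases:
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                return category
    # Rule 2: anything ending in a question mark is treated as a question.
    if text.endswith("?"):
        return "question"
    # No rule fired.
    return "other"


if __name__ == "__main__":
    for s in ["Good morning, everyone.", "When does the contract renew?"]:
        print(s, "->", classify_sentence(s))
```

Every rule here had to be written by hand, which is the trade-off raised next: the classifications are precise and explainable, but someone has to build and maintain the taxonomy.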

“The challenge of the pure semantic approach is you need a knowledge engineer to hand-code these systems,” Welsh said. Employing taxonomic approaches with machine learning ones enables the enterprise to “use every tool in the text analytics and NLP toolbox to find answers to important business questions hidden in natural language text,” Foderaro added. 

Deep text analytics 

Hybridizing taxonomy systems with machine learning ones is the most significant development in NLP and text analytics because it unequivocally yields the best results. “You can mix the machine learning-based NLP and the graph-based NLP together into one because you can benefit from both algorithms,” Blumauer explained. The ways of combining them are as varied as the use cases they support.

Foderaro noted that machine learning algorithms strengthen the analysis of customer conversations to determine if those interactions are significant for improving customer satisfaction or developing new products. Blumauer described a regulatory compliance use case in which, after using machine learning for named entity recognition and classification, organizations create a digital twin of a document that’s a sub-graph of a larger knowledge graph. “Then you can create very sophisticated rules on a bigger knowledge graph which determine, for instance, if, in a legal document, compliance was fulfilled,” Blumauer said.
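The compliance pattern can be sketched in a few lines, under heavy assumptions. Below, the entity list stands in for the output of a machine learning entity recognizer (not implemented here), the graph is a plain set of subject-predicate-object triples rather than a real graph database, and the document type, clause names, and function names are all hypothetical.

```python
# Hypothetical enterprise knowledge graph as (subject, predicate, object) triples.
KNOWLEDGE_GRAPH = {
    ("DataProcessingAgreement", "requires", "RetentionClause"),
    ("DataProcessingAgreement", "requires", "BreachNotificationClause"),
}


def document_subgraph(doc_id, doc_type, entities):
    """Build the document's sub-graph from entities an ML model is assumed to have found."""
    triples = {(doc_id, "isA", doc_type)}
    triples |= {(doc_id, "mentions", entity) for entity in entities}
    return triples


def missing_requirements(doc_id, graph):
    """Rule over the combined graph: which required clauses does the document not mention?"""
    doc_types = {o for s, p, o in graph if s == doc_id and p == "isA"}
    required = {o for s, p, o in graph if s in doc_types and p == "requires"}
    mentioned = {o for s, p, o in graph if s == doc_id and p == "mentions"}
    return required - mentioned


if __name__ == "__main__":
    # Assumed recognizer output for one contract.
    sub = document_subgraph("doc-1", "DataProcessingAgreement", ["RetentionClause"])
    print(missing_requirements("doc-1", KNOWLEDGE_GRAPH | sub))
```

The rule itself never reads the document text. It only compares what the model extracted against what the larger graph says a document of that type must contain, which is where the two approaches meet.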

Statistical and rules-based NLP 

Most importantly, organizations can combine the statistical and rules-based NLP approaches to overcome the shortcomings of each: the time spent building rules and taxonomic knowledge, and the time spent training machine learning models. Organizations can surmount the labeled-training-data requirements of the latter with this neuro-symbolic method. “You start annotating some documents manually, you also have notations done with rules, and then you combine the two and use machine learning to come out with other annotations in a sort of a virtuous cycle,” Varone said. Enterprise knowledge is also well suited to training machine learning models for text analytics.
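One pass of the cycle Varone describes might look like the sketch below. It is an illustration under stated assumptions, not his company's method: the keyword rules and labels are hypothetical, and a tiny naive Bayes classifier stands in for whatever model an organization would actually train (it is a simple statistical model, not a neural one).

```python
import math
from collections import Counter, defaultdict


def rule_label(text):
    """Hypothetical keyword rules: return a label, or None if no rule fires."""
    lowered = text.lower()
    if "refund" in lowered:
        return "complaint"
    if "thank" in lowered:
        return "praise"
    return None


def train(labeled):
    """Fit a tiny multinomial naive Bayes model over whitespace tokens."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Return the most probable label, using add-one smoothing."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def bootstrap(manual, unlabeled):
    """Combine manual and rule-based annotations, then let the model annotate the rest."""
    rule_based = [(t, rule_label(t)) for t in unlabeled if rule_label(t)]
    model = train(manual + rule_based)
    remaining = [t for t in unlabeled if rule_label(t) is None]
    return {t: predict(model, t) for t in remaining}
```

In practice, the model's new annotations would be reviewed and fed back in as training data for the next round, which is what makes the cycle virtuous rather than a one-off.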

“If you’ve created a standard knowledge repository and, as a tool, use a knowledge graph to store this kind of information and links, you can use this knowledge to make the training faster and get better results,” Varone said. With this technique, organizations get more from their existing knowledge management efforts by using them to expedite the training of machine learning models for text analytics, resulting in less work and more tailored results for their particular use cases. This way, they’re able to pair some of the more cutting-edge machine learning methods Victoroff described with the solid semantic understanding rules provide. Time to deployment is also much shorter, so organizations can spend less time preparing systems for text analytics and more time reaping their underlying value for an array of use cases.
