The expanding ease and utility of text analytics and natural language processing
Symbolic AI
Rules-based techniques are integral to symbolic reasoning, also known as symbolic AI. This methodology “is more traditional NLP,” Kohli said. “It requires you to build a vocabulary and to build a domain model for that vocabulary.” More importantly, perhaps, it involves the use of taxonomies, whose hierarchies of definitions become the basis (or symbols) upon which systems reason when analyzing text. Such constructs are pivotal for enabling machines to understand and “find words by their synonyms and by their categories,” Aasman mentioned. These capabilities power use cases such as intelligent document search; a small sketch of this kind of lookup follows the list below. Taxonomies typify the human-curated knowledge that's core to symbolic AI and knowledge management. However, organizations can expedite the curation of that knowledge with statistical AI techniques involving these components:
♦ BERT: BioBERT is a variation of BERT specifically adapted to the biomedical field. For text analytics applications in this domain, BioBERT is useful “to extract the important words out of the text,” Aasman observed (see the second sketch after this list). That information can feed use cases such as inputting accurate billing codes based on descriptions of the diagnoses or treatments patients undergo.
♦ GPT-3: This statistical AI approach can significantly cut the time required to write rules for a specific text analytics application; the third sketch after this list shows the idea. “Rules-writing is time-consuming, but with GPT-3, you can do it 10 times as fast as before,” Aasman said.
♦ Additional deep learning techniques: A variety of deep learning approaches, including contrastive learning and manifold layout techniques, can analyze a corpus and populate a knowledge graph used for NLU or NLG with entities, terms, and concepts. “We are actually mixing machine learning for what we call knowledge-based creation to learn a subject or a segment,” Walckenaer said.
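To make the taxonomy idea concrete, here is a minimal Python sketch of resolving words through synonyms and categories, the kind of symbolic lookup Aasman described. The terms, synonyms, and categories are toy examples, not a production vocabulary.

```python
# A toy taxonomy: each canonical term carries a category and synonyms.
TAXONOMY = {
    "invoice": {"category": "financial_document", "synonyms": ["bill", "statement"]},
    "contract": {"category": "legal_document", "synonyms": ["agreement"]},
}

def find_term(word):
    """Resolve a word to its canonical taxonomy term, via synonyms."""
    word = word.lower()
    for term, entry in TAXONOMY.items():
        if word == term or word in entry["synonyms"]:
            return term
    return None

def terms_in_category(category):
    """Return every canonical term filed under a category."""
    return [t for t, e in TAXONOMY.items() if e["category"] == category]

print(find_term("bill"))                        # -> "invoice"
print(terms_in_category("financial_document"))  # -> ["invoice"]
```

A document search application would run both lookups over query words, so a search for “bill” also surfaces documents indexed under “invoice” and its sibling terms.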
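The BioBERT extraction can be approximated with the Hugging Face transformers pipeline. This is a hedged sketch: the checkpoint named below is the publicly available base BioBERT model, and in practice you would substitute a checkpoint fine-tuned for biomedical named-entity recognition.

```python
from transformers import pipeline

# Illustrative checkpoint; swap in a BioBERT model fine-tuned for NER.
ner = pipeline(
    "ner",
    model="dmis-lab/biobert-base-cased-v1.1",
    aggregation_strategy="simple",  # merge word pieces into whole entities
)

note = "Patient presents with type 2 diabetes mellitus and was prescribed metformin."
for entity in ner(note):
    print(entity["word"], entity["entity_group"], round(float(entity["score"]), 3))
```

The extracted entities, mapped against a billing-code taxonomy, are what would drive the coding use case Aasman mentioned.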
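And the GPT-3 rule-drafting workflow might look like the following sketch, which uses the completion-style OpenAI Python client of the GPT-3 era (openai below version 1.0). The prompt, model choice, and rule format are illustrative assumptions, and the generated rules still need a human pass before they enter production.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: supplied via configuration

prompt = (
    "Write regular-expression rules that match mentions of invoice numbers "
    "such as 'INV-2021-00042' in business email. One rule per line."
)

response = openai.Completion.create(
    model="text-davinci-003",  # a GPT-3-era completion model
    prompt=prompt,
    max_tokens=200,
    temperature=0,  # deterministic drafts are easier to review
)

# Drafted rules for a human curator to vet, correct, and adopt.
print(response.choices[0].text)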
Domain models
The semantic clarity of rules-based systems, particularly when they're expedited with some of the foregoing machine learning techniques, gives them higher accuracy for text analytics than systems relying on statistical methods alone. “Using taxonomies, regular expressions, excluding words, including words, and processing entire vocabularies or training models is great,” Segovia acknowledged. Of equal importance is a subject area model (which, Aasman mentioned, GPT-3 lacks) that works in tandem with taxonomies to flesh out knowledge of a particular domain, such as finance, supply chain management, or even an organization's “tribal knowledge” in those or other areas. Some of the conversational AI capabilities Good alluded to benefit from such models too.
“You build a data model, then our analytics engine can adjust that model and we can apply AI on top of that to have conversational analytics,” Good commented. With this approach, the underlying analytics engine can make intelligent inferences about the terms in users' questions, which users can then refine by inputting business logic, rules, and taxonomies. Time-to-value is a key advantage of this method. “You can start using it right away and iterate on it,” Good said. According to Aasman, the domain models, also known as ontologies, “describe objects as objects that have a set of attributes.” The nomenclature for the objects themselves is specified in the taxonomy. The domain model, however, is where organizations denote what the objects actually are by “describing all the features that are important for them,” Aasman indicated. In that sense, it represents a department's or organization's cosmology, or “world view,” of that subject.
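As a minimal illustration of that split between taxonomy and domain model, the rdflib sketch below gives an object its name and place in a hierarchy (the taxonomy's job), then declares the attributes that say what the object actually is (the domain model's job). The namespace, class, and property names are illustrative.

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/finance#")  # illustrative namespace
g = Graph()
g.bind("ex", EX)

# Taxonomy: the canonical term and where it sits in the hierarchy.
g.add((EX.Invoice, RDF.type, RDFS.Class))
g.add((EX.Invoice, RDFS.subClassOf, EX.FinancialDocument))

# Domain model: the attributes that describe what an Invoice actually is.
for attr in ("invoiceNumber", "issueDate", "totalAmount"):
    g.add((EX[attr], RDF.type, RDF.Property))
    g.add((EX[attr], RDFS.domain, EX.Invoice))

print(g.serialize(format="turtle"))
```

Each department can attach its own attribute set to the same taxonomy terms, which is how the ontology comes to encode that group's “world view” of the subject.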