Text analytics: not just for customer sentiment
The data sources include National Institutes of Health resources such as PubMed, which contains more than 27 million citations from the biomedical literature, as well as news channels, announcements about problems in the food chain and some blogs. “Typically, the system might ingest over 10,000 records a day,” Kwegyir-Afful explains. “After the system completes the first stage of the contextual analysis, which is designed to filter for our target foods (or an event involving the consumption of food) within the context of chemicals or adverse events, [the data] is saved into the database. The data is then run through a predictive model that was trained on human-curated data that identified data indicative of a potential signal. Both signal-related and non-signal-related data are saved so that we can analyze [them] over time to see if there are developing trends.”
CFSAN scientists built a dictionary, or ontology, of topics of interest and then wrote code to identify the relationships between the elements in the ontology. “The system is looking for things in context,” he points out. “For example, if there are reports of a chemical in food that might be related to cancer, it will look for cancer, or if it identifies water as an issue, the system looks for water contamination in relation to consumption.”
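To make the idea of "looking for things in context" concrete, here is a minimal sketch of an ontology-driven first-stage filter followed by a model-scoring step. It is not CFSAN's code: the concept names, terms and context rules are hypothetical, matching is simple whole-word lookup, and `classify` stands in for the trained predictive model. As in the described system, records that pass the filter are stored whether or not they are flagged as signals.

```python
import re

# Hypothetical mini-ontology: each concept maps to surface terms that signal it.
ONTOLOGY = {
    "food": {"milk", "rice", "juice", "infant formula"},
    "chemical": {"arsenic", "lead", "bisphenol a"},
    "adverse_event": {"cancer", "poisoning", "illness"},
    "water": {"water"},
    "contamination": {"contamination", "contaminated"},
}

# Hypothetical context rules: a record passes only if every concept in some rule co-occurs.
CONTEXT_RULES = [
    ("food", "chemical", "adverse_event"),  # e.g. a chemical in food linked to cancer
    ("water", "contamination"),             # e.g. reports of water contamination
]


def concepts_in(text: str) -> set[str]:
    """Return the ontology concepts whose terms appear as whole words in the text."""
    lowered = text.lower()
    return {
        concept
        for concept, terms in ONTOLOGY.items()
        if any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms)
    }


def passes_context_filter(text: str) -> bool:
    """First stage: keep a record only if at least one context rule is fully satisfied."""
    found = concepts_in(text)
    return any(set(rule) <= found for rule in CONTEXT_RULES)


def run_pipeline(records, classify):
    """Filter records, then score survivors with a model passed in as `classify`.

    Both signal and non-signal results are kept so trends can be analyzed later.
    """
    stored = []
    for text in records:
        if not passes_context_filter(text):
            continue
        stored.append({"text": text, "signal": bool(classify(text))})
    return stored
```

Whole-word matching keeps a term such as "lead" from firing on "leading"; a production system would use a much richer ontology and proper entity and relation extraction rather than keyword co-occurrence.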
Once the results are visualized, public health officials and analysts review the predictions. They can then dig deeper into the research, reading the full publication after reviewing the abstract, and verify whether there is a signal that would indicate a potential adverse outcome.
The second and equally challenging phase is to decide what action to take—for example, issuing a warning or ordering a recall. “It is important to strike the right balance between taking precautions and creating an unnecessary disruption,” Kwegyir-Afful explains. “Text analytics does not solve this problem, but it does give us the information that allows us to make a data-driven decision.”
Evolving in important ways
Other applications for text analytics include the early detection of technical problems and the incorporation of warranty issues into product design, according to Simran Bagga, principal product manager for text analytics at SAS. “Industries such as energy and manufacturing are gathering customer feedback, not for sentiment analysis but to detect problems that may need to be addressed by R&D or quality assurance,” she says. “In addition, a lot of customers in the legal industry are automating contract management.”
Over the past several years, SAS has added the ability to analyze all major Asian and European languages natively. “This was a key request from customers to support the global and complex nature of their businesses,” says Bagga. “SAS has extended its capabilities by enabling an open architecture allowing users to call text-based machine learning procedures through APIs in Java, Python, R, Lua and RESTful services.”
In future releases, SAS Text Analytics will continue to extend its API support and incorporate capabilities to automate rule generation through more modern machine learning approaches such as deep learning.
The text analytics industry in general is evolving in some important ways, Bagga says. First, it is moving rapidly into cognitive computing and AI. Techniques such as recurrent neural networks and sequence-to-sequence mapping are enabling greater accuracy in text analytics without requiring linguistic rules to be written, which saves time. Second, text analytics as applied to big data is now being conducted at the edge (the point at which data enters the network).
This approach delivers results sooner because the analysis is performed on data as it streams in. “Users want to combine text analytics with streaming data to maximize value and allow process automation based on information contained in text,” Bagga says. “An example of real-time operational intelligence is signaling alerts from social media to identify disease outbreaks or other potential threats such as drastic weather events or acts of terrorism.”
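A minimal sketch of the kind of real-time alerting Bagga describes: a sliding-window counter that raises an alert when posts mentioning watched terms spike. This is an illustration, not SAS's API; the watched terms, window length and threshold are hypothetical parameters, and a real deployment would use trained models rather than keyword matching.

```python
import re
from collections import deque


class KeywordSpikeDetector:
    """Flag a topic when matching posts within a sliding time window reach a threshold."""

    def __init__(self, terms, window_seconds: float, threshold: int):
        self.terms = {t.lower() for t in terms}  # hypothetical watch list, e.g. outbreak terms
        self.window = window_seconds
        self.threshold = threshold
        self.hits = deque()  # timestamps of matching posts still inside the window

    def observe(self, timestamp: float, text: str) -> bool:
        """Process one post (timestamps must be non-decreasing); return True if alerting."""
        words = set(re.findall(r"[a-z']+", text.lower()))
        if words & self.terms:
            self.hits.append(timestamp)
        # Drop matches that have aged out of the window.
        while self.hits and self.hits[0] <= timestamp - self.window:
            self.hits.popleft()
        return len(self.hits) >= self.threshold
```

Each post is handled once as it arrives and only timestamps inside the window are kept, which is what lets this style of analysis run on a stream instead of waiting for a batch of stored data.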
Text analytics and process automation
Some of the less familiar applications of text analytics can provide significant strategic value. “We have seen banks and insurance [companies] use text analytics as an enabler of process automation,” says Luca Scagliarini, VP of strategy and business development for Expert System. “They can use it to accelerate claims management, improve underwriting, risk grading, quotations, compliance and, generally speaking, any case management application. We see strategic value in all of these areas.”
In claims and manufacturing, text analytics can create and enrich root cause analysis models. “Text analytics is a way to add ‘sensors’ inside unstructured information,” Scagliarini adds. “Often companies in industries such as oil and gas are collecting data from physical sensors and building models to see how their sales or production will evolve. Text analytics can be used to expand the models by incorporating information from news sources or other documents.” The model can then provide a trigger that launches a process or makes a prediction.
Many models, for example, predict stock prices from past performance, using data that can be downloaded minute by minute. However, future prices will also be affected by events, and classes of events can be extracted from the news in real time. “If those events are detected through text analytics and converted into quantitative data, the performance of the model can be improved,” Scagliarini observes. “The root cause may be an event, such as damage to a drilling device, but the final objective is to make sure a particular outcome such as a drop in sales is avoided, and multiple factors can contribute to this outcome.”
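The step of converting detected events "into quantitative data" can be sketched as feature engineering: events already extracted from news (assumed here to arrive as `(date, event_class)` pairs from an upstream text-analytics tool) are counted per day and appended to a past-performance feature. The event classes and prices below are hypothetical placeholders, and the resulting `X`, `y` could feed any regression or forecasting model.

```python
from collections import Counter
from datetime import date

# Hypothetical event classes that an upstream text-analytics step extracts from news.
EVENT_CLASSES = ("equipment_damage", "supply_disruption", "regulatory_action")


def event_features(events, days):
    """Count extracted events per day and class; events are (date, class) pairs."""
    counts = Counter((day, cls) for day, cls in events if cls in EVENT_CLASSES)
    return {day: [counts[(day, cls)] for cls in EVENT_CLASSES] for day in days}


def build_training_rows(prices, events):
    """Pair each day's features with the next day's price as the prediction target.

    Features = that day's price (past performance) + that day's event counts (from text).
    """
    days = sorted(prices)
    feats = event_features(events, days)
    X, y = [], []
    for today, tomorrow in zip(days, days[1:]):
        X.append([prices[today]] + feats[today])
        y.append(prices[tomorrow])
    return X, y


if __name__ == "__main__":
    # Illustrative placeholder values only.
    prices = {date(2024, 1, 1): 100.0, date(2024, 1, 2): 98.0, date(2024, 1, 3): 97.5}
    events = [(date(2024, 1, 1), "equipment_damage"), (date(2024, 1, 2), "regulatory_action")]
    X, y = build_training_rows(prices, events)
    print(X, y)
```

Events outside the known classes are ignored rather than raising an error, so new kinds of extracted events do not break the model's fixed feature layout.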
Expert System’s Cogito brings together artificial intelligence (AI), cognitive computing and semantic technology to extract value from enterprise content. “There has been something of a dispute in the marketplace about statistically based machine learning versus the semantic approach, which involves linguistic rules,” Scagliarini says. “We focus on the semantic approach and have a knowledge base that embodies the concepts of language, which enables very sophisticated work. However, from a pragmatic perspective, we believe that both should be available, so we have added many elements of machine learning and deep learning.”