Text analytics: not just for customer sentiment
Sentiment analysis is one of the most prevalent uses of text analytics, but the technology has many other valuable uses. Text analytics finds a range of applications in scientific, medical and technology development. It can detect root causes of events and augment the knowledge of what happened with an understanding of why it happened. When used predictively, it can help anticipate future outcomes and prevent adverse events. Text analytics can also enable process automation and case management.
The Center for Food Safety and Applied Nutrition (CFSAN) is part of the Foods and Veterinary Medicine Program of the Food and Drug Administration. It is responsible for protecting public health by ensuring that the United States’ food supply (including dietary supplements) is safe, secure and properly labeled. It also ensures that cosmetics are safe and properly labeled. CFSAN regulates more than $400 billion worth of domestic food and more than $50 billion in imported foods, as well as about $60 billion worth of cosmetics. Globalization of the food supply and increased consumer demand for high quality and variety are driving a requirement for new approaches to meet standards and ensure consumer safety.
In 2013, the Center decided to investigate ways in which problematic chemicals could be identified before they got into the food supply. “We wanted to get ahead of the curve,” says Ernest Kwegyir-Afful, lead for post-market activities in the Division for Food Contact Notifications in the Office of Food Additive Safety (OFAS), which is a part of CFSAN. “Rather than waiting for each adverse event to occur, CFSAN wanted to be aware of the precursors and try to prevent the incident.”
A Food Advisory Committee composed of toxicologists, chemists, public health officials and other experts was established to consider how to approach the problem. The committee decided to look at different data streams that could supply information for developing predictive models. The idea was to determine which signals were indicative of a subsequent problem in the food supply chain. CFSAN subsequently started a centerwide Chemical Signal Detection program, led by the Signals Management Branch in the Office of Analytics and Outreach, leveraging expertise from all the program offices in the center.
“These sources all turned out to be in text form,” Kwegyir-Afful says. “For example, there were news articles about chemical spills that, although they did not affect food directly, might have an effect later if they were to occur in a food production region.” Another example is a situation in which people were getting sick with no apparent explanation, or in which an illness was reported and the affected individuals had all consumed the same food product.
Since CFSAN did not have the human resources to review the vast volume of information this task entailed, it began looking into text analytics solutions. “We attended trade shows and invited vendors to make presentations in order to examine and compare implemented technologies,” Kwegyir-Afful explains. The solution had to be cost-effective but also scalable. “If we got a solution that could be used enterprisewide, it would help a lot of other programs,” he adds.
Besides being able to analyze text, the solution needed to be capable of predictive analytics and enable visualization of the data so that someone not familiar with the data analysis could understand the results. “The task we had in mind was quite sophisticated, since it had to separate the relevant information from irrelevant data, do a conditional analysis and parse out which components are indicative of the problem area,” Kwegyir-Afful says. “Many of the solutions we looked at could do parts of this, but we wanted an end-to-end solution.”
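The first two steps Kwegyir-Afful describes, separating relevant from irrelevant text and finding which components point to a problem, can be illustrated with a small Python sketch. It is not CFSAN's method or anything from the SAS products. The watch lists, the function names and the simple frequency-ratio score are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical watch lists (assumptions); a real system would curate or learn these.
HAZARD_TERMS = {"spill", "contamination", "recall", "illness"}
FOOD_TERMS = {"farm", "dairy", "crop", "produce", "food"}


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_relevant(text):
    """Keep a document only if it mentions both a hazard term and a food term."""
    tokens = set(tokenize(text))
    return bool(tokens & HAZARD_TERMS) and bool(tokens & FOOD_TERMS)


def indicative_terms(docs, min_docs=2):
    """Rank terms by how much more often they occur in relevant documents."""
    relevant = [set(tokenize(d)) for d in docs if is_relevant(d)]
    other = [set(tokenize(d)) for d in docs if not is_relevant(d)]
    rel_counts = Counter(t for doc in relevant for t in doc)
    oth_counts = Counter(t for doc in other for t in doc)
    scores = {}
    for term, n in rel_counts.items():
        if n < min_docs:
            continue  # ignore terms seen in too few relevant documents
        rel_rate = n / len(relevant)
        # Add-one smoothing avoids division by zero for terms unseen elsewhere.
        oth_rate = (oth_counts[term] + 1) / (len(other) + 1)
        scores[term] = rel_rate / oth_rate
    return sorted(scores, key=scores.get, reverse=True)
```

Applied to a handful of headlines, `is_relevant` would pass a story about a chemical spill near a dairy farm and drop one about a city budget, while `indicative_terms` would surface the words that recur in the stories that passed. A production system would replace the fixed word lists with trained models and domain taxonomies.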
After a careful evaluation, CFSAN selected a suite of SAS technologies (SAS Enterprise Miner, SAS Text Miner, SAS Contextual Analysis and SAS Visual Analytics) and began piloting the research to build an “Emerging Chemical Hazard Intelligence Platform” (ECHIP). “We began with food groups that were consumed by large numbers of people, the elderly or infants, based on survey data that was available to us,” says Kwegyir-Afful, the lead data scientist for ECHIP. “We are not limiting the volume of information that comes in, but we are limiting the targets. Currently we are testing and refining the algorithms to eliminate the more spurious signals that the system detects.”