Natural language processing (NLP) is a core ability of cognitive computing systems and is often defined as helping computers process and understand human language. NLP research has been ongoing since the 1930s, and though we have made significant gains in the field, anyone who has combed through search results knows that humans have not completely bridged the communication gap with computers. Recent NLP research has focused on semi-supervised and unsupervised learning techniques, which learn from large amounts of non-annotated data, combined in the semi-supervised case with a smaller set of hand-annotated data. Those techniques are seeing some success, as speakers at Basis Technology’s HLTCon recently discussed. The speakers are using NLP to process large amounts of publicly available content to gather intelligence, address terrorist threats, conduct research into social issues, tackle communication issues in refugee camps and identify victims of human trafficking in the sex trade.
On a very basic level, NLP enables computers to understand language by putting words together in meaningful phrases, assigning meaning to those phrases and drawing inferences from them. Some of the most well-known components of NLP are part-of-speech tagging, named entity resolution, word sense disambiguation and coreference resolution, each of which plays a vital role in identifying and characterizing the core text that carries the primary meaning of a phrase or sentence. Other deep technical processes behind NLP include machine learning techniques, computational linguistics and statistics across training corpora.
The ability to process language naturally allows NLP applications to summarize documents, auto-classify text, conduct sentiment analysis and provide search results with enhanced relevance ranking. In the field, it’s how you put the pieces together that counts.
Parsing data for clues
Patrick Butler, researcher at Virginia Tech, discussed his work on EMBERS, a project that uses publicly available content to predict social events. The project is funded by IARPA and aims to create an automated system to parse online data for clues about what is happening in a specific society. Butler and his team are analyzing Tweets not only to determine what a protest is about but also to predict when the next one might occur. In addition, they are tracking flu cases through canceled OpenTable (opentable.com) reservations and by the number of cars they see parked outside emergency rooms. They do all of that work in the language the content is written in, and some of their processing includes turning relative phrases into actual dates. “Next week” becomes the date of the content plus seven days, for example.
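The date step can be illustrated with a minimal Python sketch. It is not the EMBERS implementation, which the talk did not detail; the phrase table, the function name resolve_relative_date and the sample publication date are all assumptions made for illustration, and a real system would need to handle many more phrases in many languages.

```python
from datetime import date, timedelta

# Hypothetical lookup table: relative phrase -> offset in days from the
# date the content was published. Only "next week" (+7 days) comes from
# the article; the other entries are illustrative assumptions.
RELATIVE_OFFSETS = {
    "yesterday": -1,
    "today": 0,
    "tomorrow": 1,
    "next week": 7,
    "last week": -7,
}


def resolve_relative_date(phrase, published):
    """Return the absolute date for a known relative phrase, or None."""
    offset = RELATIVE_OFFSETS.get(phrase.strip().lower())
    if offset is None:
        return None  # phrase not in the table; leave it unresolved
    return published + timedelta(days=offset)


# Example: a post published on March 1, 2016 that mentions "next week".
posted = date(2016, 3, 1)
print(resolve_relative_date("Next week", posted))  # 2016-03-08
```

Anchoring every phrase to the publication date of the content, rather than to the date of analysis, is what lets older posts be resolved correctly after the fact.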
Multilingual natural language processing is important to many of the cases presented at HLTCon, but not all languages are easily parsed through NLP. That is to say, the less written content there is in a language, the less developed NLP will be in that language. For example, NLP in French is excellent, but NLP in Swahili is still difficult.
That is a barrier for Gregor Stewart’s and Danielle Forsyth’s projects, both of which deal with refugee crises. Stewart, VP of product management at Basis Technology, and Forsyth, co-founder of Thetus Corp., discussed how predicting political upheaval can help prepare for refugee movement to other areas. Stewart said that the refugee crisis in Europe now is not as new as it may seem. He said that about 6 million people have been outside their home countries for more than five years and some of them have only recently been processed. The sheer volume of people moving into Europe has overwhelmed the governments there, and language differences are the biggest barrier to getting people to safety and creating mitigation policies. He speculated that this process would be greatly aided by better interpretation and translation tools that can be created through machine learning and natural language processing.
Forsyth discussed anticipating refugee crises by parsing language for overt and hidden meaning. Her work currently focuses on Africa, and she recently found five phrases used by Burundi politicians that incite violence against minority groups, including the innocuous-seeming “get to work.” Monitoring that type of language and using sentiment analysis to determine its meaning helps indicate whether a political crisis is likely to instigate a refugee crisis. If aid groups can successfully predict a humanitarian crisis, they can mitigate some of its effects and perhaps keep refugees in safe areas inside their home countries. Multilingual NLP is essential to understanding the local language well enough for Forsyth’s work to succeed.
Giant Oak is using a combination of technologies that includes NLP to identify sex trade workers who are victims of human trafficking. To do so, the company has to determine the behavior of sex workers who are in the trade willingly and then identify deviations from that behavior. It has mined 85 million online ads and more than 2 million reviews for sex workers for locations, phone numbers and other rich data. It is also looking for sentiment in those ads to determine if the ad writer was unhappy or drugged, a very difficult task since there may not be much difference between the behavior of someone who is taking drugs and that of someone who is drugged. Giant Oak’s work is still in the early stages, but it is using machine learning and NLP to try to solve social issues and save lives.
So is Karthik Dinakar, Reid Hoffman Fellow at MIT. Dinakar uses models to understand and predict adolescent distress, crisis counseling, self-harm and heart disease. In his heart disease research, he found that combining a patient’s history, the words the patient uses to describe symptoms and an angiogram predicts heart attacks in women better than doctors do. Dinakar also found that women often use different language to describe their symptoms than men do. For the past few decades, doctors have thought that this means that men and women have different heart issues, but Dinakar’s research indicates that the issues are the same. It is how the genders talk about them that is different. The overwhelming majority of male cardiologists simply do not understand what their female patients are saying. Mapping language differences might help more female heart attack victims survive.
The conversation about cognitive computing and big data often is enterprise focused—how we can make better business decisions, discover new business opportunities and the like—but the projects at HLTCon highlighted a real ability to turn big data into information that can help people in need, both in the collective sense and in the individual sense. It is this kind of creative use of NLP technologies that can literally make cognitive computing smart enough to do some good.