Text analytics reaches new territory

Focusing on the unique comments

When federal, state, or local governments propose changes in rules and regulations, an opportunity for public comment is frequently required. At the U.S. Fish and Wildlife Service (USFWS), adding or removing animals from the endangered species list is one example. Advocates on one side or the other often flood the agency with emails and form letters. When the agency proposed removing some populations of wolves from the list, it received hundreds of thousands of comments, and possibly more than a million, on various proposals. Processing that volume of content would be impossible without text analytics.

For the past decade, USFWS has been using DiscoverText text analysis software to sort through many of these larger batches of public comments. “The most important step is deduplication,” said Seth Willey, deputy assistant regional director for ecological services for the southwest region at USFWS. “In one case, we received about 30,000 comments from the same organization, but only three were unique. Most were form letters.” DiscoverText identifies the exact duplicates and also separates the unique comments from the near duplicates. “After we did the analysis, we only had to read three from this particular organization,” continued Willey. “This is a huge savings in time and taxpayer money.”
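To make the deduplication step concrete, here is a minimal Python sketch of one common approach. It is an illustration only and is not DiscoverText's actual method, which the article does not describe. It assumes that exact duplicates can be found by normalizing case and punctuation, and that near duplicates can be grouped by Jaccard similarity over three-word shingles. The function names, the sample comments, and the 0.6 threshold are all hypothetical choices.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and drop punctuation so trivially different copies match."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def shingles(normalized, k=3):
    """Return the set of k-word shingles in an already-normalized text."""
    words = normalized.split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_comments(comments, threshold=0.6):
    """Group comment indices into clusters of exact and near duplicates.

    The threshold is an assumed value for illustration, not a recommendation.
    The first index in each cluster is the comment a reviewer would read.
    """
    # Step 1: exact duplicates share the same normalized text.
    exact = defaultdict(list)
    for i, comment in enumerate(comments):
        exact[normalize(comment)].append(i)

    # Step 2: greedily attach each distinct text to the first cluster
    # whose representative is similar enough; otherwise start a new one.
    clusters = []  # list of (representative shingle set, member indices)
    for normalized, indices in exact.items():
        candidate = shingles(normalized)
        for representative, members in clusters:
            if jaccard(candidate, representative) >= threshold:
                members.extend(indices)
                break
        else:
            clusters.append((candidate, list(indices)))
    return [members for _, members in clusters]


# Hypothetical sample comments, invented for this sketch.
comments = [
    "Please keep wolves on the endangered species list.",
    "please keep wolves on the endangered species list",
    "Please keep wolves on the endangered species list. Thank you!",
    "Ranchers in my county have lost livestock and need relief.",
]
groups = group_comments(comments)
to_read = [comments[members[0]] for members in groups]
print(groups)
print(len(to_read), "comments to read out of", len(comments))
```

With these four sample comments, the first three fall into one cluster and the fourth stands alone, so a reviewer would read two comments instead of four. The greedy comparison here checks every distinct text against every cluster, which would be slow for hundreds of thousands of comments; at that scale an indexing technique such as MinHash with locality-sensitive hashing is the usual substitute.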

The number of comments received does not factor into decision making. “Our statutory mandate is to make decisions based on the best scientific and commercial data available,” noted Willey. “We are looking for valid insights into reasons why a species should or should not remain on the list.” With the help of software, the final number of comments to be reviewed can be small enough that human coding can be used to categorize the data. DiscoverText has allowed the agency to meet its mandate for review within the allotted timeframes with just a small staff.

At the University of California, Los Angeles, Karen Umemoto, director of the Asian American Studies Center and Helen Morgan Chu chair, is using DiscoverText to analyze hate speech in the Twittersphere. “We took a 2-week sample of tweets relating to Asian-Americans and the perceived role of Asians in the COVID-19 virus,” said Umemoto. “We analyzed personal experiences with harassment, people’s responses to negative comments, informational tweets about the source of the virus, and other related content.”

After deduplication removed the redundant tweets, researchers coded the remaining ones into various categories and analyzed them. “We were able to get a deeper understanding of the dynamics around this issue,” commented Umemoto. “As a result, we were quickly able to produce a preliminary policy brief that will help determine how to address this ongoing issue.”

DiscoverText was the only tool that was suitable for the research, according to Umemoto. “It is easy to use, very powerful, and off-the-shelf-ready. You don’t need to be a ‘big data’ specialist or computer science expert to use it.” The center has not yet used the full power of the software, “such as intercoder reliability, sampling, and using AI for analysis, but it has already met our needs for this stage of our work,” said Umemoto.
