Using speech technology to make real-time decisions at KMWorld Connect 2020
Jans Aasman, CEO of Franz Inc., discussed “Analyzing Spoken Conversations for Real-Time Decision Support in Mission-Critical Applications” during his KMWorld Connect 2020 presentation, part of the Text Analytics Forum track.
KMWorld Connect, held November 16-19, and its co-located events cover future-focused strategies, technologies, and tools to help organizations transform for positive outcomes.
This talk covered the work Franz does to analyze spoken conversations between customers and CRM or call center agents in mission-critical applications along with the additional challenges of making sure speech-to-text technology can deal with domain concepts.
Text analytics is growing fast, Aasman explained, but speech is growing even faster. Voice recognition is becoming part of everyday life, from Alexa to Cortana to Google Assistant.
The APIs for speech processing are getting better by the day, he said. It’s become easy to incorporate this technology into the workplace.
By combining speech technology, natural language processing, and knowledge graphs, Franz has been able to help an intelligent call center.
Before a salesperson calls a customer, they research that customer in order to start a conversation, he explained. But this company had so much information that it was flooded with dark data.
Franz built a knowledge graph to sort through the structured data, he said. The platform was built on top of AllegroGraph. With it, taxonomies can be built, and items can be sorted, classified, and more.
For speech technology, companies can use a taxonomy-trained speech recognizer.
Companies need a customizable speech recognizer that can take input directly from a taxonomy, he said.
Most speech-to-text systems tend to fail on very specific domain words; they need to be customized for the domain instead.
“For static domains we train the language models using machine learning but the process is expensive and takes more time,” Aasman said.
Building alt-labels for speech is an art in itself and very different from building alt-labels for written text, he said.
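To make the idea of feeding a taxonomy into a recognizer concrete, here is a minimal sketch in Python. It is not Franz's implementation: the Concept class, the spoken_labels field, and the phrase_hints function are hypothetical names chosen for illustration, and how the resulting phrase list is handed to a recognizer depends entirely on the speech vendor. The sketch only shows how preferred labels, written alt-labels, and separate spoken alt-labels might be flattened into one de-duplicated list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """One taxonomy concept (hypothetical structure, for illustration only)."""
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)     # written variants
    spoken_labels: List[str] = field(default_factory=list)  # how people say it aloud


def phrase_hints(concepts: List[Concept]) -> List[str]:
    """Collect unique, lower-cased phrases in first-seen order."""
    seen = set()
    hints = []
    for concept in concepts:
        labels = [concept.pref_label, *concept.alt_labels, *concept.spoken_labels]
        for label in labels:
            key = " ".join(label.lower().split())  # normalize case and whitespace
            if key and key not in seen:
                seen.add(key)
                hints.append(key)
    return hints


# Example taxonomy; the labels are made up for this sketch.
taxonomy = [
    Concept("AllegroGraph", alt_labels=["Allegro Graph"], spoken_labels=["allegro graph"]),
    Concept("CRM", spoken_labels=["C R M", "customer relationship management"]),
]
hints = phrase_hints(taxonomy)
```

Keeping spoken labels in their own field reflects the point that alt-labels for speech differ from alt-labels for written text: "C R M" is how the term is said aloud, not how anyone would write it.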
Organizations need a speech recognizer that provides both real-time and batch processing. Batch processing is good enough for regular analytics, and users can spread the transcription over periods when there is less need for real-time processing.
Diarization, the task of determining who spoke when, is very important, but commercial speech recognition systems have a hard time with phone conversations. It makes a difference who is talking about a product, a competitor, or an object, and who is expressing a sentiment.
Franz combines the dual stream features of speech recognition with smart software splitting streams at the PBX level, he said.
“This is all very new,” Aasman said.
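The dual-stream approach can be sketched as follows. This is an assumption-laden illustration, not the Franz pipeline: it supposes the PBX already delivers the agent and the customer as separate channels, that each channel has been transcribed into timestamped segments, and that the Segment, Turn, merge_channels, and who_mentioned names are hypothetical. With one speaker per channel, attributing a mention to a speaker reduces to merging the two transcripts by start time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the start of the call
    end: float
    text: str


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def merge_channels(agent: List[Segment], customer: List[Segment]) -> List[Turn]:
    """Interleave two single-speaker transcripts into one speaker-labeled timeline."""
    turns = [Turn("agent", s.start, s.end, s.text) for s in agent]
    turns += [Turn("customer", s.start, s.end, s.text) for s in customer]
    turns.sort(key=lambda t: (t.start, t.speaker))  # stable order on ties
    return turns


def who_mentioned(turns: List[Turn], term: str) -> List[str]:
    """Return the speakers of every turn that contains the term."""
    term = term.lower()
    return [t.speaker for t in turns if term in t.text.lower()]


# Made-up call data for the sketch.
agent = [
    Segment(0.0, 2.5, "Thanks for calling, how can I help?"),
    Segment(6.0, 8.0, "We can upgrade your plan today."),
]
customer = [Segment(2.8, 5.7, "Your competitor offered me a cheaper plan.")]

turns = merge_channels(agent, customer)
competitor_speakers = who_mentioned(turns, "competitor")
```

Here the mention of a competitor is attributed to the customer rather than the agent, which is the distinction the talk describes as mattering for analysis.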
For every campaign and for every language, enterprises should have a gold standard conversation. Run it through the recognizer every 2 minutes to see if the right words and phrases come back, he explained.
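A gold standard check of this kind might look like the sketch below. It assumes the recognizer has already returned a transcript of the gold standard recording; the function names, the 0.9 recall threshold, and the example phrases are all hypothetical. A scheduler such as cron would run the transcription and call gold_check on the chosen interval; that part is not shown.

```python
import re
from typing import List, Tuple


def normalize(text: str) -> str:
    """Lower-case and strip punctuation so phrase matching is whitespace-stable."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def missing_phrases(transcript: str, expected: List[str]) -> List[str]:
    """Return expected phrases that do not appear as whole words in the transcript."""
    padded = f" {normalize(transcript)} "
    return [p for p in expected if f" {normalize(p)} " not in padded]


def gold_check(
    transcript: str, expected: List[str], min_recall: float = 0.9
) -> Tuple[bool, float, List[str]]:
    """Compare a fresh transcript with the phrases the gold standard must contain."""
    missing = missing_phrases(transcript, expected)
    recall = 1.0 - len(missing) / len(expected) if expected else 1.0
    return recall >= min_recall, recall, missing


# Made-up example: the recognizer split a product name into two words.
expected = ["knowledge graph", "AllegroGraph", "call center"]
transcript = "Our knowledge graph runs on allegro graph in the call center."
ok, recall, missing = gold_check(transcript, expected)
```

In this example the check fails because the product name came back as two words, the kind of domain-term regression a recurring gold standard run is meant to surface quickly.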
Always have a human in the loop, he suggested. By training a speech recognition tool through taxonomies, some of the regular English deteriorates. Only a human can detect that.