Talk a little, type a lot - Will conversational interfaces survive Siri and Alexa?
Siri: “I’m not sure I understand ...”
Alexa: “Sorry, I don’t know that ...”
Do you hear these kinds of responses from your phone or smart speaker far more often than you’d like? I get them every time I try a subject that’s slightly complicated or possibly ambiguous or even just outside the mainstream of conventional wisdom.
Sometimes the “assistant” machine comes back with a highly confident speech about a topic that is only tangentially related to my inquiry and doesn’t address the question. I guess the excuse from Amazon and Apple (and Google, whose Home product appears furthest along in conversational quality) would be that nobody likes “null responses,” as we used to call them back in the days of search. But why throw up just any piece of audio that might be remotely in the vicinity of the question? All this wastes my time!
A high bar on quality
Do you notice that patiently listening to a pronouncement from a voice interface takes much longer than simply scanning the old-fashioned list of links on a search results page? That sets a high bar for the quality of a spoken response. The frustrating thing is that I come away from the experience with the smart assistant newly convinced that a human would find my questions totally straightforward, both to interpret and to answer. And, of course, if I really need the particular piece of information I’ve been looking for, I now have to start over: go to the keyboard and start typing, or go to the communications side of the phone and call someone.
Annoyance leads to frustration, and from there to losing trust in, or even interest in, using the computer for assistance. For example, try this question: “What is cognitive computing?” As of this writing, Siri says, “Here’s some information” and shows a link to a Wikipedia page on the iPhone screen. The result is fundamentally a blue-link list re-engineered with an updated look and a small device form factor. No conversation in evidence.
On cognitive computing wisdom, Alexa does a lot better. She tells me a very well-structured story about how cognitive computing consists of sets of programs that seek to emulate the workings of human intelligence, and then lists a number of component technologies such as natural language processing and machine learning. She sounds like an introductory paragraph from a beginners’ book on cognitive computing. Maybe she ingested an early IBM Watson primer? But when I follow up with the question “When did AI research begin?” she seems not to have gotten to the history chapter and responds, “Hmm. I don’t know that one.”
The promise of conversational interfaces
All of this is sad, in addition to being frustrating, primarily because of the tremendous promise inherent in the very idea of a conversational interface. We still take it for granted that, since the beginning of computing, typing has been hands-down the most effective and efficient way for us to get a computer to do something useful. From typing code to typing questions into search engines, all of us humans have been forced by technology limitations to channel our bright ideas through a keyboard.
But, as many researchers, human factors professionals, and thought leaders have pointed out from the beginning, typing (especially accurately and quickly) is simply not what humans are built to do. We are built to talk and to converse, and what an exceptionally wonderful thing it would be if we could get a computer to do something useful by talking to it!
So we actually watch the experiments with Siri, Alexa, and Google Home with great interest because the idea of conversational interfaces is truly powerful and truly tantalizing. First, they promise to take the requirement for manual dexterity out of computer use. Second, as the term “conversational” implies, they promise to be able to respond via voice to our questions the way a human might, e.g., to interact, to clarify, to suggest, to expand, to deliver new ideas.
Deloitte described the promise of what it refers to as “conversational AI” in a recent sponsored article in The Wall Street Journal. Deloitte put it this way: “[Now] more companies are able to deploy interfaces that combine natural language processing, AI, and machine learning capabilities to better understand and respond to free-form text or voice in an engaging and personalized manner. These systems can transform technology adoption into a conversation rather than an exercise in mastering a user interface.”