Focus on research: Prof Benjamin Cowan, Adapt
Prof Benjamin Cowan is a research fellow at Adapt, the Science Foundation Ireland research centre for digital content. In this interview he talks about the development of digital personal assistants and the challenge of managing what and how they learn about us.
You’ve been working on conversational interfaces since before the arrival of Siri. What attracted you to this field?
I’ve always been fascinated by how people use language and why people choose the language they do when they are in conversation with people. I’ve also always been interested in technology and how people use it.
Based on these, it seemed like a natural fit to start looking at how people talk to machines. Because of the technological advancements in the field, this has seen me move from voice-based banking systems, which I researched in my spare time as a PhD student, to digital personal assistants like Siri and Google Assistant.
Digital personal assistants struggle to understand subtleties like context and tone.
They do, especially with the subtleties of natural human conversation, which make conversations seem so seamless and natural. In some respects, though, the way that these systems are designed makes this a problem.
The human-like nature of most assistants creates a gulf between what we expect these systems to do and what they actually can do. Using human-like voices and language does give users a sense that they should be able to interact with these devices as they do with other humans, and that the devices should be able to pick up these subtleties, but that is just not the case in commercial systems at the moment.
That is not to say that people think these systems are human; they clearly know the difference. It’s just that making people think that they can speak to a system like they can to a human is probably not helpful, as it creates a false expectation that is quickly dashed by the reality.
As the use of AI opens up this technology to more users, are we going to see companies with their own personal virtual assistants?
I think this is definitely going to become more common. Companies may see the agent design as part of their branding, choosing particular voices, phrases and commands to echo parts of their brand and company ethos. I think this may end up being more of a gimmick though. In this kind of future, it might be very hard to identify what you can say to an agent to get it to do the thing you want it to, because of these bespoke commands and functions.
Before doing this, I think companies need to consider whether voice makes sense for how the user wants to engage with the brand and whether it would be useful from the user’s perspective.
In terms of customer services I think this is an area where conversational systems have truly taken off, but again they are currently limited in what they can actually achieve without human help. We may blame the AI agent for routing us to the wrong place or giving us the wrong information, but we’ll probably complain to a human agent at the other end, as complex queries are still best served through human conversation.
The differences in what we perceive a machine and a human to know and understand may also put a ceiling on what we would actually be comfortable for an agent to handle in this context.
Automotive and healthcare applications are growth areas for speech and voice technologies. Where do you see the fit?
In automotive, voice is great as it allows you to multitask. You can attend to the road whilst also dictating a message, starting a call with someone or selecting a playlist.
Currently, voice systems generally take commands from the user. We may see this type of master-servant relationship change, especially with the development of autonomous vehicles.
Voice systems may become more like partners or helpers, telling you about road conditions or alerting you to times where you may need to take the wheel, taking the initiative in giving you this information, rather than waiting for you to ask for it.
Similarly in healthcare, we may see voice being used not only for command-based interactions, like requesting and booking appointments or sending updates to medical staff, but to get patients engaged in more conversational and social discussions to help with care.
Lots of projects right now are focused on identifying how to develop systems for healthcare, adding the ability for voice agents to have a chat with you. This might help people reminisce, combat loneliness or get people talking about their mood or mental wellbeing more openly.
It’s not clear whether this will be adopted widely, and it may not be seen as necessary or even appropriate in some medical contexts. Again, people know these systems are not human, so giving them human-like qualities may help in some cases but may be seen as a gimmick or totally inappropriate in others.
When developing a voice interface, to what extent does personalisation become a factor? Will assistants get to know our individual tone of voice and if so, who gets to hold on to that personally identifiable data?
Personalisation and privacy are very hot topics in the field at the moment. I think personalisation is something that, like with all other applications, really improves the user experience by making things easier and quicker to do. Yet, for this to work, data about you needs to be accessed by the agent – this is a common trade-off for many apps.
Because these agents are participants in conversations, however, we may find it strange if they know lots about us before we even talk to them. So we could use conversational mechanisms to build up knowledge over time, emulating the relationship building we do as humans.
At the moment personalisation seems to be a little blunt with these assistants, especially when it comes to knowing what we’ve said before in a conversation and how that might relate to what we are talking to the agent about at the present moment. This is improving.
The privacy issue is one that is gaining a lot of momentum in the field, especially as voice systems become more embedded in all parts of our lives.
Mostly, this data is used by companies to improve your experience or how your assistant performs. You can access this data from the major companies too. But for me it’s about whether we want the large corporations holding that data and whether we trust organisations holding recordings of our voice.
This is very different from other types of data as people see it as more personal and it is highly identifiable.
On top of this, smart speakers are effectively active microphones in the home, which may make people concerned about the potential for them to record data when they shouldn’t. The industry needs to respond to these concerns as they are creeping up the priority list for users.