The other day, I was lucky enough to visit one of the big corporates of India and watch its functions up close. What struck me most during my time there was how much a person in a position of responsibility depended on his PA or assistant. From scheduling meetings, to booking tickets, to sometimes even answering his phone calls, the assistant was like a third arm that got him through the day.
It was this behavior that got me thinking about the virtual assistants we have on our phones. You do not need to be an Area Manager or a DGM to warrant an assistant, as pretty much every smartphone today comes with a virtual assistant that at least tries to mimic everything a real-life assistant would do for you. However, be it Siri, Cortana or Google Now, the world of virtual assistants has not been the rosy place we expected it to be. You still look dorky if you are caught talking to your phone, and despite all the hype around Siri, it must be one of the least used features on an iPhone today, just ahead of bundled apps like Tips that nobody bothers with.
So what has prevented adoption, and what do the makers of virtual assistants still need to do to make sure these pseudo-assistants actually get used? Let's look into it.
Accent Recognition, Not Just Voice Recognition
The major drawback with these virtual assistants is their inability to gauge accents. It is a big challenge in a country like India, where multiple local languages exist and accents vary wildly from one region to another. What the makers of these assistants need to crack is how to understand these varied accents. Then there is the problem of mixed language: a ton of people in India speak what is called Hinglish, a blend of English and Hindi, and I am sure plenty of such mixtures exist globally. Unless you are American, or perhaps British, where again accents vary widely, virtual assistants will have to understand accents and context, not just languages, before every Joe and Jim in the world can use them.
Working with Third Party Apps
Virtually every assistant out there works with a rather closed set of applications that are on its roster. For example, Siri only works with apps that make use of HomeKit, and Google Now works with a limited set of apps like WhatsApp. But what if I want my virtual assistant to open a lesser-known social networking app and push out a status message? It would not work. Or what if I simply told it to book a movie ticket using a local cinema app? Cortana does work with a larger range of third-party apps, but let's be honest, how many of us even use a Windows Phone? A ray of hope in solving this problem is Viv, a newly launched AI virtual assistant that works with third-party apps. However, unless something like Viv is embedded in the OS as an out-of-the-box solution, it may struggle to gain real traction.
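To make the idea concrete, here is a minimal sketch in Python of the open, Viv-style model, where apps register the intents they can handle and the assistant simply routes requests to whichever app claims them. Everything here (IntentRegistry, book_ticket, the cinema handler) is a made-up illustration, not any real assistant's API:

```python
# Hypothetical open intent registry: any installed app can plug in,
# instead of waiting to be hand-picked by the assistant's maker.

class IntentRegistry:
    def __init__(self):
        self._handlers = {}  # intent name -> callable

    def register(self, intent, handler):
        self._handlers[intent] = handler

    def dispatch(self, intent, **slots):
        handler = self._handlers.get(intent)
        if handler is None:
            return "Sorry, no installed app can handle that."
        return handler(**slots)

registry = IntentRegistry()

# A local cinema app announces what it can do.
registry.register(
    "book_ticket",
    lambda movie, seats: f"Booked {seats} seat(s) for {movie}.",
)

# The assistant parses "book two tickets for Sultan" into an intent
# plus slots, then routes it through the registry.
print(registry.dispatch("book_ticket", movie="Sultan", seats=2))
```

The point of such a design is that the catalogue of capabilities grows with every app installed, rather than staying frozen at whatever list the assistant's maker has signed up.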
Major Learning Curve
Virtual assistants present a major learning curve for just about everyone who uses them. In fact, just launching them is a bit of a chore. In Google's case, you tap the microphone in Google Now or the search widget, whereas on iPhones you long-press the home button. On a few Samsung phones, a long press of the Home button takes you to S Voice, while other devices take you to Google Now. Once you do manage to get the assistant up and running, you still need to be aware of its limitations for things to work out. For example, the other day I asked Siri for the value of log 27, something Google Now has always given me without breaking a sweat. To my surprise, Siri only threw back some web results rather than the numeric value itself. At every step of using a virtual assistant today, you need to know about it just as much as it needs to know about you. This does not feel natural, as you basically have to learn new keywords, hotwords, and even what your assistant is capable of doing.
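For the record, the value itself is trivial; assuming the common base-10 reading of 'log', it is a one-liner:

```python
import math

print(math.log10(27))  # 1.4313637641589874 - the number Siri never gave me
```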
Not Truly Hands-Free
‘Hello Moto’ was wonderful. In fact, we have seen several hotword detections since, including ‘Hey Siri’. You would assume this means I could call on my assistant any time I want to get things done. But in real life it barely works, and you still have to pick up your phone and bring it close to your mouth to talk. ‘Hey Siri’, for that matter, only works off the charger on a select few phones like the iPhone SE and the latest iPhone 6s; and it does not work on the larger iPad Pro, though it does on the 9.7-inch iPad Pro. I might as well just pick the phone up and type my search rather than use a virtual assistant. The other scenario is when my phone is charging far away. Here too it is highly unlikely that the microphone will pick up my voice from that far off, and I am still forced to walk to the phone, pick it up, bring it close and wake the assistant. For virtual assistants to give a truly hands-free experience, you would expect them to be trained in loud, noisy environments and for the microphones to improve drastically, so that a call to the assistant is actually heard. Even with hotword detection on, put me about 50 cm away from the phone and it will not hear me; I bet a proper assistant would. Oh, and to ask Siri a follow-up question you still have to tap the little microphone button, which is akin to pinching my assistant every time I want to ask another question. It's just not natural.
Largely High-Speed Internet Dependent
Pretty much every virtual assistant out there relies on a good internet connection. In fact, Siri and Google Now cannot even be fired up without one. But what if I just want my assistant to dim the backlight on the phone, or turn on Wi-Fi or mobile data? Why would you need the internet for that? What if I just want to quickly dictate a note or a meeting request to my assistant? It is virtually useless until I have a working connection. In India, or any developing country, the internet is still seen as a facility rather than a basic necessity; networks are sketchy and you barely get decent connectivity, let alone internet fast enough for the computation these assistants need. For assistants to become really useful, they must be independent and not connection-dependent. In fact, your assistant already saves a ton of data in its cache to enable machine learning, so why could that data not be used here?
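This split is easy to imagine in code. Below is a rough sketch, with entirely hypothetical function names, of how an assistant could route device-level commands to a local, offline handler and fall back to the cloud only for queries that genuinely need it:

```python
# Sketch of offline-first command routing. Every name here
# (LOCAL_COMMANDS, set_wifi, dim_backlight, take_note) is hypothetical.

def set_wifi(on):       # would call the OS radio API on a real device
    return f"Wi-Fi turned {'on' if on else 'off'}."

def dim_backlight():    # would call the OS display API
    return "Backlight dimmed."

def take_note(text):    # on-device storage needs no network
    return f"Noted: {text}"

LOCAL_COMMANDS = {
    "turn on wifi": lambda: set_wifi(True),
    "dim the screen": dim_backlight,
}

def handle(utterance, online):
    cmd = LOCAL_COMMANDS.get(utterance.lower())
    if cmd:                    # device control: never needs the network
        return cmd()
    if utterance.lower().startswith("note "):
        return take_note(utterance[5:])
    if not online:             # only now does connectivity matter
        return "I need an internet connection for that."
    return "...sending to the cloud for full speech understanding..."

print(handle("Turn on WiFi", online=False))         # works offline
print(handle("what is the weather", online=False))  # needs the cloud
```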
Trust Issues and Data Accuracy
A virtual assistant learns over time, depending on how many permissions you grant it and how often you use it. The other day a colleague had a Jet Airways flight to catch, and Google Now started displaying the flight number and schedule on its own. He was taken aback and scared: how did Google come to know about this? His immediate reaction was that Google was reading his e-mails, and would therefore also be reading the private mail in there. He was not comfortable with that and immediately opted out. It is the same with Cortana, which learns about you over a period of time. Not everyone is comfortable with a computer knowing this much about them. The other trust issue is whether the information the virtual assistant gives me is absolutely accurate. For example, the weather results on Siri are powered by The Weather Channel, and Google Flights supplies the results when I ask for flights between two destinations. So I am trusting not only an assistant but also its partners. Partnerships tend to be formed around the highest monetary benefit or the best cost-per-sale the host can earn; does that not suggest some commercial interest rather than a sworn oath to show the most accurate data?
Understanding the King’s English
Google did address the need to understand emotion and pitch at its I/O yesterday and plans to bring some of that to the Google Assistant. What Google failed to recognize is that if I am dictating a message to be sent as a text, or even on WhatsApp, I am sure to need punctuation marks, and leaving them out can easily change the meaning of the conversation. Take a couple of pauses while dictating to Google Now and you are still left with one plain, unpunctuated sentence. It's as if some of these assistants never went to grammar school. Say words like ‘full stop’ or ‘comma’ and often they are typed out rather than placed. And if I am asking a friend about his plan tonight, is it not obvious there should be a question mark in there? Virtual assistants are still far from understanding the emotion, tone and pitch of our conversation, and so when they put it down via a speech-to-text module they often get it wrong. This is where you again go back to your trusty old keyboard, or end up looking like someone who could not put two sentences together.
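None of this needs emotion detection to get started; even a crude post-processing pass over the transcript would help. Here is a toy sketch, illustrative only and nothing like how any shipping speech-to-text module actually works:

```python
# Toy post-processor: place spoken punctuation instead of spelling it
# out, and guess a question mark from how the sentence opens.

SPOKEN_MARKS = {"full stop": ".", "comma": ",", "question mark": "?"}
QUESTION_OPENERS = ("what", "where", "when", "who", "why", "how", "is", "are")

def punctuate(transcript):
    text = transcript.lower()
    for spoken, mark in SPOKEN_MARKS.items():
        text = text.replace(" " + spoken, mark)
    # If the sentence opens like a question and has no terminal mark,
    # close it with a question mark rather than leaving it blank.
    if text.split()[0] in QUESTION_OPENERS and text[-1] not in ".?!":
        text += "?"
    return text.capitalize()

print(punctuate("what is your plan tonight"))
# -> "What is your plan tonight?"
print(punctuate("running late comma see you at nine full stop"))
# -> "Running late, see you at nine."
```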
We are very sure that plenty of the points above will be taken care of in the near future; in fact, hearing the Google I/O keynote yesterday, you feel that Google at least already acknowledges a lot of them. However, until solid action is taken on these, virtual assistants will remain a gimmicky feature, one that you would love to have on your phone but would never use. We have seen plenty of features like NFC go the same route, but we hope the story here is different and that we can actually form a more humane connection with our phones.
This is a guest post by Arpit Verma.