Voice Control

Advances in machine learning, speech recognition, and natural language understanding will drive the development of virtual assistants and bots that act more and more like people, controlled by and responding with human voices and fulfilling search queries, acting as proxies, accomplishing tasks, and asking questions of us in return. [1]

How It’s Developing

Voice-control provides a new option for interacting with computers and technologies, part of an evolution from computer languages and typed commands, to more graphical user interfaces, to touch screens, and now gesture control and voice. [2] Faster wireless speeds and the proliferation of smartphones provided the perfect initial spaces for these virtual assistants to be deployed to consumers.

Siri was one of the first mass-market voice-controlled assistants. Prior to its acquisition by Apple, the co-founders of Siri sought an entirely new paradigm for accessing the Internet, allowing artificially intelligent agents to summon composed answers from multiple sources, rather than pull relevant resources for humans to consult on their own – a move from a search engine to a “do engine.” [3] When asked a question, Siri would send the audio of the speaker’s question to a server, where speech recognition software would “transcribe” the spoken words. It would then map the contents of the question onto a domain of potential actions and pick the action that seemed most probable, based on its understanding of the relationships between real-world concepts. Siri could also apply details about the time of day and a user’s preferences and location to inform its response, or to ask for more information. [4] Consumers first experienced Siri on iPhones; with the introduction of Apple’s HomePod, Siri has also been deployed in a speaker as an in-home assistant, offering music and podcasts as well as messages, weather, traffic, sports, and alarms. [5]
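The transcribe, map, and pick steps described above can be illustrated with a toy sketch. This is not Siri's actual implementation: the action names, keyword lists, meal-time bonus, and threshold below are all hypothetical, and simple keyword overlap stands in for real natural language understanding. The sketch only shows the general shape of scoring candidate actions, letting context (here, time of day) nudge the scores, and asking for more information when nothing scores well.

```python
from dataclasses import dataclass


@dataclass
class Context:
    hour: int  # local hour of day, 0-23 (hypothetical context signal)


# Hypothetical domain of potential actions, each with keywords that suggest it.
ACTIONS = {
    "get_weather": {"weather", "rain", "forecast", "umbrella"},
    "set_alarm": {"alarm", "wake", "remind"},
    "find_restaurant": {"restaurant", "dinner", "lunch", "eat", "food"},
}

MEAL_HOURS = {11, 12, 13, 18, 19, 20}


def score_actions(transcript: str, context: Context) -> dict:
    """Score every candidate action against the transcribed words."""
    words = set(transcript.lower().replace("?", "").split())
    scores = {}
    for action, keywords in ACTIONS.items():
        score = float(len(words & keywords))
        # Context nudge: food requests are assumed slightly more likely at meal times.
        if action == "find_restaurant" and context.hour in MEAL_HOURS:
            score += 0.5
        scores[action] = score
    return scores


def choose_action(transcript: str, context: Context) -> str:
    """Pick the most probable action, or ask a follow-up if none matches."""
    scores = score_actions(transcript, context)
    best = max(scores, key=scores.get)
    if scores[best] < 1.0:  # no keyword matched strongly enough
        return "ask_for_more_information"
    return best


print(choose_action("Will it rain tomorrow?", Context(hour=9)))
print(choose_action("Where should I eat dinner", Context(hour=19)))
print(choose_action("Tell me something", Context(hour=12)))
```

A user's preferences and location could be added to the hypothetical `Context` and folded into the scores in the same way as the hour.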

Google’s Google Now virtual assistant evolved into an upgraded Google Assistant service that can field individual and follow-up questions, understand the conversation, and return the right answer. [6] Along with voice control, Google Assistant also works in a text chat form, allowing Google to deploy it across multiple devices, including phones, its Allo chat app, its voice-controlled Google Home speaker, and numerous smart home devices. As of 2017, Assistant was said to be integrated into more than 100 million individual devices, including smart TVs, automobiles, and wearables. [7]

Amazon’s Echo, first introduced in 2015, is an always-listening device designed to play music and answer basic household questions when activated with a wake word. While Amazon started behind Apple and Google in the area of voice control, it benefited from its consumer reach, selling nearly 1 million devices during the 2015 holiday season, and from the independent developers who wrote apps to work with the speaker’s voice controls, allowing the device to control other smart home devices, connect to apps, and perform a growing number of tasks. [8]

Apple, Google, and Amazon are eager to see this technology spread, with new skills added to their platforms and the technology integrated into other devices. In 2015, Amazon gave developers the opportunity to build new capabilities for Echo’s Alexa through the Alexa Skills Kit. By February 2017, Alexa had over 10,000 skills, up from 7,000 in January and just 1,000 in June 2016. [9] Amazon also opened its Voice Processing Technology (along with technologies for wake word recognition, beamforming, noise reduction, and echo cancellation) to third-party hardware makers interested in building Alexa into their devices. [10] Google, which already had Google Assistant integrated into numerous devices, made a Google Assistant SDK available for manufacturers to build the Google Assistant into any hardware. [11]

As the skills for voice-controlled technology expand, the speech technology that powers them also improves. Amazon’s support for Speech Synthesis Markup Language (SSML) has expanded to allow Alexa to whisper, vary its speaking speed, bleep out words, add pauses, change the pronunciation of a word, spell a word out, add audio snippets, and insert special words and phrases. [12]
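As a rough illustration of what such markup looks like, the sketch below assembles a short SSML response as a string. The helper functions are hypothetical conveniences written for this example, not part of any Amazon library, and the tag and attribute names reflect Alexa's published SSML support as best understood here, so they should be checked against the current documentation before use.

```python
from xml.sax.saxutils import escape


# Hypothetical helpers; each wraps text in one SSML element.
def whisper(text: str) -> str:
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def rate(text: str, speed: str) -> str:
    # speed is assumed to be a value such as "slow", "medium", or "fast"
    return f'<prosody rate="{speed}">{escape(text)}</prosody>'


def bleep(text: str) -> str:
    return f'<say-as interpret-as="expletive">{escape(text)}</say-as>'


def pause(seconds: int) -> str:
    return f'<break time="{seconds}s"/>'


def spell_out(text: str) -> str:
    return f'<say-as interpret-as="spell-out">{escape(text)}</say-as>'


def build_ssml(*parts: str) -> str:
    """Join already-marked-up parts inside a single <speak> root element."""
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = build_ssml(
    whisper("The library closes soon."),
    pause(1),
    rate("Please return your books", "slow"),
    spell_out("ALA"),
)
print(ssml)
```

A skill would return a string like this in place of plain text, and the speech engine would interpret the tags rather than read them aloud.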

Product developers are eager to expand the reach of voice-controlled technology across audiences. Toy-maker Mattel introduced Aristotle, a $349 voice-activated speaker built for children and families. It can begin as a smart baby monitor with a camera that streams video to parents’ phones, audio that can help soothe crying babies, and even tracking functions that can monitor feedings and changings to more seamlessly replenish baby products. Aristotle is also able to adapt as a nanny, friend, and tutor for older children, programmed to understand young voices so it can introduce games for toddlers and field homework questions for school-age children. [13]

Marketers see voice-controlled technology as an opportunity to provide more information to consumers. Amazon’s “Notifications for Alexa” feature, while opt-in based on each user’s preferences, would proactively alert users with information that’s deemed important to them, including breaking news and random weather reports. [14] In a particularly bold move, Burger King launched a TV commercial that attempted to wake up Google Home devices to expand the reach of the advertisement even after it was over. The commercial was launched without coordination with Google, which quickly moved to limit the advertisement’s effect and reach. [15] As the technology providers open the platform up to developers, some will likely seek ways to monetize their skills, including the introduction of “sponsored messages” inserted into device responses. While Amazon’s developer agreement forbids “any advertising for third party products or services” for apps unrelated to music streaming, radio, or news briefs, the growth of Skills will likely make this difficult to monitor. [16]

As voice-controlled technology becomes more integrated into homes, it will adapt to recognize multiple family members and residents. Google adjusted its Google Home assistant to allow for multiple users, each of whom can be uniquely identified by their voice. While convenient, such features also make clear that these devices can more accurately attribute searches, requests, and directions to specific individuals based on their voice. [17] Amazon is also reportedly pursuing a feature that would allow Alexa to distinguish between individual users based on their voices. [18]

Why It Matters for Libraries

As voice-controlled devices become more popular, they will likely become a more readily available tool for reference. In 2015, 65% of smartphone owners reported using voice assistants like Apple’s Siri, a steady growth from prior years – and in her 2016 Internet Trends Report, Mary Meeker estimated that half of all web searches will be conducted through voice and image searches within the next four years. [19] Content integration will likely accelerate this trend. Google announced partnerships with Bon Appetit, The New York Times, and Food Network to make step-by-step, voice-activated guides for more than 5 million recipes available through the Home speaker – users will still need to search for and save recipes to their device, but once saved the instructions are conveniently available via voice command. [20]

As users increasingly accept the responses produced by voice-controlled technology, there may be concerns about the relevance and authority of the information pulled for these responses. In a conversational interface, users will not always have the option of sorting through multiple possible responses (as they would in a web search), of immediately knowing the source of the information provided, or of seeing some of the details that might alert them to problems with the information. The technology simply picks the programmed source for news, reference, etc., and conveys it to the listener, with some options for customization of sources built into the app. [21]

Voice-controlled technology could also change the way people access and “read” content. In an experiment still in its early stages, The Washington Post is using Amazon’s Polly technology, which produces audio versions of text, to make four articles available in audio form each day. [22] While this is currently only available on mobile devices, there could come a time when users would be able to make a voice request for specific content – like the business section of a newspaper, a website, or even whole books – to be retrieved and read aloud by a voice-controlled device.

Children and young people will grow up with voice-controlled technology, becoming more accustomed to having these devices answer homework questions, settle disputes, and entertain them. All of these could have an impact on their social, interpersonal, and language development as well as their intellectual development, moving them toward more simplistic inquiry and acceptance of simple answers instead of taking on more complex questions and answers. [23]

These voice-controlled virtual assistants could become intellectual equalizers, substituting for a superb memory or serving as an on-hand reference. [24] In such a world, how will humans find valuable ways to work together instead of working in isolation with their voice-controlled assistants?

While voice-controlled devices could be a tool for education and learning, voice-control could also become an increasingly important area of research and technological development. Amazon initiated an Alexa Fund Fellowship to fund and support researchers working on voice technology at Carnegie Mellon University, Johns Hopkins University, University of Southern California, and University of Waterloo. [25]

Voice-controlled technology may increasingly appear in public and shared spaces. Several hotels, including Marriott and Wynn Resorts, are testing devices from Apple and Amazon in hotel rooms to help guests turn on lights, close drapes, control room temperature, and change television channels via voice command. [26] The use of these devices in semi-public spaces could raise privacy concerns as guests toggle between personal accounts and more standardized accounts set to the specific space. [27] It could also change users’ expectations for what they can do in public and shared spaces.   

Concerns for privacy might also arise over the private exchanges overheard by voice-controlled speakers. This information became a central focus in a murder investigation in Arkansas when police asked Amazon for data that may have been recorded on its Echo device while a murder was taking place. While the device typically sits in an idle state with its microphones listening for key words like “Alexa” before it begins recording and sending data to Amazon’s servers, it’s not unusual for the Echo to wake up by mistake and grab snippets of audio, leading investigators to request the data in the event the speaker overheard key events. [28] Amazon refused to hand over data, claiming that the data and the responses from the voice assistant itself were protected by the First Amendment. The defendant ultimately agreed to allow Amazon to forward his Echo's data to prosecutors, leaving unanswered the legal standard for when data from an Echo or other Internet of Things device can be used in a court of law. [29]
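The idle-then-record behavior described above, and how a mistaken wake-up can capture unintended speech, can be sketched with a toy simulation. This is an assumption-laden illustration, not Amazon's implementation: it works on words rather than audio, and the function name, the `triggers` set, and the fixed `window` of forwarded words are all hypothetical.

```python
def words_sent_to_server(heard_words, triggers=frozenset({"alexa"}), window=4):
    """Return the words that would leave the device in this toy model.

    The device discards everything it hears until a word matches one of
    `triggers`, then forwards the next `window` words. Putting a
    similar-sounding word in `triggers` models a mistaken wake-up.
    """
    sent = []
    remaining = 0  # how many more words to forward after a wake-up
    for word in heard_words:
        if remaining > 0:
            sent.append(word)
            remaining -= 1
        elif word.lower().strip(",.?!") in triggers:
            remaining = window
    return sent


# Intended use: only the request after the wake word is forwarded.
print(words_sent_to_server("we were talking alexa what time is it".split()))

# No wake word: nothing leaves the device.
print(words_sent_to_server("this conversation stays in the room".split()))

# Mistaken wake-up: a detector that also fires on a similar-sounding name.
print(words_sent_to_server(
    "my friend alexis said something private".split(),
    triggers=frozenset({"alexa", "alexis"}),
    window=3,
))
```

In the last call, a private remark is forwarded even though no one addressed the device, which is the kind of snippet investigators hoped might exist.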

In addition to privacy concerns, there might be concerns over disruption. Amazon introduced a voice calling and voice messaging feature on its Echo devices, meant to make sending and receiving voice communications more convenient. The initial launch, however, gave users no option to block specific contacts from calling them once the feature was enabled, giving any number of contacts direct access to even private spaces where the device is connected. [30]

Voice-controlled technologies could form a complex relationship around issues of diversity. Many of the virtual assistants carry female-sounding names and use female voices by default, perpetuating notions of female servitude and societal sexism. [31] Additionally, voice-controlled technologies defer to the most standard forms of speech, making regional accents, cultural syntax, and correct foreign pronunciations problematic and perhaps also challenging new speakers of a given language – the technologies also push more speakers to adopt a “machine” voice that is different from their regular speaking voice when engaging with friends and family. [32]

At the same time, voice-controlled technologies could provide benefits to specific portions of the population, including individuals with disabilities or older adults who could benefit from voice assistants to control their homes, order groceries, provide reminders and notifications, or more easily access digital content. [33]

As more players enter this space – especially big players like Apple, Google, and Amazon – voice-controlled products carry the potential for fragmentation as certain services (iTunes, Gmail) integrate with only certain devices. [34]

Notes and Resources

[1] "Terrifyingly convenient." Will Oremus. Slate. April 3, 2016. Available from http://www.slate.com/articles/technology/cover_story/2016/04/alexa_corta...

[2] "Terrifyingly convenient." Will Oremus. Slate. April 3, 2016. Available from http://www.slate.com/articles/technology/cover_story/2016/04/alexa_corta...

[3] "Siri rising: The inside story of Siri’s origins — and why she could overshadow the iPhone." Bianca Bosker. The Huffington Post. January 22, 2013. Available from http://www.huffingtonpost.com/2013/01/22/siri-do-engine-apple-iphone_n_2...

[4] "Siri rising: The inside story of Siri’s origins — and why she could overshadow the iPhone." Bianca Bosker. The Huffington Post. January 22, 2013. Available from http://www.huffingtonpost.com/2013/01/22/siri-do-engine-apple-iphone_n_2...

[5] "Apple’s HomePod puts Siri in a speaker." David Pierce. Wired. June 5, 2017. Available from https://www.wired.com/2017/06/apple-homepod/

[6] "Google unveils Google Assistant, a virtual assistant that’s a big upgrade to Google Now." Matthew Lynley. TechCrunch. May 18, 2016. Available from https://techcrunch.com/2016/05/18/google-unveils-google-assistant-a-big-...

[7] "Google Assistant is about to be everywhere." Andrew Tarantola. Engadget. May 17, 2017. Available from https://www.engadget.com/2017/05/17/google-assistant-is-about-to-be-ever...

[8] "The real story of how Amazon built the Echo." Joshua Brustein. Bloomberg. April 19, 2016. Available from https://www.bloomberg.com/features/2016-amazon-echo/

[9] "Amazon opens up Alexa’s microphone and voice processing technology to hardware makers." Nat Levy. GeekWire. April 13, 2017. Available from https://www.geekwire.com/2017/amazon-opens-up-alexas-microphone-and-voic...

[10] "Amazon opens up Alexa’s microphone and voice processing technology to hardware makers." Nat Levy. GeekWire. April 13, 2017. Available from https://www.geekwire.com/2017/amazon-opens-up-alexas-microphone-and-voic...

[11] "Google Assistant is about to be everywhere." Andrew Tarantola. Engadget. May 17, 2017. Available from https://www.engadget.com/2017/05/17/google-assistant-is-about-to-be-ever...

[12] "Amazon’s Alexa can now whisper, bleep out swear words, and change its pitch." Ashley Carman. The Verge. May 2, 2017. Available from https://www.theverge.com/circuitbreaker/2017/4/28/15475070/amazon-alexa-...

[13] "Mattel's new AI will help raise your kids." Mark Wilson. Fast Company. April 17, 2017. Available from https://www.fastcompany.com/40400777/mattels-new-ai-will-help-raise-your...

[14] "Amazon’s Alexa is getting smarter, but potentially more intrusive." Alejandro Alba. Vocativ. May 16, 2017. Available from http://www.vocativ.com/430289/amazons-echo-alexa-notifications-intrusive/

[15] "This Burger King ad forces your Google Home device to tell you about Whoppers." Mary Beth Quirk. Consumerist. April 12, 2017. Available from https://consumerist.com/2017/04/12/this-burger-king-ad-forces-your-google-home-device-to-tell-you-about-whoppers/

[16] "Amazon’s Alexa may soon throw ads into its responses." Allee Manning. Vocativ. May 12, 2017. Available from http://www.vocativ.com/429414/amazon-echo-alexa-ads/

[17] "Google Home now recognizes specific users’ voices, allows for multiple accounts." Chris Moran. Consumerist. April 20, 2017. Available from https://consumerist.com/2017/04/20/google-home-now-recognizes-specific-u...

[18] "Exclusive: Amazon developing advanced voice-recognition for Alexa." Lisa Eadicicco. Time. February 27, 2017. Available from http://time.com/4683981/amazon-echo-voice-id-feature-2017/

[19] "Shouting at your computer is the future of search." Allee Manning. Vocativ. June 3, 2016. Available from http://www.vocativ.com/325393/voice-commands-are-the-future/

[20] "Google Home is upping your cooking game with 5 million new recipes." Brett Williams. Mashable. April 26, 2017. Available from http://mashable.com/2017/04/26/google-home-recipes/#xlWLIK00IOqf

[21] "Terrifyingly convenient." Will Oremus. Slate. April 3, 2016. Available from http://www.slate.com/articles/technology/cover_story/2016/04/alexa_corta...

[22] "WaPo is testing audio articles with Amazon tech." George Slefo. Advertising Age. June 9, 2017. Available from http://adage.com/article/digital/readers-listen-articles-washington-post...

[23] "How millions of kids are being shaped by know-it-all voice assistants." Michael S. Rosenwald. The Washington Post. March 2, 2017. Available from https://www.washingtonpost.com/local/how-millions-of-kids-are-being-shap...

[24] "Siri rising: The inside story of Siri’s origins — and why she could overshadow the iPhone." Bianca Bosker. The Huffington Post. January 22, 2013. Available from http://www.huffingtonpost.com/2013/01/22/siri-do-engine-apple-iphone_n_2...

[25] "Amazon establishes Alexa Fund Fellowship to support universities researching voice technology." Nat Levy. GeekWire. March 2, 2017. Available from https://www.geekwire.com/2017/amazon-establishes-alexa-fund-fellowship-t...

[26] "Siri and Alexa are fighting to be your hotel butler." Hui-yong Yu and Spencer Soper. Bloomberg. March 22, 2017. Available from https://www.bloomberg.com/news/articles/2017-03-22/amazon-s-alexa-takes-its-fight-with-siri-to-marriott-hotel-rooms

[27] "Siri and Alexa are fighting to be your hotel butler." Hui-yong Yu and Spencer Soper. Bloomberg. March 22, 2017. Available from https://www.bloomberg.com/news/articles/2017-03-22/amazon-s-alexa-takes-its-fight-with-siri-to-marriott-hotel-rooms

[28] "Should an Amazon Echo help solve a murder?" Michael Reilly. MIT Technology Review. December 27, 2016. Available from https://www.technologyreview.com/s/603278/should-an-amazon-echo-help-sol...

[29] "Did Alexa hear a murder? We may finally find out." David Kravets. Ars Technica. March 7, 2017. Available from https://arstechnica.com/tech-policy/2017/03/did-alexa-hear-a-murder-we-may-finally-find-out/

[30] "Amazon is bringing voice calls to the Echo." David Priest. CNET. May 9, 2017. Available from https://www.cnet.com/news/amazon-is-bringing-voice-calls-to-the-echo/

and

"Amazon says caller blocking for Alexa/Echo is coming, amid customer complaints." Todd Bishop. GeekWire. May 13, 2017. Available from https://www.geekwire.com/2017/amazon-says-caller-blocking-alexaecho-comi...

[31] "Terrifyingly convenient." Will Oremus. Slate. April 3, 2016. Available from http://www.slate.com/articles/technology/cover_story/2016/04/alexa_corta...

[32] "Y'all have a Texas accent? Siri (and the world) might be slowly killing it." Tom Dart. The Guardian. February 10, 2016. Available from https://www.theguardian.com/technology/2016/feb/10/texas-regional-accent...

[33] "How millions of kids are being shaped by know-it-all voice assistants." Michael S. Rosenwald. The Washington Post. March 2, 2017. Available from https://www.washingtonpost.com/local/how-millions-of-kids-are-being-shap...

[34] "Google Home is cool, but catching up to Amazon Echo won’t be easy." Brian Barrett. Wired. May 19, 2016. Available from https://www.wired.com/2016/05/google-home-amazon-echo/