By the autumn of 2018, voice assistants were everywhere. Amazon had expanded Alexa from the Echo speaker into a platform available in hundreds of third-party products. Google Assistant was running on Android phones and Google's own Home speakers, and had demonstrated conversational capabilities broadly considered more sophisticated than those of its competitors. Siri was on hundreds of millions of iPhones, iPads, and Macs, giving Apple the largest installed base of any assistant.
The competitive situation was complex in a way that market share numbers did not fully capture. Amazon had the best hardware ecosystem and the most developed third-party integration platform. Google had the best underlying natural language understanding and the strongest connection to Google's search and knowledge infrastructure. Apple had the largest user base but had been slower to open Siri to third-party developers and had made product decisions that limited what Siri could do compared to its competitors.
What all three were struggling with was the same underlying problem. People were using voice assistants, but mostly for a narrow set of tasks. Setting timers, playing music, asking for weather forecasts, and controlling smart home devices accounted for the vast majority of voice assistant interactions. The broader vision of a voice interface that replaced screens and keyboards for a wide range of tasks was not materialising at the pace the investment in the technology suggested it should.
Part of this was accuracy. Voice recognition had improved dramatically, but it still failed often enough in real conditions, with accents, background noise, or unusual names, that many users had learned to expect occasional failures. Each failure created friction that pushed people back toward the screen.
Part of it was capability. Voice works well for narrow, well-defined tasks. It works less well when the answer to a question requires nuance, or when what you actually want is to browse options rather than state a precise request. The interfaces had been built around a model of discrete commands, but much of what people wanted to do with technology was not well modelled as a command.
The race had produced real improvements in natural language processing and new categories of hardware. What it had not yet produced was clarity about the killer use case that would make voice assistants essential, rather than merely convenient for a limited set of tasks.