Picovoice speech-to-text brings cloud-level accuracy to edge devices

September 3, 2019 (VANCOUVER, BRITISH COLUMBIA)

Picovoice, a Canadian AI startup, has developed the world’s first real-time speech recognition engine that can run offline anywhere, from a $5 Raspberry Pi Zero to a web browser.

Voice user interfaces have flourished, backed by cloud-based speech recognition services. Despite the ubiquity of speech-enabled devices, processing speech in the cloud has raised major privacy concerns about the uploading and handling of personal voice data. Cloud speech recognition also has fundamental limitations in terms of latency, reliability, and cost-effectiveness at scale.

Offline speech recognition has the potential to address these issues by eliminating the need for connectivity and tapping into readily available compute resources on billions of devices. Alas, the computational cost of speech recognition algorithms to date has made it impossible to get comparable accuracy on an edge device.

Picovoice has developed deep learning technology specifically designed to run speech recognition efficiently on commodity hardware with limited compute resources. Its bespoke voice AI technology enables Picovoice to run real-time speech-to-text on a $5 Raspberry Pi Zero or locally within a web browser. This lowers latency and cost while respecting user privacy, because speech data never has to leave the user's device.

“Edge-based voice recognition is the next natural step in the architectural evolution of voice interfaces. A similar trend has already transformed the architecture of web applications, shifting functionality from servers to browsers. This has assisted in building applications that are responsive, context-aware, and scalable. I believe voice as an interface will go through the same evolution,” Alireza Kenarsari, Picovoice founder, said.

He added: “Some consider speech recognition a solved problem. That might be true if you have infinite compute resources. Limited compute resources available on edge devices is the main hindrance for making voice AI private, responsive, and cost-effective. Picovoice’s charter is to accelerate the transition to the edge. In doing so we had to revisit and question current approaches to speech recognition and develop new paradigms for applying deep learning.”

Picovoice routinely publishes open-source benchmarks for its products, including its recent speech-to-text engine. The benchmarks indicate that the software matches the accuracy of major cloud providers such as Google and Amazon while running locally on a small embedded device.

Picovoice software is used by dozens of enterprise licensees, including Fortune 500 companies. LG, Whirlpool, and Local Motors are among those the company can name publicly. Picovoice technology has already received immense interest from enterprises that are trying to push AI to the edge.