Offline Voice AI in a Web Browser

July 09, 2020
Why offline when you're online?
Offline Voice AI in a browser may seem contradictory. Doesn't using a web browser mean you're online? But even when you are connected to the Internet, running Voice AI offline, in-browser, eliminates both the minimum and the variable network latency, and keeps the speech data intrinsically private: it never needs to leave the device.
Offline AI also lets voice functionality work consistently in progressive web apps. And from a business perspective, it can let you price by customer seats instead of an unbounded number of calls to a cloud provider.
Demo Application: Smart Lighting
Before giving any command, say "Pico Home" to wake the device.
Here are some example phrases:
"Turn the lights in the Living Room on"
"Shut off the lights in the Bathroom"
"Switch on all lights"
And some phrases for changing colors:
"Turn the lights in the Kitchen blue"
"Make the closet lights green"
"Switch all lights purple"
All of the speech processing is performed privately, offline, in-browser. You can see more web demos here.
How to embed Voice AI in the browser
To achieve offline voice AI, we can use speech technology from Picovoice. Picovoice engines run in modern web browsers via WebAssembly. A typical use case involves providing the engines real-time audio input from the microphone and receiving events back (e.g., a hotword detection).
The core of running any Picovoice engine in the browser is simply three files: a WebAssembly module, its wrapper, and a binding file. The binding provides a factory method for creating an instance of the engine. The engine fires a callback to signal that the WASM has finished loading, accepts frames of audio, and returns the engine results.
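The loading-and-processing lifecycle described above can be sketched with a toy stand-in. Everything here is illustrative: `createToyEngine` and its energy-threshold "detection" are a mock, not the real binding's API, but the shape is the same: a factory produces an engine instance, the caller feeds it fixed-size audio frames, and results come back through a callback.

```javascript
// Toy stand-in for an engine binding. The real binding wraps a
// WebAssembly module; this mock just flags any frame whose mean
// absolute amplitude crosses a threshold, to show the call pattern.
function createToyEngine(onResult, frameLength = 512, threshold = 0.5) {
  return {
    frameLength,
    process(frame) {
      if (frame.length !== frameLength) {
        throw new Error(`expected ${frameLength} samples, got ${frame.length}`);
      }
      let energy = 0;
      for (const s of frame) energy += Math.abs(s);
      energy /= frame.length;
      if (energy > threshold) onResult({ detected: true, energy });
    },
  };
}

// Feed frames the way a real integration would: one fixed-size chunk at a time.
const results = [];
const engine = createToyEngine((r) => results.push(r));
engine.process(new Float32Array(512));            // silence: no callback
engine.process(new Float32Array(512).fill(0.9));  // "loud" frame: callback fires
```

The important structural point is that the engine is stateful and frame-driven: the integration's only job is to keep handing it frames of the right length and react to callbacks.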
We will expand on this foundation with additional steps to set up streaming audio, request microphone permission, and downsample audio to the required format. We've provided demo code to assist with those tasks.
Web Audio API and Microphone access
Picovoice uses the Web Audio API to process streaming audio, and the MediaDevices API to access the microphone. Under the browsers' security model, the user must explicitly grant permission before any audio processing can occur. Additionally, browsers only allow microphone access over an HTTPS connection (localhost is treated as a secure context, so plain HTTP suffices for local testing).
Web Voice Processor
Picovoice engines accept industry-standard 16kHz audio for speech processing. When accessing microphone audio, we need to downsample it to this format. We also need to ask the user for permission to access their microphone and connect the input to the downsampler. To handle these tasks, we have provided the Web Voice Processor package. It is also available on npm.
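To make the downsampling step concrete, here is a minimal sketch of converting Web Audio samples (e.g. 44.1 kHz `Float32Array` values in [-1, 1]) to 16 kHz 16-bit PCM. It uses simple nearest-sample decimation for brevity; this is not the Web Voice Processor's implementation, which handles this for you.

```javascript
// Convert floating-point audio at inputRate to 16-bit PCM at outputRate
// by picking the nearest source sample (naive decimation, no filtering).
function downsample(input, inputRate, outputRate = 16000) {
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Clamp to [-1, 1], then scale the float sample into int16 range.
    const s = Math.max(-1, Math.min(1, input[Math.floor(i * ratio)]));
    output[i] = s < 0 ? s * 32768 : s * 32767;
  }
  return output;
}

// One second of 44.1 kHz audio becomes 16000 samples of 16 kHz PCM.
const oneSecondAt44100 = new Float32Array(44100).fill(0.5);
const pcm = downsample(oneSecondAt44100, 44100);
```

A production downsampler would also low-pass filter before decimating to avoid aliasing; the point here is just the rate and sample-format conversion the engines require.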
The demo code includes Porcupine and Rhino web worker scripts. They communicate with managers in the main thread using postMessage(). In addition, the Web Voice Processor performs downsampling via its own dedicated worker.
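The worker side of that protocol can be sketched as a handler that switches on a command field in each message. The command names below (`process`, `release`) are illustrative, not necessarily the demo's exact ones, and the handler is written as a plain function so it can be exercised without a real Worker.

```javascript
// Worker-side message handler: receives messages from the main thread,
// routes audio frames to the engine, and posts errors back. `post` is
// the worker's postMessage function (injected here for testability).
function makeWorkerHandler(engine, post) {
  return function onMessage(event) {
    switch (event.data.command) {
      case "process":
        engine.process(event.data.inputFrame);
        break;
      case "release":
        engine.release();
        break;
      default:
        post({ status: "error", message: `unknown command: ${event.data.command}` });
    }
  };
}

// Simulate one round-trip without spinning up a real Worker:
const posted = [];
const processed = [];
const handler = makeWorkerHandler(
  { process: (f) => processed.push(f.length), release: () => {} },
  (msg) => posted.push(msg),
);
handler({ data: { command: "process", inputFrame: new Int16Array(512) } });
handler({ data: { command: "nope" } });
```

Keeping the engine inside a worker keeps WASM processing off the main thread, so the UI stays responsive while audio frames stream through postMessage().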
Open Source Demo Applications
We have provided demo code for the Porcupine and Rhino engines (including running both together) that runs in a recent web browser without requiring transpilation.
The Porcupine demo allows you to control a smart lamp with always-listening commands.
The Smart Lighting demo uses Rhino to control smart home lighting with natural-language commands like "dim the living room lights". There is also an always-listening version of this demo that uses Porcupine as a wake word trigger before handing the natural-language command over to Rhino.
- Porcupine Smart Lamp
- Rhino Smart Home Lighting (standalone push-to-talk)
- Rhino Smart Home Lighting (always-listening using Porcupine)
How do I use Picovoice engines with React, Angular, Vue, …?
Fundamentally, the engines do not care which framework you are using. That said, we have an open source sample of how to use Picovoice with React in the browser with a companion tutorial that explains the details.
What about ES6 and modules? How do I use this with e.g. Webpack?
You can add the scripts as-is to the HTML document head using something like React Helmet, and serve them from a static/public folder with your web application.
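As a sketch of that approach (the filenames and paths below are placeholders, not the actual script names), the head of the page would reference the scripts served from your static folder:

```html
<!-- Illustrative only: serve the WASM wrapper and binding scripts from
     your app's static/public folder and load them before your app code. -->
<head>
  <script src="/scripts/engine_wasm_wrapper.js"></script>
  <script src="/scripts/engine_binding.js"></script>
</head>
```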
Which Picovoice Engines run in the browser?
All of them.
| Engine | Function |
| --- | --- |
| Porcupine | Wake word detection |
| Rhino | Speech-to-intent (natural-language commands) |
| Cheetah | Speech-to-text (real-time; live feedback) |
| Leopard | Speech-to-text (file-based; accuracy boost) |
| Octopus | Speech-to-index (voice search) |
How do I create custom wake words or speech-to-intent contexts for the web?
You can use the Picovoice Console to create wake words and design and train speech-to-intent contexts. Personal use accounts are free and enterprise accounts are available with a 30-day trial.