Voice-activated assistants have become part of our lives through mobile devices (like Siri) and smart speakers (like Alexa). But what about the web? Picovoice’s Porcupine Wake Word engine, together with WebAssembly, now makes it possible to run wake word detection with on-device voice AI directly in web browsers. To showcase how a wake word works with a speech-to-text API, the Picovoice team built a Google Chrome Voice AI extension demo, so developers can see how to deploy a hotword that triggers an extension.

A dedicated wake word engine is the only feasible way to achieve always-listening behaviour, since it’s impractical (and a privacy nightmare) to keep a hot mic continuously open to a cloud API. Without a phrase to trigger the application (i.e. a trigger word), it’s not possible to offer an end-to-end hands-free experience as voice assistants such as Alexa or Siri do. This proof-of-concept Chrome extension, powered by the Porcupine Wake Word SDK for Web, offers multiple wake word options to trigger a Google search, so anyone can search Google by voice truly hands-free.
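
The flow described above (stay idle until the wake word is heard, then turn the next transcript into a Google search) can be sketched as a small controller. This is a minimal sketch under stated assumptions, not the extension's actual code: `VoiceSearchController`, `buildSearchUrl`, and the `OpenTab` callback are hypothetical names, and the wake word engine and speech-to-text API are assumed to invoke `onWakeWord` and `onTranscript` respectively.

```typescript
// Hypothetical glue code: none of these names come from the Picovoice SDK
// or from the demo extension. The wake word engine and the speech-to-text
// service are assumed to call onWakeWord() and onTranscript() respectively.

/** Callback that opens a URL, e.g. a wrapper around chrome.tabs.create. */
type OpenTab = (url: string) => void;

/** Build a Google search URL from a transcribed query. */
function buildSearchUrl(transcript: string): string {
  // Collapse repeated whitespace so the query is encoded cleanly.
  const query = transcript.trim().replace(/\s+/g, " ");
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}

/** Ignores transcripts until a wake word arms it, then searches once. */
class VoiceSearchController {
  private armed = false;
  private readonly openTab: OpenTab;

  constructor(openTab: OpenTab) {
    this.openTab = openTab;
  }

  /** Called when the wake word engine reports a detection. */
  onWakeWord(): void {
    this.armed = true;
  }

  /** Called with a final transcript; returns true if a search was opened. */
  onTranscript(transcript: string): boolean {
    if (!this.armed || transcript.trim() === "") {
      return false;
    }
    this.armed = false; // Require a new wake word before the next search.
    this.openTab(buildSearchUrl(transcript));
    return true;
  }
}
```

In an extension, the `OpenTab` callback could wrap `chrome.tabs.create({ url })`. Keeping it as an injected function lets the wake-word gating logic be exercised without a browser.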

Extension options include multiple wake words

The Chrome voice extension is open source and available on GitHub. Although the extension is Chrome-only, the web SDK supports all modern web browsers, including Safari, Firefox and Edge. The SDK also has additional packages for Angular, React, and Vue.

An open source starting point for voice web extensions

Speech-to-text is the best-known approach for adding voice to the web, owing to the popularity of search-by-voice applications, which require converting voice to text in real time. However, it’s not the only approach, and wake-word-activated search is just a starting point. The WebExtensions API has a large list of features to control things like bookmarks and tabs, and an extension could also be tailored toward a specific site like YouTube. Porcupine can also listen for many wake words simultaneously. Rhino Speech-to-Intent adds a natural language capability for menu navigation and voice control on the web. Rhino processes the voice data offline, locally on-device, and keeps the entire experience private. The Picovoice SDK for Web combines these two engines to create a complete voice assistant loop for the browser.
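
To illustrate how an inferred intent might drive browser features such as tabs and bookmarks, here is a minimal sketch of an intent router. Everything in it is an assumption for illustration: the `Inference` shape is a simplified stand-in rather than the Rhino SDK's type, and the intent names (`openSite`, `closeTab`, `bookmarkPage`), the `site` slot, and the `SITES` table are hypothetical.

```typescript
// Assumed, simplified inference shape; not copied from the Rhino SDK.
interface Inference {
  isUnderstood: boolean;
  intent?: string;
  slots?: Record<string, string>;
}

// Browser actions the extension would carry out.
type BrowserCommand =
  | { kind: "openTab"; url: string }
  | { kind: "closeTab" }
  | { kind: "bookmarkPage" }
  | { kind: "none" };

// Hypothetical values that a "site" slot might carry.
const SITES = new Map<string, string>([
  ["youtube", "https://www.youtube.com/"],
  ["github", "https://github.com/"],
]);

/** Map an inference to a browser command; unknown input maps to "none". */
function routeIntent(inference: Inference): BrowserCommand {
  if (!inference.isUnderstood || inference.intent === undefined) {
    return { kind: "none" };
  }
  switch (inference.intent) {
    case "openSite": {
      const site = (inference.slots?.site ?? "").toLowerCase();
      const url = SITES.get(site);
      return url !== undefined ? { kind: "openTab", url } : { kind: "none" };
    }
    case "closeTab":
      return { kind: "closeTab" };
    case "bookmarkPage":
      return { kind: "bookmarkPage" };
    default:
      return { kind: "none" };
  }
}
```

In an extension, an `openTab` command could be passed to `chrome.tabs.create({ url })`, and `bookmarkPage` to `chrome.bookmarks.create` with the active tab's title and URL. Returning a plain command object keeps the routing logic separate from the browser APIs.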