Private Voice AI with React

  • React
  • React Hooks
  • WebAssembly
  • NLU
  • Web Audio API
  • Smart Home
  • VUI
  • Wake Phrase

Introduction

Skill Level

  • Comfortable with JavaScript and React
  • Basic command line knowledge

Prerequisites

Smart Lighting Demo

“Picovoice, turn on the living room lights”

The Picovoice website features a smart lighting demo. The demo lets you control lights in a home using wake words and Speech-to-Intent. This tutorial recreates the demo using React. The code and models for the tutorial are available on GitHub.

We will be using two Picovoice products: Porcupine and Rhino. Porcupine is always listening and is used for hotword (wake word) detection. Rhino is a Speech-to-Intent engine that performs natural language understanding (NLU) on voice data and infers intent, without intermediate speech-to-text. Together, these engines form the foundation for a Voice User Interface (VUI).

Porcupine will be responsible for listening for “Picovoice” in the idle state, to wake up the app. Once awakened, Rhino processes subsequent audio and extracts the intent from natural expressions like “turn on the kitchen lights”.

Application States

This state diagram shows the flow from Porcupine to Rhino.

Overview

To understand how Picovoice technology works in web browsers, see this article, which gives an overview of the WebAssembly, Web Audio API, Web Workers, and other details that serve as a foundation for this tutorial. In this tutorial our task will be to adapt the “vanilla” JavaScript always-listening smart lighting demo to be a React application.

Set Up

1. Create React App

Create a new single page application in React using Create React App. Run the following commands:

yarn create react-app smart-lighting
cd smart-lighting
yarn start

Your React application is now running locally in development mode. You can visit it at http://localhost:3000 in your browser.

2. Adding Picovoice Scripts to the Project

We will be using the files described in this article: the WebAssembly modules, their JavaScript wrappers, and a binding file for each of Porcupine and Rhino. For details on the purpose of each file, see that article.

2.1 Bindings

In the public folder, create a new folder called scripts. Here, we will place all of the core binding and wasm files that allow us to use the WASM modules. The files pv_porcupine.wasm and pv_porcupine.js can be found in the lib/wasm folder of the Porcupine repo, while pv_rhino.wasm and pv_rhino.js can be found in the lib/wasm folder of the Rhino repo. We then need the binding files porcupine.js and rhino.js from the binding/javascript folders of the Porcupine and Rhino repos, respectively.

2.2 Web Voice Processor and Downsampling Worker

We also need to add the web voice processor and downsampling worker to the project. Install the @picovoice/web-voice-processor library with yarn.

yarn add @picovoice/web-voice-processor

We will be importing web_voice_processor.js in smart-lighting/src/App.js:

import "@picovoice/web-voice-processor/src/web_voice_processor"

We then have to copy downsampling_worker.js from @picovoice/web-voice-processor in node_modules to the public/scripts folder.

2.3 Demo Code (Workers and Managers)

The JavaScript demo code in the rhino repo provides us with web workers for porcupine and rhino, as well as an overarching manager that connects them to the web voice processor. We will copy the porcupine worker script as well as the rhino worker script from the existing demos in the repos into our public/scripts folder. Finally, we have to add a copy of ppn_rhn_manager.js from the shared folder.

We use the public folder as a way to add assets outside of the module system. These scripts can be referenced with the environment variable PUBLIC_URL. The porcupine rhino manager needs to be available as a global variable so we can access it in our application code. Hence, we add the following line to the <head> of smart-lighting/public/index.html:

<script src="%PUBLIC_URL%/scripts/ppn_rhn_manager.js"></script>

This is how your directory structure should look. You may have some additional files generated by Create React App but they are not relevant to this tutorial:

smart-lighting
├── public
│   ├── scripts
│   │   ├── downsampling_worker.js
│   │   ├── ppn_rhn_manager.js
│   │   ├── porcupine_worker.js
│   │   ├── porcupine.js
│   │   ├── pv_porcupine.js
│   │   ├── pv_porcupine.wasm
│   │   ├── pv_rhino.js
│   │   ├── pv_rhino.wasm
│   │   ├── rhino_worker.js
│   │   └── rhino.js
│   ├── index.html
│   ├── manifest.json
│   └── ...
├── src
│   ├── App.js
│   ├── App.css
│   └── ...
└── ...

Please check out our article on Running Offline Voice AI Inside the Web Browser for details of these files.

3. Creating the SmartLightingDemo Class

In your src folder, create a new folder called picovoice for all Picovoice-related modules. In picovoice, we will store a class called SmartLightingDemo that accesses the scripts in the public folder through PUBLIC_URL. SmartLightingDemo encapsulates the VUI and all of its implementation details. We will use this class as the interface through which our React application interacts with the VUI.

SmartLightingDemo Class

The class stores the porcupine rhino manager, keyword IDs, sensitivities, and context as attributes. The start method takes initCallback, ppnCallback, and rhnCallback functions as parameters. The refresh method keeps the callback closures up to date. The stop method releases the workers and halts the web voice processor.
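As a rough sketch, the class has the following shape. The exact arguments that PorcupineRhinoManager expects come from the ppn_rhn_manager.js script copied into public/scripts, so the manager calls are left as comments rather than concrete signatures:

// Schematic outline of smart-lighting/src/picovoice/smart_lighting_demo.js.
// Treat this as a shape, not a drop-in implementation; the manager's actual
// API is defined by ppn_rhn_manager.js from the demo repo.
class SmartLightingDemo {
  constructor() {
    this.ppnRhnMgr = window.PorcupineRhinoManager
    this.keywordIDs = {}      // base64-decoded .ppn model(s), added in step 4
    this.sensitivities = null // Float32Array of per-keyword sensitivities
    this.context = null       // base64-decoded .rhn model, added in step 4
  }

  start(initCallback, ppnCallback, rhnCallback) {
    // Hand the models, sensitivities, and the three callbacks to the manager,
    // which wires up the web voice processor, the downsampling worker, and the
    // Porcupine/Rhino workers (see ppn_rhn_manager.js for the actual signature).
  }

  refresh(initCallback, ppnCallback, rhnCallback) {
    // Re-register fresh callback closures so they see the current React state.
  }

  stop() {
    // Release the Porcupine/Rhino workers and halt the web voice processor.
  }
}

export default SmartLightingDemo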

4. Adding Wake Words and Context to SmartLightingDemo

With the Picovoice files and scripts in place, we can now supply our specific wake word and context so that the application listens for “Picovoice” and can perform NLU on the domain of smart lighting. These are captured as model files, available in the demo as “.ppn” and “.rhn” files, respectively, and they are provided to the SmartLightingDemo class as arguments. We will encode these binary files into base64 strings, as it is a convenient format for embedding into JavaScript applications. In the example code, the encoding is done using Node, as part of the build process. We can also use the command line:

base64 smart_lighting.rhn
base64 picovoice.ppn
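If you prefer to script the encoding, a small Node sketch along these lines can generate the constants files that are imported in the next step. The output file and constant names match the imports below, but the script itself is illustrative rather than the demo's actual build step:

// encode_models.js -- illustrative build-time script, not taken from the demo repo.
// Reads the binary models and writes them out as base64 string constants.
const fs = require("fs")

const ppnBase64 = fs.readFileSync("picovoice.ppn").toString("base64")
const rhnBase64 = fs.readFileSync("smart_lighting.rhn").toString("base64")

fs.writeFileSync("picovoice_64.js", `export const PICOVOICE_64 = "${ppnBase64}"\n`)
fs.writeFileSync("lighting_context.js", `export const LIGHTING_CONTEXT = "${rhnBase64}"\n`)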

Store the base64 strings as constants in a separate file and import them into smart-lighting/src/picovoice/smart_lighting_demo.js. Then add them to this.keywordIDs and this.context.

import { LIGHTING_CONTEXT } from "./lighting_context.js";
import { PICOVOICE_64 } from "./picovoice_64.js";

class SmartLightingDemo {
  constructor() {
    this.ppnRhnMgr = window.PorcupineRhinoManager
    this.keywordIDs = {
      picovoice: Buffer.from(PICOVOICE_64, "base64"),
    }
    this.sensitivities = new Float32Array([0.6])
    this.context = Buffer.from(LIGHTING_CONTEXT, "base64")
  }

We also explicitly access the PorcupineRhinoManager via window, since the ppn_rhn_manager.js that we added as a script in the HTML file defines PorcupineRhinoManager as a global variable. This is the quickest way to adapt the existing scripts to our CRA code base. Alternatively, you could create ES module versions of PorcupineRhinoManager and the other scripts, which could then be imported.

5. Instantiating SmartLightingDemo in App.js

SmartLightingDemo represents our Picovoice-powered VUI. Let’s connect that to our React application. Fundamentally this works by passing callbacks to the VUI that are invoked when certain voice events are detected.
Import the SmartLightingDemo class and create an instance in smart-lighting/src/App.js:

import SmartLightingDemo from "./picovoice/smart_lighting_demo"
const demo = new SmartLightingDemo()

SmartLightingDemo has start and stop functions. We can connect these to a button in the React GUI to turn the program on and off. Create startListening() and stopListening() functions that make use of the demo's start and stop functions, as well as a higher-level toggleListening() function. This captures some additional application logic, e.g. resetting state when the application is turned off.

const toggleListening = () => {
  if (listening) {
    stopListening()
  } else {
    startListening()
  }
}

const stopListening = () => {
  demo.stop()
  setListening(false)
  setWakePhrase(false)
  resetDemo()
}

const startListening = () => {
  demo.start(initCallback, ppnCallback, rhnCallback)
  setListening(true)
  setWakePhrase(false)
  setIntentFailed(false)
  if (!demoInitialized) {
    setDemoInitialized(true)
    setDemoLoading(true)
  }
}

Callbacks

Within the start function defined in the SmartLightingDemo class in smart_lighting_demo.js, we call the start function of PorcupineRhinoManager and feed in three callbacks: the initialization callback, the porcupine callback, and the rhino callback. These callbacks are fired when the relevant event messages are received from the porcupine and rhino workers.
Callbacks and Workers

initCallback: Engines are initialized and ready to process audio

Because we have to ask the user for permission to use the microphone, it is not possible to complete initialization on page load. Additionally, the WASM modules must load and finish initializing before they are ready to process speech data. Because these events are asynchronous with the page load, we need a mechanism to know when our application is loaded and ready for voice input, so that users do not attempt to speak into the void.

We want to give users feedback regarding the state of the demo and notify them when it is ready to use. In smart-lighting/src/App.js, create two states to keep track of whether the demo is loading and whether it has been initialized.

const [demoInitialized, setDemoInitialized] = useState(false)
const [demoLoading, setDemoLoading] = useState(false)

Create an event handler called initCallback. This will be fired when the demo is initialized.

const initCallback = event => {
  setDemoInitialized(true)
  setDemoLoading(false)
}

ppnCallback: Wake Word Detection

We have created a handler for the initialization event. When the application is running, we need to be able to know when the user says the wake word, which in this application is "Picovoice". Create two states: one to keep track of whether the wake phrase has been detected, and another to determine whether the engine failed to detect an intent (when it’s successful, that will be handled by other logic).

const [wakePhrase, setWakePhrase] = useState(false)
const [intentFailed, setIntentFailed] = useState(false)

Create an event handler called ppnCallback. This will be invoked when the wake phrase “Picovoice” is detected.

const ppnCallback = event => {
  setWakePhrase(true)
  setIntentFailed(false)
}

rhnCallback: Inference

After detecting a wake phrase, the user will say a command, like:

“turn on the living room lights”.

The manager will direct the audio processing input to the Speech-to-Intent engine. To capture the output of the engine and use it to invoke changes in our UI, we create rhnCallback. Sometimes, users will say a command that is not captured by the Speech-to-Intent context; for example, asking it to tell a joke, which the smart lighting context is not designed to recognize. In this case, the engine will report that it did not understand, and we record this with our intentFailed state. Create the rhnCallback event handler, which will be fired whenever the Rhino engine returns a detected intent. This function is more complex than the others, as the context allows many possible intent permutations.

const rhnCallback = (information) => { ... }

information has the following structure:

{
  "isUnderstood": true,
  "intent": "changeIntensity",
  "slots": {
    "location": "living room",
    "intensity": "up"
  }
}

The keys “isUnderstood”, “slots”, and “intent” will always be present in the output. The structure of the slots and the names of the intents will depend on the context (slots capture specific variables from speech, in this case e.g. which room to change the lights in).

This particular output is specific to the smart lighting demo. See this file to understand the complete structure of the lighting context. The context is also available as a starting template, “Smart Lighting (Advanced)”, in Picovoice Console, allowing you to make changes and test them in the Console as you iterate.

setWakePhrase(false);
if (information.isUnderstood === false) {
  setIntentFailed(true);
} else {
  setIntentFailed(false);
Once we detect an intent, we reset wakePhrase to false, as the manager has switched audio input back to the wake word engine and is ready to detect new events. Now we can use the results of the speech-to-intent engine and make the application logic reflect the user’s intent.

const intent = information.intent
const slots = information.slots
const location = slots["location"] === undefined ? "all" : slots["location"]
if (intent === "changeLightState") {
  const state = slots["state"] === "on" ? true : false
  STATE_MAP.get(location)(state)
} else if (intent === "changeLightStateOff") {
  STATE_MAP.get(location)(false)
} else if (intent === "changeColor") {
  const color = LIGHT_MAP.get(slots["color"])
  COLOR_MAP.get(location)(color)
  if (location !== "all") {
    STATE_MAP.get(location)(true)
  }
} else if (intent === "changeIntensity") {
  const dir = DIRECTION_MAP.get(slots["intensity"])
  changeIntensity(location, dir)
} else if (intent === "reset") {
  const resetAspect = RESET_MAP.get(slots["feature"])
  if (location === null || location === undefined) {
    if (resetAspect === ASPECT_COLOR) {
      setAllLightColor(LIGHT_YELLOW)
    } else if (resetAspect === ASPECT_INTENSITY) {
      setAllLightIntensity(1)
    }
  } else {
    if (resetAspect === ASPECT_COLOR) {
      COLOR_MAP.get(location)(LIGHT_YELLOW)
    } else if (resetAspect === ASPECT_INTENSITY) {
      changeIntensity(location, "reset")
    }
  }
}

Here is where the magic happens: intent handling. We have five intents that need to be handled differently: changeColor, changeIntensity, changeLightState, changeLightStateOff, reset. Within each intent, different functions have to be invoked, depending on the particular slots.

We store the different functions in a map, retrieve them based on the location in the slots, and then invoke them here. The maps store functions that alter the state of the App component. There are many moving parts and many states to keep track of, which are beyond the scope of this tutorial; see the source to understand how it works in more detail.
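For a sense of what these maps look like, here is a sketch of how a state map could be wired up. The setter names (setLivingRoomLightState and friends) are assumptions for illustration; the demo source defines the full set of rooms and the analogous COLOR_MAP, DIRECTION_MAP, and RESET_MAP:

// Illustrative only: maps a location slot value to a function that updates
// the corresponding piece of App state. Setter names are hypothetical.
const STATE_MAP = new Map([
  ["living room", (on) => setLivingRoomLightState(on)],
  ["kitchen", (on) => setKitchenLightState(on)],
  ["all", (on) => setAllLightStates(on)], // hypothetical helper for every room
])

// Usage, as in the changeLightState branch above:
// STATE_MAP.get(location)(true)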

Using the Effect Hook With The Event Handlers

Some intents may need access to the application state. For example, for the command:

"brighten the light in the living room"

We first have to check whether the current intensity of the living room is already at maximum. The rhnCallback function will capture the initial values of the state, but not the new values that come from subsequent updates. The manager would be calling a function with stale state, and the logic would break.
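To see the problem in isolation, consider a generic React sketch (not code from the demo; the names are stand-ins for the demo's actual handler and state):

// A callback created during one render closes over that render's state values.
// If it is handed to the manager once and never refreshed, later invocations
// still see the old value, even after the state has changed.
const brightenLivingRoom = () => {
  if (livingRoomIntensity < MAX_INTENSITY) {
    setLivingRoomIntensity(livingRoomIntensity + 1) // uses a stale value
  }
}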

We need to refresh the callbacks whenever there is a state change, using the React Effect Hook. The Effect Hook lets you perform side effects in function components; by default, it runs after the first render and after every update. We create an effect with useEffect, which takes two arguments: the function to run, and an optional dependency array.

useEffect(() => {
  demo.refresh(initCallback, ppnCallback, rhnCallback)
})

We do not pass a second argument, so the effect runs after every update and the callbacks always see the latest state.

Adding Some Style

First, we create CSS definitions for the two different states:

.house .lights {
  fill: none;
  stroke: none;
}

#living-room.on {
  fill: url(#livingRoomGradient);
}

Then make the className of the corresponding component depend on the state:

<rect
  x="0"
  y="0"
  width="224"
  height="224"
  className={`lights${livingRoomLightState ? " on" : ""}`}
  id="living-room"
/>

What's Next?

  • Use Picovoice Console to modify the context. Create a new context and use the “Smart Lighting (Advanced)” drop down.