Private Voice AI with React
Introduction
Skill Level
- Comfortable with JavaScript and React
- Basic command line knowledge
Prerequisites
- yarn (or npm, using equivalent commands)
- Create React App
- A modern browser and a microphone

“Picovoice, turn on the living room lights”
The Picovoice website features a smart lighting demo. The demo lets you control lights in a home using wake words and Speech-to-Intent. This tutorial recreates the demo using React. The code and models for the tutorial are available on GitHub.
We will be using two Picovoice platform features: Porcupine and Rhino. Porcupine is always listening and is used for hotword (wake word) detection. Rhino is a Speech-to-Intent engine that performs natural language understanding (NLU) on voice data and infers intent, without intermediate speech-to-text. Together, these engines are the foundation for a Voice User Interface (VUI).
Porcupine will be responsible for listening for “Picovoice” in the idle state, to wake up the app. Once awakened, Rhino processes subsequent audio and extracts the intent from natural expressions like “turn on the kitchen lights”.
This state diagram shows the flow from Porcupine to Rhino.
Overview
To understand how Picovoice technology works in web browsers, see this article, which gives an overview of WebAssembly, the Web Audio API, Web Workers, and other details that serve as the foundation for this tutorial. In this tutorial, our task is to adapt the “vanilla” JavaScript smart lighting demo into a React application.
Set Up
1. Create React App
Create a new single page application in React using Create React App. Run the following commands:
yarn create react-app smart-lighting
cd smart-lighting
yarn start
Your React application is now running locally in development mode. You can visit it at http://localhost:3000 in your browser.
2. Adding Picovoice Scripts to the Project
We will be using the files described in this article. This includes the WebAssembly modules, their wrappers, and binding files for both Porcupine and Rhino. For details on the purpose of each file, see that article.
2.1 Bindings
In the public folder, create a new folder called scripts. Here, we will place all the core binding and .wasm files that allow us to use the WASM modules. The files pv_porcupine.wasm and pv_porcupine.js can be found in the lib/wasm folder of the Porcupine repo, while pv_rhino.wasm and pv_rhino.js can be found in the lib/wasm folder of the Rhino repo. We also need the binding files porcupine.js and rhino.js from the binding/javascript folders of the Porcupine and Rhino repos, respectively.
2.2 Web Voice Processor and Downsampling Worker
We also need to add the web voice processor and downsampling worker to the project. Install the @picovoice/web-voice-processor library with yarn:
yarn add @picovoice/web-voice-processor
We will be importing web_voice_processor.js in smart-lighting/src/App.js:
import "@picovoice/web-voice-processor/src/web_voice_processor"
We then have to move downsampling_worker.js from @picovoice/web-voice-processor in node_modules to the public/scripts folder.
2.3 Demo Code (Workers and Managers)
The JavaScript demo code in the rhino repo provides us with web workers for porcupine and rhino, as well as an overarching manager that connects them to the web voice processor. We will copy the porcupine worker script as well as the rhino worker script from the existing demos in the repos into our public/scripts folder. Finally, we have to add a copy of picovoice_manager.js from the shared folder.
We use the public folder as a way to add assets outside of the module system. These scripts can be referenced with the environment variable PUBLIC_URL.
The PicovoiceManager needs to be available as a global variable so we can access it in our application code. Hence, we add the following line to the <head> of smart-lighting/public/index.html:
<script src="%PUBLIC_URL%/scripts/picovoice_manager.js"></script>
This is how your directory structure should look. You may have some additional files generated by Create React App but they are not relevant to this tutorial:
smart-lighting
├── public
│   ├── scripts
│   │   ├── downsampling_worker.js
│   │   ├── picovoice_manager.js
│   │   ├── porcupine_worker.js
│   │   ├── porcupine.js
│   │   ├── pv_porcupine.js
│   │   ├── pv_porcupine.wasm
│   │   ├── pv_rhino.js
│   │   ├── pv_rhino.wasm
│   │   ├── rhino_worker.js
│   │   └── rhino.js
│   ├── index.html
│   ├── manifest.json
│   └── ...
├── src
│   ├── App.js
│   ├── App.css
│   └── ...
Please check out our article on Running Offline Voice AI Inside the Web Browser for details of these files.
3. Creating the SmartLightingDemo Class
In your src
folder, create a new folder called picovoice
for all Picovoice related modules. In picovoice
, we will store a class called SmartLightingDemo that will be accessing the scripts in the public
folder through PUBLIC_URL
. The SmartLightingDemo encapsulates the VUI and all of its implementation details. We will use this class as the interface for our React application to interact with the VUI.
The class stores the PicovoiceManager, keyword IDs, sensitivities, and context as attributes. The start method takes initCallback, ppnCallback, and rhnCallback functions as parameters. The refresh method keeps the callback closures up-to-date. The stop method releases the workers and halts the web voice processor.
4. Adding Wake Words and Context to SmartLightingDemo
With the Picovoice files and scripts in place, we can now supply our specific wake word and context so that the application listens for “Picovoice” and can perform NLU on the domain of smart lighting. The model files for the smart lighting wake word and context are available in the demo as .ppn and .rhn files, respectively. They are provided to the SmartLightingDemo class as arguments.
We will encode these binary files into base64 strings, as it is a convenient format for embedding into JavaScript applications. In the example code, the encoding is done using Node, as part of the build process. We can also use the command line:
base64 smart_lighting.rhn
base64 picovoice.ppn
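If you prefer to do the encoding with Node (as the example code does at build time), a minimal sketch might look like the following. It writes the two constant files imported in the next step; the demo's actual build setup differs.

// Sketch only: encode the model files to base64 and emit JS constants.
// Run with Node from the folder that contains the .ppn/.rhn files.
const fs = require("fs")

const picovoice64 = fs.readFileSync("picovoice.ppn").toString("base64")
const lightingContext64 = fs.readFileSync("smart_lighting.rhn").toString("base64")

fs.writeFileSync("picovoice_64.js", `export const PICOVOICE_64 = "${picovoice64}"\n`)
fs.writeFileSync("lighting_context.js", `export const LIGHTING_CONTEXT = "${lightingContext64}"\n`)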
Store the base64 strings as constants in a separate file and import them into smart-lighting/src/picovoice/smart_lighting_demo.js. Then, add them to this.keywordIDs and this.context.
import { LIGHTING_CONTEXT } from "./lighting_context.js";
import { PICOVOICE_64 } from "./picovoice_64.js";

class SmartLightingDemo {
  constructor() {
    this.picovoiceMgr = window.PicovoiceManager
    this.keywordIDs = {
      picovoice: Buffer.from(PICOVOICE_64, "base64"),
    }
    this.sensitivities = new Float32Array([0.6])
    this.context = Buffer.from(LIGHTING_CONTEXT, "base64")
  }
We also explicitly access the PicovoiceManager via window, since the picovoice_manager.js that we added as a script in the HTML file defines PicovoiceManager as a global variable. This is the quickest way to adapt the existing scripts to work with our CRA code base. Alternatively, you could create module/ES6 versions of PicovoiceManager and the other scripts, which could then be imported.
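For reference, here is a minimal sketch of the remaining SmartLightingDemo methods. The argument list passed to the manager's start function is an assumption; consult the picovoice_manager.js you copied into public/scripts for its actual signature.

  // Sketch only: the exact PicovoiceManager.start() arguments are assumed here.
  start(initCallback, ppnCallback, rhnCallback) {
    this.refresh(initCallback, ppnCallback, rhnCallback)
    this.picovoiceMgr.start(
      this.keywordIDs,
      this.sensitivities,
      keywordIndex => this.ppnCallback(keywordIndex),
      this.context,
      inference => this.rhnCallback(inference),
      () => this.initCallback()
    )
  }

  refresh(initCallback, ppnCallback, rhnCallback) {
    // Store the latest closures so the workers never invoke callbacks
    // that captured stale React state.
    this.initCallback = initCallback
    this.ppnCallback = ppnCallback
    this.rhnCallback = rhnCallback
  }

  stop() {
    // Halts the web voice processor and releases the Porcupine and Rhino workers.
    this.picovoiceMgr.stop()
  }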
5. Instantiating SmartLightingDemo in App.js
SmartLightingDemo represents our Picovoice-powered VUI. Let’s connect it to our React application. Fundamentally, this works by passing callbacks to the VUI that are invoked when certain voice events are detected.
Import the SmartLightingDemo class and create an instance in smart-lighting/src/App.js:
import SmartLightingDemo from "./picovoice/smart_lighting_demo"

const demo = new SmartLightingDemo()
SmartLightingDemo has start and stop functions. We can connect these to a React button in the GUI to turn the program on and off.
Create startListening() and stopListening() functions that make use of the demo's start and stop functions, as well as a higher-level toggleListening() function. This captures some additional application logic, e.g. resetting state when the application is turned off.
const toggleListening = () => {
  if (listening) {
    stopListening()
  } else {
    startListening()
  }
}

const stopListening = () => {
  demo.stop()
  setListening(false)
  setWakePhrase(false)
  resetDemo()
}

const startListening = () => {
  demo.start(initCallback, ppnCallback, rhnCallback)
  setListening(true)
  setWakePhrase(false)
  setIntentFailed(false)
  if (!demoInitialized) {
    setDemoInitialized(true)
    setDemoLoading(true)
  }
}
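For example, the toggle could be wired to a button like this (illustrative markup; the demo's actual UI is more elaborate):

<button onClick={toggleListening}>
  {listening ? "Stop" : "Start"} listening
</button>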
Callbacks
Within the start function defined in the SmartLightingDemo class in smart_lighting_demo.js, we call the start function in PicovoiceManager and feed in three callbacks: the initialization callback, the Porcupine callback, and the Rhino callback. These callbacks are fired when the relevant event messages are received from the porcupine and rhino workers.
initCallback: Engines are initialized and ready to process audio
Because we have to ask the user for permission to use the microphone, it is not possible to complete initialization on page load. Additionally, the WASM module must load and finish initialization before it is ready to process speech data. Because these events are asynchronous with the page load, we need a mechanism to know when our application is loaded and ready for voice input, so that users do not attempt to speak into the void.
We want to give users feedback regarding the state of the demo and notify them when it is ready to use. In smart-lighting/src/App.js, create two states to keep track of whether the demo is loading and whether it is initialized.
const [demoInitialized, setDemoInitialized] = useState(false)
const [demoLoading, setDemoLoading] = useState(false)
Create an event handler called initCallback. This will be fired when the demo is initialized.
const initCallback = event => {
  setDemoInitialized(true)
  setDemoLoading(false)
}
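With these two flags, you could surface a simple status message to the user; the markup below is illustrative rather than taken from the demo:

{/* Illustrative loading/ready indicator driven by the two state flags */}
{demoLoading && <p>Loading voice AI...</p>}
{demoInitialized && !demoLoading && <p>Ready! Say "Picovoice" to begin.</p>}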
ppnCallback: Wake Word Detection
We have created a handler for the initialization event. When the application is running, we need to be able to know when the user says the wake word, which in this application is "Picovoice". Create two states: one to keep track of whether the wake phrase has been detected, and another to determine whether the engine failed to detect an intent (when it’s successful, that will be handled by other logic).
const [wakePhrase, setWakePhrase] = useState(false)
const [intentFailed, setIntentFailed] = useState(false)
Create an event handler called ppnCallback. This will be invoked when the wake phrase “Picovoice” is detected.
const ppnCallback = event => {
  setWakePhrase(true)
  setIntentFailed(false)
}
rhnCallback: Inference
After detecting a wake phrase, the user will say a command, like:
“turn on the living room lights”.
The manager will direct the audio processing input to the Speech-to-Intent engine. To capture the output of the engine and use it to invoke changes in our UI, we create rhnCallback.
Sometimes, users will say a command that is not captured by the Speech-to-Intent context: for example, asking it to tell a joke, which the smart lighting context is not designed to recognize. In this case, the engine will report that it did not understand. We’ll record this with our intentFailed state.
Create an event handler called rhnCallback, which will be fired whenever the Rhino engine returns a detected intent. This function is more complex than the others, as the context allows many possible intent permutations.
const rhnCallback = (information) => { ... }
information has the following structure:
{"isUnderstood": true,"intent": "changeIntensity","slots": {"location": "living room","intensity": "up"}}
The keys “isUnderstood”, “slots”, and “intent” will always be present in the output. The structure of the slots and the names of the intents will depend on the context (slots capture specific variables from speech, in this case e.g. which room to change the lights in).
This particular output is specific to the smart lighting demo. See this file to understand the complete structure of the lighting context. The context is also available as a starting template, “Smart Lighting”, for Picovoice Console, allowing you to make changes and test them in the Console as you iterate.
setWakePhrase(false);
if (information.isUnderstood === false) {
  setIntentFailed(true);
} else {
  setIntentFailed(false);
Once we detect an intent, we reset wakePhrase to false, as the manager has switched audio input back to the wake word engine and is ready to detect new events. Now we can use the results of the speech-to-intent engine and make the application logic reflect the user’s intent.
const intent = information.intent
const slots = information.slots
const location = slots["location"] === undefined ? "all" : slots["location"]

if (intent === "changeLightState") {
  const state = slots["state"] === "on" ? true : false
  STATE_MAP.get(location)(state)
} else if (intent === "changeLightStateOff") {
  STATE_MAP.get(location)(false)
} else if (intent === "changeColor") {
  const color = LIGHT_MAP.get(slots["color"])
  COLOR_MAP.get(location)(color)
  if (location !== "all") {
    STATE_MAP.get(location)(true)
  }
} else if (intent === "changeIntensity") {
  const dir = DIRECTION_MAP.get(slots["intensity"])
  changeIntensity(location, dir)
} else if (intent === "reset") {
  const resetAspect = RESET_MAP.get(slots["feature"])
  if (location === null || location === undefined) {
    if (resetAspect === ASPECT_COLOR) {
      setAllLightColor(LIGHT_YELLOW)
    } else if (resetAspect === ASPECT_INTENSITY) {
      setAllLightIntensity(1)
    }
  } else {
    if (resetAspect === ASPECT_COLOR) {
      COLOR_MAP.get(location)(LIGHT_YELLOW)
    } else if (resetAspect === ASPECT_INTENSITY) {
      changeIntensity(location, "reset")
    }
  }
}
Here is where the magic happens: intent handling. We have five intents that need to be handled differently: changeColor, changeIntensity, changeLightState, changeLightStateOff, and reset. Within each intent, different functions have to be invoked, depending on the particular slots.
We store the different functions in maps, look them up based on the location in the slots, and then invoke them here. The maps store functions that alter the state of the App component. There are many moving parts and many states to keep track of, which are beyond the scope of this tutorial. See the source to understand how it works in more detail.
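To illustrate the idea, one of these maps could be built from the component's state setters roughly like this (the setter names here are hypothetical placeholders; the demo source defines the real maps):

// Hypothetical sketch of one lookup map; the demo defines STATE_MAP,
// COLOR_MAP, DIRECTION_MAP and RESET_MAP over the App component's setters.
const STATE_MAP = new Map([
  ["living room", setLivingRoomLightState],
  ["kitchen", setKitchenLightState],
  ["all", setAllLightState],
])

// "turn on the kitchen lights" then resolves to:
STATE_MAP.get("kitchen")(true)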
Using the Effect Hook With The Event Handlers
Some intents may need access to the application state. For example, for the command:
"brighten the light in the living room"
We first have to check whether the current intensity of the living room is already at maximum. The rhnCallback function will capture the initial values of the state, but not the new values that come from subsequent updates. The manager would then call a function with stale state, and the logic would break.
We need to refresh the callbacks whenever there is a state change using the React Effect hook. The Effect Hook lets you perform side effects in function components. By default, it runs on render and on every update. We create an effect hook with useEffect, and it takes in two arguments: the function to be run, and a dependency array.
useEffect(() => {
  demo.refresh(initCallback, ppnCallback, rhnCallback)
})
We do not pass a second argument, so that the effect hook runs on every update.
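If you would rather not refresh on every render, the alternative is to pass a dependency array listing every piece of state the callbacks read; the names below are illustrative only:

// Alternative sketch: refresh only when the listed state changes.
useEffect(() => {
  demo.refresh(initCallback, ppnCallback, rhnCallback)
}, [listening, wakePhrase, intentFailed])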
Adding Some Style
First, we create CSS definitions for the two different states:
.house .lights {
  fill: none;
  stroke: none;
}

#living-room.on {
  fill: url(#livingRoomGradient);
}
Then, make the className of the corresponding component depend on the state:
<rectx="0"y="0"width="224"height="224"className={`lights${livingRoomLightState ? " on" : ""}`}id="living-room"/>
What's Next?
- Try swapping "Picovoice" for an alternate wake word from the freely available models on GitHub.
- Use Picovoice Console to generate a custom wake word.
- Use Picovoice Console to modify the context: create a new context and use the “Smart Lighting” template.