Rhino - Web API

  • npm
  • NLU
  • WebAssembly
  • Browser

This document outlines how to integrate the Rhino Speech-to-Intent engine within an application using its Web API.

Requirements

  • yarn (or npm)
  • Secure browser context (i.e. HTTPS or localhost)

Compatibility

  • Chrome, Edge
  • Firefox
  • Safari

The Picovoice SDKs for Web are powered by WebAssembly (WASM), the Web Audio API, and Web Workers. All audio processing is performed in-browser, providing intrinsic privacy and reliability.

All modern browsers are supported, including on mobile. Internet Explorer is not supported.

Using the Web Audio API requires a secure context (HTTPS connection), with the exception of localhost, for local development.
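
As a minimal sketch of the requirement above, a page could check for a secure context and microphone access before loading Rhino. `isSecureContext` and `navigator.mediaDevices.getUserMedia` are standard browser APIs; the helper name `checkMicrophoneSupport` and its return shape are hypothetical and not part of the Picovoice SDK.

```javascript
// Hypothetical helper (an assumption for illustration, not a Picovoice API).
// `scope` defaults to the global object (window in a browser); passing it in
// makes the check easy to exercise outside a browser.
function checkMicrophoneSupport(scope = globalThis) {
  // Secure contexts are HTTPS pages and localhost.
  if (scope.isSecureContext !== true) {
    return { ok: false, reason: "insecure-context" };
  }
  const mediaDevices = scope.navigator && scope.navigator.mediaDevices;
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== "function") {
    return { ok: false, reason: "no-getusermedia" };
  }
  return { ok: true, reason: null };
}
```

Calling `checkMicrophoneSupport()` early lets the page show a clear message instead of failing later when the microphone is requested.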

JavaScript Frameworks

Looking to use Rhino with React, Angular, or Vue? Framework-specific packages are available.

The framework-specific packages operate at a higher level of abstraction and are meant to integrate as quickly and easily as possible, following each framework's conventions and best practices. Furthermore, the demo applications show the complete lifecycle of using Rhino in a component, including setup and teardown. Interacting with Web Workers is hidden behind a facade, and the Web Voice Processor is also coordinated behind the scenes to automatically set up the microphone.

Otherwise, this document explains how to use Rhino with "Vanilla" JavaScript and HTML, connecting it to the Web Voice Processor for microphone use.

Introduction

Rhino for Web is available in two flavors, Worker and Factory:

  • The Worker packages are all-in-one Web Workers which wrap Rhino instances that will work with the web-voice-processor (and the Angular, React, and Vue packages).
  • The Factory packages give you access to instances directly. This is useful if you wish to build your own worker/worklet, or perhaps use Rhino in some other custom scenario.

Structure

The Rhino SDK for Web is provided in several npm packages, due to the logistics and size of shipping ~3-4MB voice models.

Workers

For typical cases, use the worker packages. Worker packages create complete self-contained Rhino Web Worker instances that can be immediately used with @picovoice/web-voice-processor and with the Angular, React, and Vue packages.

Factories

Factory packages allow you to create instances of Rhino directly. This is useful for building your own custom Worker/Worklet, or for some other bespoke purpose.

Installation & Usage

Worker: Using modern JavaScript, ES Modules, Bundlers (e.g. Webpack)

To obtain a Rhino Worker, we can use the static create factory method from the RhinoWorkerFactory. Here is a complete example that:

  1. Obtains a Worker from the RhinoWorkerFactory (in this case, English) to listen for commands in the specified context
  2. Responds to the inference event by setting the worker's onmessage event handler and looking for messages whose data.command property is set to "rhn-inference".
  3. Starts up the WebVoiceProcessor to forward microphone audio to the Rhino Worker
  4. Sets up a button to trigger the push-to-talk functionality for Rhino and begin a voice interaction. This consists of sending the Rhino worker a "resume" and "pause" message.

E.g.:

yarn add @picovoice/web-voice-processor @picovoice/rhino-web-en-worker

import { WebVoiceProcessor } from "@picovoice/web-voice-processor";
import { RhinoWorkerFactory } from "@picovoice/rhino-web-en-worker";

// Replace the empty string with the Base64 representation of a .rhn context
const RHN_CONTEXT_64 = "";

// Kept at module scope so that pushToTalk() and the cleanup code can reach them
let rhinoWorker;
let webVp;

async function startRhino() {
  // Create a Rhino Worker (English language) to listen for
  // commands in the specified context
  rhinoWorker = await RhinoWorkerFactory.create(
    { context: RHN_CONTEXT_64 }
  );

  // The worker will send a message with data.command = "rhn-inference" upon concluding an inference
  // Here we tell it to log it to the console
  rhinoWorker.onmessage = (msg) => {
    switch (msg.data.command) {
      case "rhn-inference":
        // Log the event (the inference is an object, so serialize it)
        console.log("Rhino inference: " + JSON.stringify(msg.data.inference));
        // Pause Rhino processing until the push-to-talk button is pressed again
        rhinoWorker.postMessage({ command: "pause" });
        break;
      default:
        break;
    }
  };

  // Start up the web voice processor. It will request microphone permission
  // It downsamples the audio to voice recognition standard format (16-bit 16kHz linear PCM, single-channel)
  // The incoming microphone audio frames will then be forwarded to the Rhino Worker
  // n.b. This promise will reject if the user refuses permission! Make sure you handle that possibility.
  webVp = await WebVoiceProcessor.init({
    engines: [rhinoWorker],
    start: true,
  });
}

// Rhino is push-to-talk. We need to tell it that we
// are starting a voice interaction:
function pushToTalk() {
  rhinoWorker.postMessage({ command: "resume" });
}

startRhino();

...

// Finished with Rhino? Release the WebVoiceProcessor and the worker.
if (done) {
  webVp.release();
  rhinoWorker.postMessage({ command: "release" });
}
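
The push-to-talk flow above boils down to three worker commands: "resume", "pause", and "release". A rough sketch of wrapping them in a small controller follows. The `createPushToTalk` helper and its method names are hypothetical; the `worker` argument only needs a `postMessage` method, so a Rhino worker obtained as above could be passed in.

```javascript
// Hypothetical wrapper around the worker commands used in this document
// ("resume", "pause", "release"). An illustration, not a Picovoice API.
function createPushToTalk(worker) {
  let listening = false;
  let released = false;

  function send(command) {
    worker.postMessage({ command });
  }

  return {
    // Begin a voice interaction, e.g. from a button's click handler.
    start() {
      if (released || listening) return false;
      listening = true;
      send("resume");
      return true;
    },
    // Call when an "rhn-inference" message arrives.
    finish() {
      if (released || !listening) return false;
      listening = false;
      send("pause");
      return true;
    },
    // Release the worker; every later call becomes a no-op.
    release() {
      if (released) return false;
      released = true;
      listening = false;
      send("release");
      return true;
    },
    isListening() {
      return listening;
    },
  };
}
```

Tracking the state in one place avoids sending "resume" twice when the button is pressed repeatedly, and avoids posting to a worker that has already been released.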

Worker: Script Tag / IIFE / CDN

Rhino's worker and factory packages are also available in IIFE format, intended for direct inclusion into HTML instead of a bundler. You can use local node modules, or use the CDN unpkg version for rapid prototyping.

These will add RhinoWebXxWorker (where Xx is the language code, e.g. RhinoWebEnWorker for English) and WebVoiceProcessor as global variables on window:

<!DOCTYPE html>
<html lang="en">
  <head>
    <script src="https://unpkg.com/@picovoice/rhino-web-en-worker/dist/iife/index.js"></script>
    <script src="https://unpkg.com/@picovoice/web-voice-processor/dist/iife/index.js"></script>
    <script type="application/javascript">
      const CLOCK_CONTEXT_64 =
        ""

      function writeMessage(message) {
        console.log(message)
        let p = document.createElement("p")
        let text = document.createTextNode(message)
        p.appendChild(text)
        document.body.appendChild(p)
      }

      async function startRhino() {
        writeMessage("Rhino is loading. Please wait...")
        window.rhinoClockWorker = await RhinoWebEnWorker.RhinoWorkerFactory.create(
          {
            context: {
              base64: CLOCK_CONTEXT_64,
              sensitivity: 0.5,
            },
            start: false,
          }
        )
        writeMessage("Rhino worker ready!")

        window.rhinoClockWorker.onmessage = msg => {
          if (msg.data.command === "rhn-inference") {
            writeMessage(
              "Inference detected: " + JSON.stringify(msg.data.inference)
            )
            window.rhinoClockWorker.postMessage({ command: "pause" })
            document.getElementById("push-to-talk").disabled = false
            writeMessage(
              "Rhino is paused. Press the 'Push to Talk' button to speak again."
            )
          }
        }

        writeMessage(
          "WebVoiceProcessor initializing. Microphone permissions requested ..."
        )
        try {
          let webVp = await WebVoiceProcessor.WebVoiceProcessor.init({
            engines: [window.rhinoClockWorker],
          })
          writeMessage(
            "WebVoiceProcessor ready! Press the 'Push to Talk' button to talk."
          )
        } catch (e) {
          writeMessage("WebVoiceProcessor failed to initialize: " + e)
        }
      }

      document.addEventListener("DOMContentLoaded", function () {
        startRhino()
        document.getElementById("push-to-talk").onclick = function (event) {
          writeMessage("Rhino is listening for your commands ...")
          this.disabled = true
          window.rhinoClockWorker.postMessage({ command: "resume" })
        }
      })
    </script>
  </head>
  <body>
    <h1>Rhino Web Demo</h1>
    <p>This demo uses Rhino for Web and the WebVoiceProcessor to:</p>
    <ol>
      <li>
        Create an English instance of Rhino that understands commands in the
        "Pico Clock" context;
      </li>
      <li>
        Acquire microphone (&amp; ask permission) data stream and convert to voice
        processing format (16kHz 16-bit linear PCM). The downsampled audio is
        forwarded to the Rhino engines. The audio <i>does not</i> leave the
        browser: all processing is occurring via the Rhino WebAssembly code.
      </li>
      <li>
        Await inference events from the Rhino engine and output them to the
        page. When the inference is concluded, the push-to-talk button is
        enabled again.
      </li>
    </ol>
    <hr />
    <button id="push-to-talk">Push to Talk</button>
  </body>
</html>

Factory

If you wish to build your own worker/worklet, or perhaps not use workers at all, use the factory packages. This will let you instantiate Rhino engine instances directly.

The audio passed to the engine must be in the correct format (16-bit, 16kHz linear PCM, single-channel). The WebVoiceProcessor handles downsampling in the examples above. If you are not using it, you must do the conversion yourself.

E.g.:

import { Rhino } from "@picovoice/rhino-web-en-factory"

// Replace the empty string with the Base64 representation of a .rhn context
const RHN_CONTEXT_64 = ""

async function startRhino() {
  const handle = await Rhino.create(
    { context: RHN_CONTEXT_64 },
  )

  // Send Rhino frames of audio (check handle.frameLength for size of array)
  const audioFrames = new Int16Array(/* Provide data with correct format and size */)
  const rhinoInference = handle.process(audioFrames)
  // rhinoInference.isFinalized is true once Rhino has concluded the inference
}

startRhino()
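
When the WebVoiceProcessor is not used, audio from the Web Audio API typically arrives as Float32Array samples in the range [-1, 1]. A minimal sketch of converting such samples to 16-bit PCM and cutting them into frames of `frameLength` samples follows. It assumes the input is already single-channel and at the engine's sample rate (resampling is not shown), and the helper names `floatTo16BitPcm` and `splitIntoFrames` are hypothetical.

```javascript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM.
// Hypothetical helper; assumes mono audio at the engine's sample rate.
function floatTo16BitPcm(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return pcm;
}

// Split PCM into complete frames of `frameLength` samples.
// Samples that do not fill a whole frame are returned as `remainder`
// so the caller can prepend them to the next chunk of audio.
function splitIntoFrames(pcm, frameLength) {
  if (!Number.isInteger(frameLength) || frameLength <= 0) {
    throw new RangeError("frameLength must be a positive integer");
  }
  const frames = [];
  let offset = 0;
  while (offset + frameLength <= pcm.length) {
    frames.push(pcm.slice(offset, offset + frameLength));
    offset += frameLength;
  }
  return { frames, remainder: pcm.slice(offset) };
}
```

Each element of `frames` could then be passed to `handle.process`, with `handle.frameLength` as the `frameLength` argument.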

The Rhino factory returns instances with the RhinoEngine interface:

export interface RhinoEngine {
  release(): void
  process(frames: Int16Array): RhinoInference
  version: string
  sampleRate: number
  frameLength: number
}
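
Given the interface above, one way to drive an engine is to call `process` once per frame and stop at the first result whose `isFinalized` is true. The sketch below relies only on the members listed in the interface; `runUntilFinalized` is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper: feeds frames to an object exposing the RhinoEngine
// members shown above and returns the first finalized inference,
// or null if the frames run out before Rhino concludes.
function runUntilFinalized(engine, frames) {
  for (const frame of frames) {
    // Every frame must contain exactly engine.frameLength samples.
    if (frame.length !== engine.frameLength) {
      throw new Error(
        "Expected " + engine.frameLength + " samples per frame, got " + frame.length
      );
    }
    const inference = engine.process(frame);
    if (inference.isFinalized) {
      return inference;
    }
  }
  return null;
}
```

A `null` result means more audio is needed; the caller would keep supplying frames and call `engine.release()` once it is done with the engine.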

Results

Each call to process returns a RhinoInference object. isFinalized remains false until Rhino reaches a conclusion; once it is true, isUnderstood is set:

export type RhinoInference = {
  /** Rhino has concluded the inference (isUnderstood is now set) */
  isFinalized: boolean
  /** The intent was understood (it matched an expression in the context) */
  isUnderstood?: boolean
  /** The name of the intent */
  intent?: string
  /** Map of the slot variables and values extracted from the utterance */
  slots?: Record<string, string>
}
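
As a rough illustration of consuming this type, the sketch below turns an inference into a short description. The field names come from the type above; the `describeInference` helper is hypothetical, and the intent and slot names in the usage note are made up rather than taken from a real context.

```javascript
// Hypothetical helper that summarizes a RhinoInference-shaped object.
function describeInference(inference) {
  if (!inference.isFinalized) {
    // Rhino is still listening; isUnderstood is not set yet.
    return "pending";
  }
  if (!inference.isUnderstood) {
    // The utterance did not match any expression in the context.
    return "not understood";
  }
  const slots = inference.slots || {};
  const slotText = Object.keys(slots)
    .map((name) => name + "=" + slots[name])
    .join(", ");
  return slotText ? inference.intent + " (" + slotText + ")" : inference.intent;
}
```

For example, a finalized and understood inference with the made-up intent "setTimer" and slots { minutes: "5" } would be described as "setTimer (minutes=5)".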
