Speech Recognition is a broad term that is often associated solely with Speech-to-Text technology. However, Speech Recognition can also include technologies such as Wake Word Detection, Voice Command Recognition, and Voice Activity Detection (VAD).

This article provides a thorough guide on integrating on-device Speech Recognition into JavaScript Web apps. We will be learning about the following technologies:

  - Cobra Voice Activity Detection (VAD)
  - Porcupine Wake Word
  - Rhino Speech-to-Intent
  - Cheetah Streaming Speech-to-Text
  - Leopard Speech-to-Text

In addition to plain JavaScript, Picovoice's Speech Recognition engines are also available for popular UI frameworks such as React, Angular, and Vue.

Cobra VAD

Cobra Voice Activity Detection is a VAD engine that can be used to detect the presence of human speech within an audio signal.

  1. Install the Web Voice Processor and Cobra Voice Activity Detection Web SDK packages using npm.

  2. Sign up for a free Picovoice Console account and copy your AccessKey from the main dashboard. The AccessKey is only required for authentication and authorization.

  3. Create an instance of CobraWorker.

  4. Subscribe CobraWorker to WebVoiceProcessor to start processing audio frames, as shown in the sketch below.
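
The sketch below ties these steps together. It assumes the `@picovoice/cobra-web` and `@picovoice/web-voice-processor` npm packages, and `"${ACCESS_KEY}"` is a placeholder for the AccessKey copied from Picovoice Console; the exact `CobraWorker.create` signature may vary slightly between SDK versions.

```javascript
// npm install @picovoice/web-voice-processor @picovoice/cobra-web

import { CobraWorker } from "@picovoice/cobra-web";
import { WebVoiceProcessor } from "@picovoice/web-voice-processor";

// Called with a voice probability in [0, 1] for each processed audio frame.
function voiceProbabilityCallback(voiceProbability) {
  console.log(`Voice probability: ${voiceProbability.toFixed(2)}`);
}

// "${ACCESS_KEY}" is the AccessKey obtained from Picovoice Console.
const cobra = await CobraWorker.create("${ACCESS_KEY}", voiceProbabilityCallback);

// WebVoiceProcessor handles microphone access and delivers audio frames to Cobra.
await WebVoiceProcessor.subscribe(cobra);
```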

For further details, visit the Cobra Voice Activity Detection product page or refer to the Cobra Web SDK quick start guide.

Porcupine Wake Word

Porcupine Wake Word is a wake word detection engine that can be used to listen for user-specified keywords and activate dormant applications when a keyword is detected.

  1. Install the Web Voice Processor and Porcupine Wake Word Web SDK packages using npm.

  2. Sign up for a free Picovoice Console account and copy your AccessKey from the main dashboard. The AccessKey is only required for authentication and authorization.

  3. Create and download a custom Wake Word model using Picovoice Console.

  4. Add the Porcupine model (.pv) for your language of choice and your custom Wake Word model (.ppn) created in the previous step to the project's public directory.

  5. Create objects containing the Porcupine model and Wake Word model options.

  6. Create an instance of PorcupineWorker.

  7. Subscribe PorcupineWorker to WebVoiceProcessor to start processing audio frames, as shown in the sketch below.
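
A minimal sketch of these steps is shown below. It assumes the `@picovoice/porcupine-web` and `@picovoice/web-voice-processor` npm packages; the model file names (`porcupine_params.pv`, `my_wake_word.ppn`) are examples standing in for the files you placed in the public directory, and `"${ACCESS_KEY}"` is a placeholder for your AccessKey.

```javascript
// npm install @picovoice/web-voice-processor @picovoice/porcupine-web

import { PorcupineWorker } from "@picovoice/porcupine-web";
import { WebVoiceProcessor } from "@picovoice/web-voice-processor";

// Example file names; use the paths of the files you placed in the public directory.
const porcupineModel = { publicPath: "porcupine_params.pv" };
const keywordModel = { publicPath: "my_wake_word.ppn", label: "my wake word" };

// Called whenever the wake word is detected in the audio stream.
function keywordDetectionCallback(detection) {
  console.log(`Detected keyword: ${detection.label}`);
}

const porcupine = await PorcupineWorker.create(
  "${ACCESS_KEY}",
  keywordModel,
  keywordDetectionCallback,
  porcupineModel
);

// Start feeding microphone audio to Porcupine.
await WebVoiceProcessor.subscribe(porcupine);
```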

For further details, visit the Porcupine Wake Word product page or refer to the Porcupine Web SDK quick start guide.

Rhino Speech-to-Intent

Rhino Speech-to-Intent is a voice command recognition engine that infers user intents from utterances, allowing users to interact with applications via voice.

  1. Install the Web Voice Processor and Rhino Speech-to-Intent Web SDK packages using npm.

  2. Sign up for a free Picovoice Console account and copy your AccessKey from the main dashboard. The AccessKey is only required for authentication and authorization.

  3. Create your Context using Picovoice Console.

  4. Add the Rhino Speech-to-Intent model (.pv) for your language of choice and the Context model (.rhn) created in the previous step to the project's public directory.

  5. Create an object containing the Rhino Speech-to-Intent model and Context model options.

  6. Create an instance of RhinoWorker.

  7. Subscribe RhinoWorker to WebVoiceProcessor to start processing audio frames, as shown in the sketch below.
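
The sketch below illustrates these steps. It assumes the `@picovoice/rhino-web` and `@picovoice/web-voice-processor` npm packages; the file names (`rhino_params.pv`, `my_context.rhn`) are examples for the files in your public directory, and `"${ACCESS_KEY}"` is a placeholder for your AccessKey. The shape of the inference object may differ slightly between SDK versions.

```javascript
// npm install @picovoice/web-voice-processor @picovoice/rhino-web

import { RhinoWorker } from "@picovoice/rhino-web";
import { WebVoiceProcessor } from "@picovoice/web-voice-processor";

// Example file names; use the paths of the files in your public directory.
const rhinoModel = { publicPath: "rhino_params.pv" };
const contextModel = { publicPath: "my_context.rhn" };

// Called once Rhino has finished inferring the intent of a spoken command.
function inferenceCallback(inference) {
  if (inference.isUnderstood) {
    console.log(`Intent: ${inference.intent}`, inference.slots);
  }
}

const rhino = await RhinoWorker.create(
  "${ACCESS_KEY}",
  contextModel,
  inferenceCallback,
  rhinoModel
);

// Start feeding microphone audio to Rhino.
await WebVoiceProcessor.subscribe(rhino);
```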

For further details, visit the Rhino Speech-to-Intent product page or refer to the Rhino Web SDK quick start guide.

Cheetah Streaming Speech-to-Text

Cheetah Streaming Speech-to-Text is a speech-to-text engine that transcribes voice data in real time, synchronously with audio generation.

  1. Install the Web Voice Processor and Cheetah Streaming Speech-to-Text Web SDK packages using npm.

  2. Sign up for a free Picovoice Console account and copy your AccessKey from the main dashboard. The AccessKey is only required for authentication and authorization.

  3. Generate a custom Cheetah Streaming Speech-to-Text model (.pv) from Picovoice Console or download the default model (.pv).

  4. Add the model to the project's public directory.

  5. Create an object containing the model options.

  6. Create an instance of CheetahWorker.

  7. Subscribe CheetahWorker to WebVoiceProcessor to start processing audio frames, as shown in the sketch below.
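
Below is a minimal sketch of these steps. It assumes the `@picovoice/cheetah-web` and `@picovoice/web-voice-processor` npm packages; `cheetah_params.pv` is an example name for the model in your public directory, and `"${ACCESS_KEY}"` is a placeholder for your AccessKey.

```javascript
// npm install @picovoice/web-voice-processor @picovoice/cheetah-web

import { CheetahWorker } from "@picovoice/cheetah-web";
import { WebVoiceProcessor } from "@picovoice/web-voice-processor";

// Example file name; use the path of the model placed in your public directory.
const cheetahModel = { publicPath: "cheetah_params.pv" };

// Receives partial transcripts as the audio is processed in real time.
function transcriptCallback(cheetahTranscript) {
  console.log(cheetahTranscript.transcript);
}

const cheetah = await CheetahWorker.create(
  "${ACCESS_KEY}",
  transcriptCallback,
  cheetahModel
);

// Start streaming microphone audio to Cheetah.
await WebVoiceProcessor.subscribe(cheetah);
```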

For further details, visit the Cheetah Streaming Speech-to-Text product page or refer to the Cheetah Web SDK quick start guide.

Leopard Speech-to-Text

In contrast to Cheetah Streaming Speech-to-Text, Leopard Speech-to-Text waits for the entire spoken phrase to finish before providing a transcription, enabling higher accuracy and runtime efficiency.

  1. Install the Leopard Speech-to-Text Web SDK package using npm.

  2. Sign up for a free Picovoice Console account and copy your AccessKey from the main dashboard. The AccessKey is only required for authentication and authorization.

  3. Generate a custom Leopard Speech-to-Text model (.pv) from Picovoice Console or download a default model (.pv) for the language of your choice.

  4. Add the model to the project's public directory.

  5. Create an object containing the model options.

  6. Create an instance of LeopardWorker.

  7. Transcribe audio (16 kHz sample rate, 16-bit linearly encoded, single channel), as shown in the sketch below.
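
The sketch below illustrates these steps. It assumes the `@picovoice/leopard-web` npm package; `leopard_params.pv` is an example name for the model in your public directory, `"${ACCESS_KEY}"` is a placeholder for your AccessKey, and `pcm` stands for audio you have already captured or loaded in the required format.

```javascript
// npm install @picovoice/leopard-web

import { LeopardWorker } from "@picovoice/leopard-web";

// Example file name; use the path of the model placed in your public directory.
const leopardModel = { publicPath: "leopard_params.pv" };

const leopard = await LeopardWorker.create("${ACCESS_KEY}", leopardModel);

// `pcm` is an Int16Array of 16 kHz, 16-bit linearly encoded, single-channel audio,
// e.g. decoded from a recording or an uploaded file.
const { transcript } = await leopard.process(pcm);
console.log(transcript);
```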

For further details, visit the Leopard Speech-to-Text product page or refer to the Leopard Web SDK quick start guide.