In this article, we will learn how to perform Wake Word Detection and Voice Command Detection in React.

Porcupine Wake Word is used to recognize specific phrases or words, and Rhino Speech-to-Intent is used to understand voice commands and extract the intent along with its details.

In addition to React, Picovoice's Speech Recognition engines are also available for other frameworks such as Angular and Vue, as well as plain JavaScript.

Porcupine Wake Word

  1. Install the Porcupine Wake Word React SDK using npm:
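A sketch of the install command, assuming the SDK is published as @picovoice/porcupine-react and that the Web Voice Processor package is used for microphone capture:

```console
npm install @picovoice/porcupine-react @picovoice/web-voice-processor
```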
  2. Sign up for a free Picovoice Console account and copy your AccessKey. The AccessKey handles authentication and authorization.

  3. Create your custom wake word model using Picovoice Console.

  4. Add the Porcupine model and the Wake Word model to the project by:

Either copying the model files to the project's public directory:
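For example, assuming hypothetical file names for the Porcupine model and the keyword file downloaded from Picovoice Console:

```console
# file names are placeholders; use the files downloaded from Picovoice Console
cp porcupine_params.pv public/
cp hey_assistant_wasm.ppn public/
```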

(or)

Creating a base64 string of each model using the pvbase64 script included in the package:
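A sketch, assuming the script's -i/-o flags for the input model file and the output JavaScript file:

```console
npx pvbase64 -i porcupine_params.pv -o porcupineModel.js
npx pvbase64 -i hey_assistant_wasm.ppn -o porcupineKeyword.js
```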

  5. Create objects containing the Porcupine model and Wake Word model options:
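A minimal sketch of the option objects, assuming the files were copied to the public directory; the file names and the label are placeholders (use the base64 field instead if you generated base64 strings):

```typescript
// Custom wake word file downloaded from Picovoice Console
const porcupineKeyword = {
  publicPath: "hey_assistant_wasm.ppn",
  label: "Hey Assistant",
};

// Porcupine model file
const porcupineModel = {
  publicPath: "porcupine_params.pv",
};
```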
  6. Create an instance of the Wake Word engine using the model options from the previous step:
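A sketch using the usePorcupine hook, assuming the option objects from the previous step; replace ${ACCESS_KEY} with the AccessKey copied from Picovoice Console:

```typescript
import { useEffect } from "react";
import { usePorcupine } from "@picovoice/porcupine-react";

function VoiceWidget() {
  const {
    keywordDetection,
    isLoaded,
    isListening,
    error,
    init,
    start,
    stop,
    release,
  } = usePorcupine();

  useEffect(() => {
    // initialize the Wake Word engine once on mount
    init("${ACCESS_KEY}", porcupineKeyword, porcupineModel);
  }, [init]);

  // ...
}
```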
  7. Process audio frames by calling the start method:
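A sketch: once the engine reports isLoaded, a single call starts microphone capture and Wake Word detection, and stop pauses it again:

```typescript
// begin processing microphone audio
await start();

// ... later, stop listening and release the microphone
await stop();
```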

Once started, isListening state will be set to true and keywordDetection state will be updated based on Wake Word detections:
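One way to consume detections, sketched as an effect inside the component above (the label matches the label set in the keyword options):

```typescript
useEffect(() => {
  if (keywordDetection !== null) {
    // e.g. keywordDetection.label === "Hey Assistant"
    console.log(`Wake word detected: ${keywordDetection.label}`);
  }
}, [keywordDetection]);
```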

For more information, check Porcupine Wake Word's product page or refer to Porcupine's React SDK quick start guide.

Rhino Speech-to-Intent

  1. Install the Rhino Speech-to-Intent React SDK using npm:
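A sketch of the install command, assuming the SDK is published as @picovoice/rhino-react and that the Web Voice Processor package is used for microphone capture:

```console
npm install @picovoice/rhino-react @picovoice/web-voice-processor
```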
  2. Sign up for a free Picovoice Console account and copy your AccessKey. The AccessKey handles authentication and authorization.

  3. Create your Context using Picovoice Console.

  4. Add the Rhino model and the Context model to the project by:

Either copying the model files to the project's public directory:
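For example, assuming hypothetical file names for the Rhino model and the Context file downloaded from Picovoice Console:

```console
# file names are placeholders; use the files downloaded from Picovoice Console
cp rhino_params.pv public/
cp smart_lighting_wasm.rhn public/
```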

(or)

Creating a base64 string of each model using the pvbase64 script included in the package:
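Again a sketch, assuming the script's -i/-o flags for the input model file and the output JavaScript file:

```console
npx pvbase64 -i rhino_params.pv -o rhinoModel.js
npx pvbase64 -i smart_lighting_wasm.rhn -o rhinoContext.js
```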

  5. Create objects containing the Rhino model and Context model options:
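A minimal sketch of the option objects, assuming the files were copied to the public directory; the file names are placeholders (use the base64 field instead if you generated base64 strings):

```typescript
// Context file exported from Picovoice Console
const rhinoContext = {
  publicPath: "smart_lighting_wasm.rhn",
};

// Rhino model file
const rhinoModel = {
  publicPath: "rhino_params.pv",
};
```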
  6. Create an instance of the Speech-to-Intent engine using the model options from the previous step:
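A sketch using the useRhino hook, assuming the option objects from the previous step; replace ${ACCESS_KEY} with the AccessKey copied from Picovoice Console:

```typescript
import { useEffect } from "react";
import { useRhino } from "@picovoice/rhino-react";

function VoiceWidget() {
  const {
    inference,
    contextInfo,
    isLoaded,
    isListening,
    error,
    init,
    process,
    release,
  } = useRhino();

  useEffect(() => {
    // initialize the Speech-to-Intent engine once on mount
    init("${ACCESS_KEY}", rhinoContext, rhinoModel);
  }, [init]);

  // ...
}
```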
  7. Process audio frames by calling the process method:
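A sketch: each call captures a single spoken command from the microphone and runs inference on it:

```typescript
// listen for one voice command and infer its intent
await process();
```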

Rhino will listen to and process frames of microphone audio until it has finalized an inference, which it returns via the inference state. Once a conclusion is reached, Rhino enters a paused state; from there, call process again to start another inference.
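One way to consume the result, sketched as an effect inside the component above; the intent and slot names are examples from a hypothetical smart-lighting context:

```typescript
useEffect(() => {
  if (inference !== null) {
    if (inference.isUnderstood) {
      // e.g. intent: "changeColor", slots: { color: "blue" }
      console.log(inference.intent, inference.slots);
    } else {
      console.log("Command not understood");
    }
  }
}, [inference]);
```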

For more information, check Rhino Speech-to-Intent's product page or refer to Rhino's React SDK quick start guide.