Speech-to-Text using JavaScript

Learn how to automatically transcribe speech to text using Picovoice Leopard Speech-to-Text Web SDK. The SDK runs on all modern browsers. If you are looking for a speech-to-text engine in Node.js, you might want to check the Speech-to-Text using Node.js blog post.

Why Leopard Speech-to-Text?

The SpeechRecognition interface of the Web Speech API is freely available, but it has shortcomings. It is not yet supported across all browsers and has (undocumented) usage limitations. Existing implementations of SpeechRecognition also rely on server-side voice recognition, so they are not private.
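To see the browser-support gap in practice, an application can feature-detect the interface before relying on it. The sketch below is illustrative only: getSpeechRecognition is a hypothetical helper name, and it assumes the interface is exposed either under its standard name or under the WebKit-prefixed name.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor exposed by
// the given global object, or null when the interface is missing.
function getSpeechRecognition(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// In a browser, globalThis is window. Outside a browser the result is null.
const speechRecognitionImpl = getSpeechRecognition(globalThis);
console.log(
  speechRecognitionImpl
    ? "Web Speech API SpeechRecognition is available"
    : "Web Speech API SpeechRecognition is not available"
);
```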

Leopard is an on-device speech-to-text engine: all voice processing happens on the device, i.e., in the browser. How? Years of applied research into making deep learning models tiny (i.e., TinyML) and extensive use of SIMD instructions in WebAssembly.

Setup & Installation

Create a project and install the SDK:

npm install @picovoice/leopard-web

Log in to (or sign up for) Picovoice Console. It is free, and no credit card is required. Copy your AccessKey to the clipboard.

Serving the Speech-to-Text Model

Leopard Speech-to-Text is on-device, meaning that voice processing happens within the browser. Hence, we need to transfer the model (deep neural network) to the client. There are two options:

1. Serve the model from the public directory of a website and pass the URL to the SDK. This method reduces the page size significantly but requires some upfront work.

2. Alternatively, ship the model with the page content to the end user. Since the model is binary, we need to transform it into a text form using Base64 encoding. This method is as straightforward as it gets. There is even a utility in the Leopard Speech-to-Text package to convert the model into Base64 format:

npx pvbase64 -i ${MODEL_PATH} -o ${BASE64_PATH}
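The page-size trade-off between the two options follows from how Base64 works: every 3 bytes of binary become 4 text characters, so the embedded model is roughly a third larger than the original file. The sketch below illustrates this in Node.js with stand-in bytes rather than a real model. The helper names toBase64 and base64Length are hypothetical, and the exact file layout that pvbase64 writes is not shown here.

```javascript
// Encode raw bytes as Base64 text using the Node.js Buffer API.
function toBase64(bytes) {
  return Buffer.from(bytes).toString("base64");
}

// Base64 emits 4 characters for every group of 3 input bytes,
// padding the final group when it is incomplete.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Stand-in data, not a real Leopard model.
const sampleBytes = new Uint8Array(3000).fill(7);
const encodedSample = toBase64(sampleBytes);

console.log(sampleBytes.length); // 3000 bytes of binary
console.log(encodedSample.length); // 4000 characters of text
```

Applied to a model file, the same ratio holds, which is why serving the model from the public directory keeps the page itself smaller.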

Implement Speech Recognition in JavaScript

Create an instance of Leopard Speech-to-Text:

import { Leopard } from "@picovoice/leopard-web";

const handle = await Leopard.create(
  accessKey,
  leopardModel
);

Replace accessKey with your AccessKey from Picovoice Console. leopardModel is an object that tells the SDK where to find the model. If you are using the public directory method, use this:

const leopardModel = {
  publicPath: publicRelativePath,
}

If you are using the base64 method, use this:

const leopardModel = {
  base64: base64String,
}
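If the delivery method is decided at build or deploy time, a small helper can produce whichever of the two objects applies. This is a minimal sketch: buildLeopardModel is a hypothetical name that is not part of the SDK, and the path in the usage comment is a placeholder.

```javascript
// Hypothetical helper: builds the model object from whichever source is
// provided, preferring the public directory path when both are given.
function buildLeopardModel({ publicPath, base64 } = {}) {
  if (publicPath) {
    return { publicPath };
  }
  if (base64) {
    return { base64 };
  }
  throw new Error("Provide either publicPath or base64 for the model");
}

// Usage (placeholder path, adjust to where the model is actually served):
// const leopardModel = buildLeopardModel({ publicPath: "/models/leopard.pv" });
```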

Transcribe audio:

function getAudioData(): Int16Array {
  ... // function to get audio data
}

const result = await handle.process(getAudioData());
console.log(result.transcript);
console.log(result.words);
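result.words carries per-word metadata alongside the transcript. The sketch below formats such a list for display. It assumes each entry has word, startSec, endSec and confidence fields, so check the SDK documentation for the exact shape. formatWords is a hypothetical helper and the sample data is made up for illustration.

```javascript
// Assumed word shape: { word, startSec, endSec, confidence }.
// Returns one display line per word with its time span and confidence.
function formatWords(words) {
  return words.map((w) => {
    const start = w.startSec.toFixed(2);
    const end = w.endSec.toFixed(2);
    const percent = Math.round(w.confidence * 100);
    return `[${start}s - ${end}s] ${w.word} (${percent}%)`;
  });
}

// Made-up sample data standing in for result.words.
const sampleWords = [
  { word: "hello", startSec: 0.1, endSec: 0.45, confidence: 0.98 },
  { word: "world", startSec: 0.5, endSec: 0.9, confidence: 0.91 },
];

console.log(formatWords(sampleWords).join("\n"));
```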

Implement getAudioData to suit your application. It can capture audio from a microphone via the Web Audio API or read it from a file. Either way, it must return the audio as 16-bit PCM samples in an Int16Array.
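The Web Audio API delivers samples as floating-point values between -1 and 1, so one step a getAudioData implementation typically needs is a conversion to 16-bit PCM. Below is a minimal sketch of that step only. floatTo16BitPCM is a hypothetical helper name, and the sketch assumes the audio is already single-channel and at the sample rate the engine expects, since capturing and resampling are not shown.

```javascript
// Convert Web Audio samples (floats nominally in [-1, 1]) to 16-bit PCM.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return pcm;
}

// Example with synthetic samples, including out-of-range values.
const pcmSamples = floatTo16BitPCM(new Float32Array([0, 1, -1, 2, -2, 0.5]));
console.log(Array.from(pcmSamples)); // [0, 32767, -32768, 32767, -32768, 16384]
```

In a browser, the Float32Array would typically come from AudioBuffer.getChannelData(0) after decoding a file or recording from the microphone.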

Explore

The Leopard Speech-to-Text Web SDK is open-source and available on GitHub. Additionally, an open-source speech recognition web demo based on Leopard Speech-to-Text is available.
