Learn how to automatically transcribe speech to text using the Picovoice Leopard Speech-to-Text Web SDK. The SDK runs on all modern browsers. If you are looking for a speech-to-text engine in Node.js, you might want to check out the Speech-to-Text using Node.js blog post.
Why Leopard Speech-to-Text?
The SpeechRecognition interface of the Web Speech API is freely available, but it has shortcomings. SpeechRecognition is not yet supported across all browsers and has (undocumented) usage limitations. Also, existing implementations of SpeechRecognition rely on server-side voice recognition and are not private.
Leopard is an on-device speech-to-text engine. All voice processing happens on the device, i.e., in the browser. How? Years of applied research in making deep learning models tiny (i.e., TinyML) and extensive use of SIMD instructions in WebAssembly.
Setup & Installation
Create a project and install the SDK, which is published on npm as @picovoice/leopard-web.
Sign up for Picovoice Console
Log in to (or sign up for) Picovoice Console. It is free, and no credit card is required! Copy your AccessKey to the clipboard.
Serving the Speech-to-Text Model
Leopard Speech-to-Text is on-device, meaning that voice processing happens within the browser. Hence, we need to transfer the model (deep neural network) to the client. There are two options:
Serve the model from the public directory of a website and pass the URL to the SDK. This method reduces the page size significantly but requires some upfront work.
Alternatively, ship the model with the page content to the end user. Since the model is binary, we need to transform it into a text form using Base64 encoding. This method is as straightforward as it gets. The Leopard Speech-to-Text package even includes a utility to convert the model into base64 format.
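To make the trade-off between the two options concrete, here is a minimal sketch of what Base64 encoding does to binary data. It is not the utility shipped with the package; `bytesToBase64`, `base64ToBytes`, and `base64Length` are hypothetical helpers written for illustration. Base64 emits four text characters for every three input bytes, so the encoded model is roughly a third larger than the binary file.

```javascript
// Illustrative helpers only; the package ships its own converter.

// Encode raw bytes as Base64 text.
function bytesToBase64(bytes) {
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte (0-255)
  }
  return btoa(binary);
}

// Decode Base64 text back into the original bytes.
function base64ToBytes(text) {
  const binary = atob(text);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Length of the padded Base64 text for a given number of input bytes:
// every group of 3 bytes (rounded up) becomes 4 characters.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}
```

By this arithmetic, a model file of 3,000,000 bytes becomes 4,000,000 characters of text once embedded in the page, which is why serving the binary from the public directory keeps the page smaller.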
Implement Speech Recognition in JavaScript
Create an instance of Leopard Speech-to-Text:
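This step can be sketched as follows. The sketch assumes the `LeopardWorker` class exported by `@picovoice/leopard-web` and its `create(accessKey, model)` factory; check the SDK documentation for the exact signature in your version. `createLeopard` is a hypothetical wrapper, and the class is passed in as a parameter so the sketch carries no hard dependency on the package.

```javascript
// Assumption: in a real page `LeopardWorker` comes from
//   import { LeopardWorker } from "@picovoice/leopard-web";
// It is a parameter here so the sketch stays self-contained.
async function createLeopard(LeopardWorker, accessKey, leopardModel) {
  if (typeof accessKey !== "string" || accessKey.length === 0) {
    throw new Error("AccessKey is missing; copy it from Picovoice Console.");
  }
  // Assumed factory signature: create(accessKey, model) -> Promise<engine>.
  return LeopardWorker.create(accessKey, leopardModel);
}
```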
Replace accessKey with your AccessKey from Picovoice Console. leopardModel is an object containing information about the whereabouts of the model. If you are using the public directory method, use this:
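A minimal sketch of that object, assuming the SDK reads the model location from a `publicPath` field. The directory and file name below are hypothetical and should match wherever the model file is actually served.

```javascript
// Assumption: the SDK fetches the model from the URL given in `publicPath`.
// "/models/leopard_params.pv" is a hypothetical path under the site's public directory.
const leopardModel = {
  publicPath: "/models/leopard_params.pv",
};
```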
If you are using the base64 method, use this:
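A minimal sketch of the same object for the embedded variant, assuming the SDK accepts the encoded model text in a `base64` field. The variable `modelParams` and the file name in the comment are hypothetical; the string shown is a placeholder, not real model data.

```javascript
// Assumption: the SDK accepts the Base64-encoded model text in a `base64` field.
// In a real page this string would come from the generated module, e.g.
//   import modelParams from "./leopard_params.js"; // hypothetical file name
const modelParams = "<Base64 text of the model file>"; // placeholder, not real data

const leopardModel = {
  base64: modelParams,
};
```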
Transcribe audio:
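This step can be sketched as below, assuming the engine exposes a `process(pcm)` method that resolves to an object with a `transcript` string and, in recent versions, a `words` array. `transcribe` is a hypothetical helper, and `getAudioData` is supplied by the application.

```javascript
// `leopard` is an engine instance created earlier; `getAudioData` is a
// caller-supplied function returning (or resolving to) the audio samples.
async function transcribe(leopard, getAudioData) {
  const pcm = await getAudioData();
  // Assumption: process() resolves to { transcript, words }.
  const { transcript, words } = await leopard.process(pcm);
  return { transcript, words };
}
```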
Implement getAudioData based on your application. It can read from a microphone via the Web Audio API or from a file.
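As one possible starting point, the sketch below reads audio from a file in the browser. It assumes the engine expects 16 kHz, single-channel, 16-bit PCM; confirm this against the SDK documentation. `floatTo16BitPcm` and `getAudioDataFromFile` are hypothetical names, and the second function only runs in a browser because it relies on `AudioContext`.

```javascript
// Convert Web Audio samples (floats in [-1, 1]) to 16-bit signed PCM.
function floatTo16BitPcm(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range values
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // fractional part is truncated on store
  }
  return pcm;
}

// Hypothetical file-based audio source (browser only).
// `file` is a File or Blob, e.g. from an <input type="file"> element.
async function getAudioDataFromFile(file) {
  // Assumption: the engine expects 16 kHz audio.
  const audioContext = new AudioContext({ sampleRate: 16000 });
  try {
    // decodeAudioData resamples the decoded audio to the context's sample rate.
    const decoded = await audioContext.decodeAudioData(await file.arrayBuffer());
    return floatTo16BitPcm(decoded.getChannelData(0)); // first channel only
  } finally {
    await audioContext.close();
  }
}
```

A microphone source would follow the same pattern: collect float samples through the Web Audio API, then pass them through `floatTo16BitPcm` before handing them to the engine.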
Explore
The Leopard Speech-to-Text Web SDK is open-source and available on GitHub. Additionally, an open-source speech recognition web demo based on Leopard Speech-to-Text is available.