Learn how to automatically transcribe speech to text using the Picovoice Leopard Speech-to-Text Web SDK. The SDK runs on all modern browsers. If you are looking for a speech-to-text engine in Node.js, check out the Speech-to-Text using Node.js blog post.
The SpeechRecognition interface of the Web Speech API is freely available, but it has shortcomings. SpeechRecognition is not yet supported across all browsers and has (undocumented) usage limitations. Also, existing implementations of SpeechRecognition rely on server-side voice recognition and are not private.
Leopard is an on-device speech-to-text engine. All voice processing happens on the device, i.e., in the browser. How?
Years of applied research in making deep learning models tiny (i.e., TinyML) and extensive use of
Setup & Installation
Create a project and install the SDK, which is published on npm as @picovoice/leopard-web.
Sign up for Picovoice Console
Log in to (sign up for) Picovoice Console. It is free, and no credit card is required!
Copy your AccessKey to the clipboard.
Serving the Model
Leopard is on-device, meaning that voice processing happens within the browser. Hence, we need to transfer the model (a deep neural network) to the client. There are two options:
Serve the model from the public directory of a website and pass its URL to the SDK. This method reduces the page size significantly but requires some upfront work.
Alternatively, ship the model with the page content to the end user. Since the model is binary, we need to transform it into a text form using Base64 encoding. This method is as straightforward as it gets. There is even a utility in the Leopard package to convert the model into base64 format.
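To make the base64 option concrete, here is a minimal sketch of what the encoding step does to binary data. It is not the utility shipped with the Leopard package; the function names are hypothetical, and only the standard btoa and atob functions (available in modern browsers and recent Node.js versions) are used.

```javascript
// Hypothetical helper: turn raw model bytes into a base64 string.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical helper: recover the raw bytes from a base64 string.
function base64ToBytes(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Three arbitrary bytes standing in for a model file.
const sample = new Uint8Array([0, 255, 16]);
const encoded = bytesToBase64(sample);
console.log(encoded); // "AP8Q"
```

Every 3 bytes become 4 characters, so a base64 model is roughly a third larger than the binary file. That overhead is part of why serving the model from the public directory keeps the page smaller.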
Create an instance of Leopard:
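A minimal sketch of this step, assuming the SDK exposes a LeopardWorker class with a static asynchronous create(accessKey, model) method. The initLeopard wrapper is hypothetical; it takes the engine class as a parameter so the sketch can run without the SDK installed.

```javascript
// In a real page, the engine class would come from the SDK, e.g.:
//   import { LeopardWorker } from "@picovoice/leopard-web";
// initLeopard is a hypothetical wrapper, not part of the SDK.
async function initLeopard(engineClass, accessKey, leopardModel) {
  if (typeof accessKey !== "string" || accessKey.trim() === "") {
    throw new Error("accessKey must be a non-empty string from Picovoice Console");
  }
  // Assumed call shape: create(accessKey, model) resolves to an engine instance.
  return engineClass.create(accessKey, leopardModel);
}

// Usage (assumption): const leopard = await initLeopard(LeopardWorker, accessKey, leopardModel);
```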
Replace accessKey with your AccessKey from Picovoice Console.
leopardModel is an object that tells the SDK where to find the model. If you are using the public directory method, use this:
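A sketch of the public directory variant, assuming the model object accepts a publicPath field. The path below is a hypothetical placeholder; use the location where your site actually serves the model file.

```javascript
// Hypothetical path, relative to the site's public directory.
const MODEL_RELATIVE_PATH = "/models/leopard_params.pv";

// Assumed field name: publicPath points the SDK at the served model file.
const leopardModel = { publicPath: MODEL_RELATIVE_PATH };
```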
If you are using the base64 method, use this:
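A sketch of the base64 variant, assuming the model object accepts a base64 field holding the encoded model as a string. In a real project the string would be imported from the file produced by the conversion utility; the short literal here is only a placeholder.

```javascript
// Placeholder standing in for the base64-encoded model string.
// In practice this would be imported from the generated file.
const modelParams = "AP8Q";

// Assumed field name: base64 carries the encoded model inline.
const leopardModel = { base64: modelParams };
```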
Implement getAudioData based on your application. It can read from a microphone via the Web Audio API or possibly from a file.
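As one possible shape for getAudioData, the sketch below converts floating-point samples, such as those returned by the Web Audio API, into 16-bit PCM. It assumes the engine expects single-channel 16-bit PCM at its own sample rate and that resampling is handled elsewhere. Both function names and the process call in the final comment are assumptions rather than confirmed SDK details.

```javascript
// Convert float samples in [-1, 1] to 16-bit signed PCM.
function floatTo16BitPcm(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}

// Hypothetical getAudioData: wraps already-decoded float samples.
function getAudioData(floatSamples) {
  return floatTo16BitPcm(floatSamples);
}

// Usage (assumed SDK shape):
//   const { transcript } = await leopard.process(getAudioData(samples));
```

Scaling negative and positive samples separately maps -1 to -32768 and 1 to 32767, which uses the full 16-bit range without overflow.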