Leopard Speech-to-Text
Web Quick Start
Platforms
- Chrome & Chromium-based browsers
- Edge
- Firefox
- Safari
Requirements
- Picovoice Account and AccessKey
- Node.js 16+
- npm
Picovoice Account & AccessKey
Sign up or log in to Picovoice Console to get your AccessKey. Make sure to keep your AccessKey secret.
Quick Start
Setup
Install Node.js.
Install the Leopard Speech-to-Text Web package:
Usage
Generate a custom Leopard Speech-to-Text model from Picovoice Console or download a default model.
Put the model file in the project's public directory or generate a base64 model using the built-in script:
Create a LeopardWorker instance using a base64 model or a model hosted in a public directory:
Transcribe audio (sample rate of 16 kHz, 16-bit linearly encoded and 1 channel):
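Browser audio from the Web Audio API usually arrives as Float32Array samples in the nominal range [-1, 1], often with more than one channel, so it has to be converted before it matches the format above. The following is a minimal sketch of that conversion; `floatTo16BitPcm` and `stereoToMono` are hypothetical helper names, not part of the Leopard SDK, and resampling to 16 kHz is assumed to be handled separately.

```typescript
// Hypothetical helpers (not part of the Leopard SDK) for preparing audio
// in the 16-bit, single-channel format described above.

// Average interleaved stereo frames (L, R, L, R, ...) down to one channel.
function stereoToMono(interleaved: Float32Array): Float32Array {
  const mono = new Float32Array(Math.floor(interleaved.length / 2));
  for (let i = 0; i < mono.length; i++) {
    mono[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2;
  }
  return mono;
}

// Convert float samples in [-1, 1] to 16-bit linear PCM.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768 and positive values by 32767,
    // which covers the full int16 range without overflow.
    pcm[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return pcm;
}
```

Clamping before scaling matters because a sample slightly above 1.0 would otherwise overflow the int16 range and wrap to a large negative value, which is audible as a click.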
Release resources explicitly when done with Leopard:
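One way to make sure the release step always runs, even when transcription throws, is to wrap the call in `try`/`finally`. The sketch below assumes a hypothetical `SpeechEngine` interface with `process` and `release` methods modeled on the steps in this guide; it is not the SDK's own type definition, so check the real method names and signatures in the Leopard Web API documentation.

```typescript
// Hypothetical interface modeled on the steps in this guide.
// It is an assumption, not the SDK's actual type definition.
interface SpeechEngine {
  process(pcm: Int16Array): Promise<{ transcript: string }>;
  release(): Promise<void>;
}

// Create an engine, transcribe one buffer, and always release the engine.
async function transcribeOnce(
  createEngine: () => Promise<SpeechEngine>,
  pcm: Int16Array
): Promise<string> {
  const engine = await createEngine();
  try {
    const { transcript } = await engine.process(pcm);
    return transcript;
  } finally {
    // Runs on success and on failure, so the engine is not leaked.
    await engine.release();
  }
}
```

Passing a factory function instead of a ready-made engine keeps creation and release in the same scope, which makes it harder to forget the cleanup step.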
Non-English Languages
To use Leopard with other languages, use the model file (.pv) for the desired language. Model files for all supported languages are available on the Leopard GitHub repository.
Word Metadata
Along with the transcript, Leopard Speech-to-Text returns metadata for each transcribed word. Available metadata items are:
- Start Time: Indicates when the word started in the transcribed audio. Value is in seconds.
- End Time: Indicates when the word ended in the transcribed audio. Value is in seconds.
- Confidence: Leopard Speech-to-Text's confidence that the transcribed word is accurate. It is a number within [0, 1].
- Speaker Tag: If speaker diarization is enabled on initialization, the speaker tag is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers. If speaker diarization is not enabled, the value will always be -1.
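The metadata items above can be combined into higher-level views of a transcript. The sketch below groups consecutive words with the same speaker tag into turns and picks out words below a confidence threshold. The `WordMeta` shape and its field names (`word`, `startSec`, `endSec`, `confidence`, `speakerTag`) are assumptions chosen to mirror the list above, and the helper functions are hypothetical; confirm the actual field names against the SDK's type definitions.

```typescript
// Assumed shape mirroring the metadata list above; field names are
// illustrative and may differ from the SDK's own type.
interface WordMeta {
  word: string;
  startSec: number;   // start time in seconds
  endSec: number;     // end time in seconds
  confidence: number; // within [0, 1]
  speakerTag: number; // -1 when diarization is disabled, 0 for unknown
}

interface SpeakerTurn {
  speakerTag: number;
  startSec: number;
  endSec: number;
  text: string;
}

// Merge consecutive words from the same speaker into a single turn.
function toSpeakerTurns(words: WordMeta[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speakerTag === w.speakerTag) {
      last.text += " " + w.word;
      last.endSec = w.endSec;
    } else {
      turns.push({
        speakerTag: w.speakerTag,
        startSec: w.startSec,
        endSec: w.endSec,
        text: w.word,
      });
    }
  }
  return turns;
}

// Return words whose confidence falls below the given threshold.
function lowConfidenceWords(words: WordMeta[], threshold: number): WordMeta[] {
  return words.filter((w) => w.confidence < threshold);
}
```

When diarization is disabled, every word carries the tag -1, so `toSpeakerTurns` returns a single turn spanning the whole transcript.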
Demo
A web demo project for the Leopard Speech-to-Text Web SDK is available on the Leopard Speech-to-Text GitHub repository.
Setup
Clone the Leopard Speech-to-Text repository from GitHub:
Usage
- Install dependencies and run:
- Open http://localhost:5000 to view it in the browser.