Leopard Speech-to-Text
React Quick Start
Platforms
- Chrome & Chromium-based browsers
- Edge
- Firefox
- Safari
Requirements
- Picovoice Account and AccessKey
- Node.js 16+
- React 17.0+
- npm
Picovoice Account & AccessKey
Sign up or log in to Picovoice Console to get your AccessKey. Make sure to keep your AccessKey secret.
Quick Start
Setup
Install Node.js.
Install the npm packages:
Usage
Download a custom model from Picovoice Console or use a default language model. Place the model file in the project's public directory or generate a base64 representation of the file using the built-in script:
Create a leopardModel object with either of the methods above:
Import and call the useLeopard Hook, and initialize Leopard Speech-to-Text with your AccessKey and leopardModel:
To process audio, you can either upload it as a File object or record it directly. Once the audio has been processed, the transcript will be available in the result state variable.
File Object
Transcribe File objects directly using the processFile function:
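As a minimal sketch of how a file selection might be handed to processFile, the helper below takes the first selected file and submits it. The interface FileTranscriber and the helper transcribeFirstFile are hypothetical names introduced for illustration, and the assumption that processFile returns a Promise is not stated in this guide. The file type is left generic so the sketch does not depend on browser typings.

```typescript
// Hypothetical subset of the object returned by the hook.
// Only the name `processFile` comes from this guide; the Promise return type is an assumption.
interface FileTranscriber<F> {
  processFile: (file: F) => Promise<void>;
}

// Submits the first selected file, if there is one.
// Resolves to true when a file was submitted and false when the selection was empty.
async function transcribeFirstFile<F>(
  leopard: FileTranscriber<F>,
  files: ArrayLike<F> | null
): Promise<boolean> {
  const first = files !== null && files.length > 0 ? files[0] : undefined;
  if (first === undefined) {
    return false; // nothing selected, so nothing to transcribe
  }
  await leopard.processFile(first);
  return true;
}
```

In a component, a helper like this would typically be called from the change handler of a file input, passing the input's files list.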
Record Audio
The Leopard Speech-to-Text React binding uses WebVoiceProcessor to record audio with a microphone. To start recording audio, call startRecording:
Call stopRecording to stop recording audio and begin processing:
Once processing is complete, the transcript will be available via the result state variable.
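The start/stop flow described above can be sketched as a single toggle, for example behind one record button. RecorderControls and toggleRecording are hypothetical names. The isRecording flag and the Promise return types are assumptions for illustration, since only startRecording and stopRecording are named in this guide.

```typescript
// Hypothetical subset of the object returned by the hook.
// `startRecording` and `stopRecording` are named in this guide;
// `isRecording` and the Promise return types are assumptions.
interface RecorderControls {
  isRecording: boolean;
  startRecording: () => Promise<void>;
  stopRecording: () => Promise<void>;
}

// Starts recording when idle; stops recording (which begins processing) when active.
async function toggleRecording(
  leopard: RecorderControls
): Promise<"started" | "stopped"> {
  if (leopard.isRecording) {
    await leopard.stopRecording();
    return "stopped";
  }
  await leopard.startRecording();
  return "started";
}
```

Keeping the decision in one function means the button handler never calls stopRecording without a recording in progress.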
Allocated resources are automatically freed on unmount, but can also be freed explicitly:
Model File
Create custom models using the Picovoice Console.
Train and download a Leopard Speech-to-Text model (.pv) for the target platform Web (WASM).
This model file can be used directly with publicPath. If base64 is preferable, convert the .pv file to a base64 JavaScript variable using the built-in pvbase64 script:
Model files (.pv) are saved in IndexedDB to be used by WebAssembly.
Either base64 or publicPath must be set to instantiate Leopard. If both are set, Leopard Speech-to-Text will use the base64 model.
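The precedence rule above can be expressed as a small check that runs before initialization. This is only an illustration of the stated rule, not the library's internal logic. LeopardModelLike, ModelSource and resolveModelSource are hypothetical names, and treating an empty string as "not set" is an assumption.

```typescript
// Hypothetical shape mirroring the two model fields named in this guide.
interface LeopardModelLike {
  publicPath?: string;
  base64?: string;
}

type ModelSource = "base64" | "publicPath";

// Reports which source would be used: base64 wins when both are set,
// and an error is raised when neither is set.
function resolveModelSource(model: LeopardModelLike): ModelSource {
  if (model.base64 !== undefined && model.base64 !== "") {
    return "base64";
  }
  if (model.publicPath !== undefined && model.publicPath !== "") {
    return "publicPath";
  }
  throw new Error("Either base64 or publicPath must be set");
}
```

A check like this can surface a misconfigured model object early, before any network or IndexedDB work starts.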
Non-English Languages
To use Leopard with other languages, you need to use the corresponding model file (.pv) for the desired language. The model files for all supported languages are available on the Leopard GitHub repository.
Word Metadata
Along with the transcript, Leopard Speech-to-Text returns metadata for each transcribed word. Available metadata items are:
- Start Time: Indicates when the word started in the transcribed audio. Value is in seconds.
- End Time: Indicates when the word ended in the transcribed audio. Value is in seconds.
- Confidence: Leopard Speech-to-Text's confidence that the transcribed word is accurate. It is a number within [0, 1].
- Speaker Tag: If speaker diarization is enabled on initialization, the speaker tag is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers. If speaker diarization is not enabled, the value will always be -1.
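The metadata items listed above could be consumed as in the sketch below, which formats one word for display and interprets the speaker tag values. The field names (word, startSec, endSec, confidence, speakerTag) and the helper names are assumptions made for illustration; check the SDK's type definitions for the actual shape.

```typescript
// Hypothetical shape for one transcribed word; field names are assumptions.
interface WordMetadata {
  word: string;
  startSec: number;   // start time in seconds
  endSec: number;     // end time in seconds
  confidence: number; // value within [0, 1]
  speakerTag: number; // -1: diarization disabled, 0: unknown speaker, >0: speaker id
}

// Turns a speaker tag into a human-readable label.
function describeSpeaker(tag: number): string {
  if (tag === -1) return "diarization disabled";
  if (tag === 0) return "unknown speaker";
  return `speaker ${tag}`;
}

// Formats one word with its time span, confidence and speaker label.
function formatWord(w: WordMetadata): string {
  const span = `${w.startSec.toFixed(2)}s-${w.endSec.toFixed(2)}s`;
  return `${w.word} [${span}] conf=${w.confidence.toFixed(2)} (${describeSpeaker(w.speakerTag)})`;
}

// Returns the words whose confidence falls below the given threshold.
function lowConfidenceWords(words: WordMetadata[], threshold: number): string[] {
  return words.filter((w) => w.confidence < threshold).map((w) => w.word);
}
```

Filtering on confidence is one way to flag words for manual review; a suitable threshold depends on the audio and the application.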
Demo
For the Leopard Speech-to-Text React SDK, there is a React demo project available on the Leopard Speech-to-Text GitHub repository.
Setup
Clone the Leopard Speech-to-Text repository from GitHub:
Usage
- Install dependencies:
- Run the demo with the start script with a language code to start a local web server hosting the demo in the language of your choice (e.g. de -> German, ko -> Korean). To see a list of available languages, run start without a language code.
Open http://localhost:3000 to view it in the browser.
Enter your AccessKey and press Init Leopard. Once Leopard Speech-to-Text has loaded, upload an audio file or record audio with a microphone to begin transcribing speech to text.