Leopard Speech-to-Text
Node.js API
API Reference for the Node.js Leopard SDK (npm)
Leopard
Class for the Leopard Speech-to-Text engine.
Leopard can be initialized using the class constructor().
Resources should be released with the release() method when you are done using the engine.
Leopard.constructor()
Leopard constructor.
Parameters
accessKey
string : AccessKey obtained from Picovoice Console.
options
LeopardOptions : Optional configuration arguments:
  modelPath
  string : Path to the file containing model parameters (.pv).
  libraryPath
  string : Path to the Leopard dynamic library (.node).
  enableAutomaticPunctuation
  boolean : Whether to enable automatic punctuation. Default is false.
  enableDiarization
  boolean : Whether to enable speaker diarization. Set to true to let Leopard differentiate speakers as part of the transcription process. Word metadata will then include a speakerTag identifying unique speakers.
Returns
Leopard : An instance of the Leopard engine.
Leopard.release()
Releases resources acquired by Leopard.
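Because release() has to be called once the engine is no longer needed, it can help to pair construction and cleanup in one place. The following is a minimal sketch, not part of the SDK: withLeopard is a hypothetical helper, and the class is passed in as an argument so the snippet does not depend on how the package exports it. The package name in the usage comment is an assumption.

```javascript
// Hypothetical helper (not part of the SDK): construct an engine, run a
// callback with it, and always call release() afterwards.
// `LeopardClass` is whatever class the SDK exports, supplied by the caller.
function withLeopard(LeopardClass, accessKey, options, callback) {
  const leopard = new LeopardClass(accessKey, options);
  try {
    return callback(leopard);
  } finally {
    // Free the engine's resources even if the callback throws.
    leopard.release();
  }
}

// Usage sketch (assumes the npm package is "@picovoice/leopard-node" and
// that it exports a class named `Leopard`):
// const { Leopard } = require("@picovoice/leopard-node");
// const text = withLeopard(
//   Leopard,
//   "${ACCESS_KEY}",
//   { enableAutomaticPunctuation: true },
//   (leopard) => leopard.processFile("/absolute/path/to/audio.wav").transcript
// );
```

Wrapping the work in try/finally means a failed transcription does not leave the engine unreleased.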
Leopard.sampleRate
Getter for audio sample rate accepted by Leopard.
Returns
number : Audio sample rate accepted by Leopard.
Leopard.version
Getter for version.
Returns
string : Current Leopard version.
Leopard.process()
Processes the given audio data with the speech-to-text engine. The incoming audio must have a sample rate equal to .sampleRate and be 16-bit linearly-encoded. Leopard operates on single-channel audio.
Parameters
pcm
Array<number> : Audio data.
Returns
LeopardTranscript : Inferred transcription.
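Audio often arrives as raw bytes rather than as numeric samples. As an illustration of the 16-bit requirement, the sketch below decodes a Node.js Buffer into signed 16-bit samples. pcmBytesToSamples is a hypothetical helper, and it assumes headerless, little-endian, single-channel data already recorded at the engine's sample rate.

```javascript
// Hypothetical helper: decode raw 16-bit little-endian PCM bytes into samples.
// Assumes headerless, single-channel audio at the engine's sample rate.
function pcmBytesToSamples(buffer) {
  if (buffer.length % 2 !== 0) {
    throw new RangeError("16-bit PCM data must contain an even number of bytes");
  }
  const samples = new Int16Array(buffer.length / 2);
  for (let i = 0; i < samples.length; i++) {
    // Each sample occupies two bytes, least significant byte first.
    samples[i] = buffer.readInt16LE(i * 2);
  }
  return samples;
}

// Usage sketch (`leopard` is an initialized engine, `rawBytes` a Buffer):
// const samples = pcmBytesToSamples(rawBytes);
// const result = leopard.process(samples);
```

The helper returns an Int16Array; if a plain array is required, Array.from(samples) converts it.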
Leopard.processFile()
Processes an audio file with the speech-to-text engine.
Parameters
audioPath
string : Absolute path to the audio file. The supported formats are: FLAC, MP3, Ogg, WAV, WebM, MP4/m4a (AAC), and 3gp (AMR).
Returns
LeopardTranscript : Inferred transcription.
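Since processFile() expects an absolute path to a file in one of the listed formats, a quick check before the call can catch obvious mistakes. This is a minimal sketch: checkAudioPath is a hypothetical helper, and the mapping from formats to file extensions is an assumption, since the engine may identify files by content rather than by name.

```javascript
const path = require("node:path");

// Assumption: typical file extensions for the formats listed above.
const SUPPORTED_EXTENSIONS = new Set([
  ".flac", ".mp3", ".ogg", ".wav", ".webm", ".mp4", ".m4a", ".3gp",
]);

// Hypothetical helper: returns null if the path looks usable,
// otherwise a short description of the problem.
function checkAudioPath(audioPath) {
  if (!path.isAbsolute(audioPath)) {
    return "path is not absolute";
  }
  const ext = path.extname(audioPath).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return `unsupported extension "${ext}"`;
  }
  return null;
}

// Usage sketch (`leopard` is an initialized engine):
// const problem = checkAudioPath(audioPath);
// if (problem === null) {
//   const result = leopard.processFile(audioPath);
// }
```

A relative path can be turned into an absolute one with path.resolve() before the check.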
LeopardWord
Object which contains a transcribed word and its associated metadata.
word
string : Transcribed word.
startSec
number : Start of word in seconds.
endSec
number : End of word in seconds.
confidence
number : Transcription confidence. It is a number within [0, 1].
speakerTag
number : The speaker tag is -1 if diarization is not enabled during initialization; otherwise, it is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
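When diarization is enabled, the per-word speakerTag can be used to rebuild speaker turns. The sketch below merges consecutive words that share a tag. groupBySpeaker is a hypothetical helper, and the input is assumed to be an array of objects with the fields described above.

```javascript
// Hypothetical helper: merge consecutive words with the same speakerTag
// into turns, keeping the start of the first word and the end of the last.
function groupBySpeaker(words) {
  const turns = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speakerTag === w.speakerTag) {
      // Same speaker is still talking: extend the current turn.
      last.text += " " + w.word;
      last.endSec = w.endSec;
    } else {
      // Speaker changed (or first word): start a new turn.
      turns.push({
        speakerTag: w.speakerTag,
        startSec: w.startSec,
        endSec: w.endSec,
        text: w.word,
      });
    }
  }
  return turns;
}
```

If diarization is disabled, every word carries the tag -1 and the result is a single turn.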
LeopardTranscript
Object which contains the transcription results of the engine:
transcript
string : Inferred transcription.
words
LeopardWord[] : Transcribed words and their associated metadata.
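As one way to use the two fields together, the sketch below scans a result's words array and lists the words that may need manual review. lowConfidenceWords is a hypothetical helper, and the default threshold of 0.5 is an arbitrary illustrative value, not a recommendation from the SDK.

```javascript
// Hypothetical helper: describe words whose confidence is below a threshold.
// The default threshold of 0.5 is an arbitrary value chosen for illustration.
function lowConfidenceWords(transcript, threshold = 0.5) {
  return transcript.words
    .filter((w) => w.confidence < threshold)
    .map(
      (w) =>
        `${w.word} (${w.startSec.toFixed(2)}s, confidence ${w.confidence.toFixed(2)})`
    );
}

// Usage sketch (`result` is the object returned by process() or processFile()):
// console.log(result.transcript);
// for (const line of lowConfidenceWords(result)) {
//   console.log("check:", line);
// }
```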
Errors
Exceptions thrown if an error occurs within the Leopard Speech-to-Text engine.
Exceptions: