Leopard Speech-to-Text
Node.js API
API Reference for the Node.js Leopard SDK (npm)
Leopard
Class for the Leopard Speech-to-Text engine.
Create a Leopard instance with the class constructor(). When you are done with it, call the release() method to free its resources.
Leopard.constructor()
Leopard constructor.
Parameters
accessKey string : AccessKey obtained from Picovoice Console.
options LeopardOptions : Optional configuration arguments:
  modelPath string : Path to the file containing model parameters (.pv).
  libraryPath string : Path to the Leopard dynamic library (.node).
  enableAutomaticPunctuation boolean : Whether to enable automatic punctuation. Default is false.
  enableDiarization boolean : Whether to enable speaker diarization. Set to true to let Leopard differentiate between speakers as part of the transcription process. When enabled, each word's metadata includes a speakerTag that identifies its speaker.
Returns
Leopard: An instance of the Leopard engine.
Leopard.release()
Releases resources acquired by Leopard.
Leopard.sampleRate
Getter for the audio sample rate accepted by Leopard.
Returns
number: Audio sample rate accepted by Leopard.
Leopard.version
Getter for the Leopard version.
Returns
string: Current Leopard version.
Leopard.process()
Processes the given audio data with the speech-to-text engine. The audio must be single-channel, 16-bit linearly encoded, and sampled at a rate equal to .sampleRate.
Parameters
pcm Array<number> : Audio data.
Returns
LeopardTranscript: Inferred transcription.
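Because process() only accepts single-channel, 16-bit linear PCM at .sampleRate, audio read from a WAV file has to be checked and unpacked first. The sketch below assumes an uncompressed PCM WAV (format tag 1) held in a Node.js Buffer; the helper name wavToPcm is hypothetical and is not part of the Leopard SDK. It walks the RIFF chunks, validates the "fmt " chunk against the requirements above, and returns the samples as the Array<number> that process() takes.

```javascript
// Hypothetical helper (not part of the Leopard SDK): extract samples from an
// uncompressed PCM WAV buffer, checking the format that Leopard.process() requires.
function wavToPcm(buffer, expectedSampleRate) {
  if (buffer.toString("ascii", 0, 4) !== "RIFF" || buffer.toString("ascii", 8, 12) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }
  let fmt = null;
  let offset = 12; // first chunk starts right after the 12-byte RIFF header
  while (offset + 8 <= buffer.length) {
    const id = buffer.toString("ascii", offset, offset + 4);
    const size = buffer.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === "fmt ") {
      fmt = {
        audioFormat: buffer.readUInt16LE(body), // 1 = linear PCM
        channels: buffer.readUInt16LE(body + 2),
        sampleRate: buffer.readUInt32LE(body + 4),
        bitsPerSample: buffer.readUInt16LE(body + 14),
      };
    } else if (id === "data") {
      if (fmt === null) throw new Error("data chunk found before fmt chunk");
      if (fmt.audioFormat !== 1 || fmt.bitsPerSample !== 16) {
        throw new Error("Expected 16-bit linear PCM");
      }
      if (fmt.channels !== 1) throw new Error("Expected single-channel audio");
      if (fmt.sampleRate !== expectedSampleRate) {
        throw new Error(`Expected ${expectedSampleRate} Hz, got ${fmt.sampleRate} Hz`);
      }
      const end = Math.min(body + size, buffer.length);
      const pcm = [];
      for (let i = body; i + 1 < end; i += 2) {
        pcm.push(buffer.readInt16LE(i)); // little-endian signed 16-bit sample
      }
      return pcm;
    }
    // RIFF chunks are padded to an even number of bytes.
    offset = body + size + (size % 2);
  }
  throw new Error("No data chunk found");
}
```

With a Leopard instance, the samples would be passed as leopard.process(wavToPcm(buffer, leopard.sampleRate)). WAVE_FORMAT_EXTENSIBLE files are rejected by this sketch even when they contain plain 16-bit PCM.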
Leopard.processFile()
Processes an audio file with the speech-to-text engine.
Parameters
audioPath string : Absolute path to the audio file. Supported formats: FLAC, MP3, Ogg, WAV, WebM, MP4/m4a (AAC), and 3gp (AMR).
Returns
LeopardTranscript: Inferred transcription.
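The constructor, processFile(), and release() together form the usual lifecycle: create one engine, reuse it for every file, and release it once at the end even if a call throws. The sketch below is a hypothetical helper (transcribeFiles is not an SDK function); it takes the engine as a parameter, so it works with a Leopard instance or any object exposing processFile() and release().

```javascript
// Hypothetical helper: transcribe several files with one engine, then release it.
// `engine` is assumed to be a Leopard instance (or any object with
// processFile() and release() methods).
function transcribeFiles(engine, audioPaths) {
  try {
    // Reuse the same engine for every file instead of constructing one per file.
    return audioPaths.map((audioPath) => engine.processFile(audioPath).transcript);
  } finally {
    // Runs whether processFile() succeeded or threw, so resources are always freed.
    engine.release();
  }
}
```

Assuming the SDK is installed from npm as @picovoice/leopard-node, a call might look like transcribeFiles(new Leopard(accessKey, { enableAutomaticPunctuation: true }), ["/absolute/path/to/audio.flac"]), with Leopard obtained via require("@picovoice/leopard-node").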
LeopardWord
Object containing a transcribed word and its associated metadata.
word string : Transcribed word.
startSec number : Start of the word, in seconds.
endSec number : End of the word, in seconds.
confidence number : Transcription confidence, a number within [0, 1].
speakerTag number : Speaker tag. It is -1 if diarization was not enabled during initialization; otherwise, it is a non-negative integer identifying a unique speaker, with 0 reserved for unknown speakers.
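When diarization is enabled, the per-word speakerTag can be used to rebuild who-said-what. The sketch below merges consecutive LeopardWord objects with the same speakerTag into speaker turns; the function name toSpeakerTurns and the turn object's shape are hypothetical choices for illustration. If diarization was not enabled, every tag is -1 and all words collapse into a single turn.

```javascript
// Hypothetical helper: group consecutive words with the same speakerTag into turns.
// Expects objects shaped like LeopardWord: { word, startSec, endSec, confidence, speakerTag }.
function toSpeakerTurns(words) {
  const turns = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speakerTag === w.speakerTag) {
      // Same speaker keeps talking: extend the current turn.
      last.text += " " + w.word;
      last.endSec = w.endSec;
    } else {
      // Speaker changed (or first word): start a new turn.
      turns.push({ speakerTag: w.speakerTag, startSec: w.startSec, endSec: w.endSec, text: w.word });
    }
  }
  return turns;
}
```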
LeopardTranscript
Object containing the engine's transcription results:
transcript string : Inferred transcription.
words LeopardWord[] : Transcribed words and their associated metadata.
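The words array makes it possible to go beyond the plain transcript string, for example to flag words that may need manual review. The sketch below lists words whose confidence falls below a threshold, together with their time spans. The helper name lowConfidenceWords and the 0.5 default threshold are arbitrary illustrative choices, not Leopard recommendations.

```javascript
// Hypothetical helper: describe words from a LeopardTranscript whose confidence
// is below `threshold` (0.5 is an arbitrary example value).
function lowConfidenceWords(transcript, threshold = 0.5) {
  return transcript.words
    .filter((w) => w.confidence < threshold)
    .map(
      (w) =>
        `${w.word} [${w.startSec.toFixed(2)}s-${w.endSec.toFixed(2)}s] confidence=${w.confidence.toFixed(2)}`
    );
}
```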
Errors
Exceptions thrown if an error occurs within the Leopard Speech-to-Text engine.
Exceptions: