Leopard Speech-to-Text
Web API
API Reference for the Leopard Web SDK (leopard-web)
Leopard
Class for the Leopard Speech-to-Text engine.
Leopard.create()
Creates an instance of the Leopard Speech-to-Text engine using a '.pv' model file in the public directory. Because the model file is large, an existing copy in storage is reused if one is found; otherwise the model is saved to storage.
Parameters
accessKey
string : AccessKey obtained from Picovoice Console.
model
LeopardModel : Leopard model options.
options
LeopardOptions : Optional configuration arguments.
Returns
Leopard : An instance of the Leopard engine.
Leopard.process()
Processes audio. The required sample rate can be retrieved from '.sampleRate'. The audio needs to be 16-bit linearly-encoded. Furthermore, the engine operates on single-channel audio.
Parameters
pcm
Int16Array : Audio data.
Returns
LeopardTranscript : The inferred transcript with metadata.
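Browser audio sources often deliver 32-bit float samples in the range [-1, 1], while process() expects single-channel, 16-bit linearly-encoded PCM. The conversion might look like the sketch below. `floatToInt16` is a hypothetical helper, not part of the Leopard Web SDK, and it does not resample: the input is assumed to be mono audio already at the rate reported by '.sampleRate'.

```typescript
// Hypothetical helper (not part of the Leopard Web SDK): converts mono
// float samples in [-1, 1] to 16-bit linear PCM.
function floatToInt16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  samples.forEach((value, i) => {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, value));
    // Scale the negative and positive halves to the full int16 range.
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  });
  return pcm;
}
```

The resulting Int16Array can then be passed as the pcm argument.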
Leopard.release()
Releases resources acquired by the Leopard Web SDK.
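One way to make sure release() always runs is to wrap the call to process() in try/finally. The sketch below uses local stand-in types (`EngineLike`, `TranscriptLike`) and a hypothetical helper `transcribeOnce`, none of which come from the SDK. This section does not say whether process() and release() are synchronous or return promises, so the sketch awaits both, which works in either case.

```typescript
// Local structural types for illustration only; the real SDK exports its own.
interface TranscriptLike {
  transcript: string;
}
interface EngineLike {
  process(pcm: Int16Array): TranscriptLike | Promise<TranscriptLike>;
  release(): void | Promise<void>;
}

// Hypothetical helper: transcribes one buffer, then releases the engine
// whether or not processing succeeded.
async function transcribeOnce(
  engine: EngineLike,
  pcm: Int16Array
): Promise<string> {
  try {
    const result = await engine.process(pcm);
    return result.transcript;
  } finally {
    await engine.release();
  }
}
```

An engine that is reused across several buffers would instead be released once, after the last call to process().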
Leopard.sampleRate
Audio sample rate accepted by Leopard.
Leopard.version
Leopard version string.
LeopardModel
Leopard model type.
base64
string : The model file (.pv) as a base64 string to initialize Leopard.
publicPath
string : The model file (.pv) path relative to the public directory.
customWritePath
string : Custom path to save the model in storage. Set to a different name to use multiple models across Leopard instances.
forceWrite
boolean : Flag to overwrite the model in storage even if it exists.
version
number : Version of the model file. Increment to update the model file in storage.
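As an illustration of how these fields combine, the sketch below declares two model objects that can coexist in storage, plus an updated copy of one of them. `LeopardModelLike` is a local mirror of the fields listed above, not the SDK's own type. Which fields are optional is an assumption, and the file names and storage paths are placeholders.

```typescript
// Local mirror of the documented fields, for illustration only.
// Marking every field optional is an assumption.
interface LeopardModelLike {
  base64?: string;
  publicPath?: string;
  customWritePath?: string;
  forceWrite?: boolean;
  version?: number;
}

// Two models stored side by side: distinct customWritePath values keep
// one from replacing the other in storage. Paths are placeholders.
const englishModel: LeopardModelLike = {
  publicPath: "models/leopard_params_en.pv",
  customWritePath: "leopard_model_en",
  version: 1,
};
const frenchModel: LeopardModelLike = {
  publicPath: "models/leopard_params_fr.pv",
  customWritePath: "leopard_model_fr",
  version: 1,
};

// After shipping a new file at the same path, increment version so the
// copy in storage is updated.
const updatedEnglishModel: LeopardModelLike = { ...englishModel, version: 2 };
```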
LeopardOptions
Leopard options type.
enableAutomaticPunctuation
boolean : Flag to enable automatic punctuation insertion.
enableDiarization
boolean : Flag to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. When set to true, word metadata will include a speakerTag to identify unique speakers.
LeopardTranscript
Leopard transcript type.
transcript
string : Inferred transcript of process.
words
LeopardWord[] : Metadata of the transcript.
LeopardWord
Leopard metadata type.
word
string : A word in the transcript.
startSec
number : Position in seconds where the word starts.
endSec
number : Position in seconds where the word ends.
confidence
number : Number between 0 and 1, indicating the confidence level of the word.
speakerTag
number : Speaker tag is -1 if diarization is not enabled during initialization; otherwise, it's a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
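The word metadata can be post-processed in application code. The sketch below merges consecutive words with the same speakerTag into speaker turns and picks out words below a confidence threshold. `LeopardWordLike` mirrors the fields listed above; `SpeakerTurn`, `toSpeakerTurns`, and `lowConfidenceWords` are hypothetical names that are not part of the SDK.

```typescript
// Local mirror of the documented word fields, for illustration only.
interface LeopardWordLike {
  word: string;
  startSec: number;
  endSec: number;
  confidence: number;
  speakerTag: number;
}

// Hypothetical shape for a run of consecutive words from one speaker.
interface SpeakerTurn {
  speakerTag: number;
  startSec: number;
  endSec: number;
  text: string;
}

// Merges consecutive words that share a speakerTag into one turn.
function toSpeakerTurns(words: LeopardWordLike[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speakerTag === w.speakerTag) {
      last.text += " " + w.word;
      last.endSec = w.endSec;
    } else {
      turns.push({
        speakerTag: w.speakerTag,
        startSec: w.startSec,
        endSec: w.endSec,
        text: w.word,
      });
    }
  }
  return turns;
}

// Returns the words whose confidence is below the given threshold.
function lowConfidenceWords(
  words: LeopardWordLike[],
  threshold: number
): LeopardWordLike[] {
  return words.filter((w) => w.confidence < threshold);
}
```

When diarization is disabled every word carries a speakerTag of -1, so `toSpeakerTurns` returns a single turn covering the whole transcript.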
LeopardWorker
A class for creating new instances of LeopardWorker.
LeopardWorker.create()
Creates an instance of LeopardWorker using a '.pv' model file in the public directory. Because the model file is large, an existing copy in storage is reused if one is found; otherwise the model is saved to storage.
Parameters
accessKey
string : AccessKey obtained from Picovoice Console.
model
LeopardModel : Leopard model options.
options
LeopardOptions : Optional configuration arguments.
Returns
LeopardWorker : An instance of LeopardWorker.
LeopardWorker.process()
Processes audio. The required sample rate can be retrieved from '.sampleRate'. The audio needs to be 16-bit linearly-encoded. Furthermore, the engine operates on single-channel audio.
Parameters
pcm
Int16Array : Audio data.
options
Object : Optional process arguments.
options.transfer
boolean : Optional flag to indicate whether the buffer should be transferred. If set to true, the input buffer array will be transferred to the worker.
options.transferCallback
(pcm: Int16Array) => void : Optional callback that receives a new Int16Array with the contents of pcm. Use this callback to get the input pcm back when using transfer.
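When transfer is enabled, the caller hands the input buffer over to the worker, and transferCallback is the way to get the audio back. The sketch below wraps both in one promise. `WorkerLike`, `ProcessOptionsLike`, and `processWithTransfer` are hypothetical names declared locally rather than taken from the SDK. The sketch assumes process() returns a promise and that the callback is invoked once per call, neither of which this section states.

```typescript
// Local structural types for illustration only; not the SDK's own.
interface ProcessOptionsLike {
  transfer?: boolean;
  transferCallback?: (pcm: Int16Array) => void;
}
interface WorkerLike<T> {
  process(pcm: Int16Array, options?: ProcessOptionsLike): Promise<T>;
}

// Hypothetical helper: processes with transfer enabled and resolves with
// both the result and the buffer handed back through transferCallback.
async function processWithTransfer<T>(
  worker: WorkerLike<T>,
  pcm: Int16Array
): Promise<{ result: T; pcm: Int16Array }> {
  let giveBack: (returned: Int16Array) => void = () => undefined;
  // The executor runs synchronously, so giveBack is set before process().
  const pcmReturned = new Promise<Int16Array>((resolve) => {
    giveBack = resolve;
  });
  const result = await worker.process(pcm, {
    transfer: true,
    transferCallback: (returned) => giveBack(returned),
  });
  // Waits for the callback regardless of whether it fired before or
  // after the result arrived.
  return { result, pcm: await pcmReturned };
}
```

After the call, the returned pcm should be used in place of the array that was passed in, since the original buffer has been transferred.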
Returns
LeopardTranscript : The inferred transcript with metadata.
LeopardWorker.release()
Releases resources acquired by the Leopard Web SDK.
LeopardWorker.terminate()
Force terminates the instance of LeopardWorker.
LeopardWorker.sampleRate
Audio sample rate accepted by Leopard.
LeopardWorker.version
Leopard version string.