Leopard Speech-to-Text
Python API
API Reference for the Python Leopard SDK (PyPI).
pvleopard.create()
Factory method for Leopard Speech-to-Text engine.
Parameters
access_key str : AccessKey obtained from Picovoice Console.
model_path Optional[str] : Absolute path to the file containing model parameters.
library_path Optional[str] : Absolute path to Leopard's dynamic library.
enable_automatic_punctuation bool : Set to True to enable automatic punctuation insertion.
enable_diarization bool : Set to True to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speaker_tag to identify unique speakers.
Returns
Leopard: An instance of Leopard Speech-to-Text engine.
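As a rough illustration of how the parameters above fit together, the sketch below collects keyword arguments for create() and passes them on. It is not part of the SDK: leopard_kwargs and create_engine are hypothetical helpers, PICOVOICE_ACCESS_KEY is an environment variable name chosen for this example, and the sketch assumes the two optional paths can simply be omitted.

```python
import os


def leopard_kwargs(env=os.environ, punctuation=False, diarization=False):
    """Collect keyword arguments for pvleopard.create() (hypothetical helper)."""
    # PICOVOICE_ACCESS_KEY is an assumed variable name, not something the SDK reads.
    access_key = env.get("PICOVOICE_ACCESS_KEY")
    if not access_key:
        raise ValueError("set PICOVOICE_ACCESS_KEY to your AccessKey")
    return {
        "access_key": access_key,
        "enable_automatic_punctuation": punctuation,
        "enable_diarization": diarization,
    }


def create_engine(**options):
    """Build a Leopard instance from the environment (hypothetical helper)."""
    # Imported lazily so leopard_kwargs() works without the SDK installed.
    import pvleopard

    return pvleopard.create(**leopard_kwargs(**options))
```

With the SDK installed and the variable set, `leopard = create_engine(diarization=True)` would return an engine with speaker diarization enabled.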
Throws
pvleopard.Leopard
Class for the Leopard Speech-to-Text engine.
Leopard can be initialized either with the module-level create() function
or directly with the class __init__() method.
When you are done, release resources with the delete() method.
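One way to make sure delete() always runs, even when transcription raises, is to wrap the engine in a context manager. The sketch below is a minimal illustration rather than an SDK feature: open_leopard is a hypothetical helper, and factory stands for whichever initializer you use (for example pvleopard.create).

```python
from contextlib import contextmanager


@contextmanager
def open_leopard(factory, **kwargs):
    """Yield an engine built by `factory` and always call delete() afterwards."""
    leopard = factory(**kwargs)
    try:
        yield leopard
    finally:
        # Runs on normal exit and when the with-body raises.
        leopard.delete()
```

Usage would look like `with open_leopard(pvleopard.create, access_key=...) as leopard:` followed by calls to process() or process_file() inside the block.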
pvleopard.Leopard.version
The version string of the Leopard library.
pvleopard.Leopard.sample_rate
The audio sample rate that Leopard accepts.
pvleopard.Leopard.__init__()
__init__ method for Leopard Speech-to-Text engine.
Parameters
access_key str : AccessKey obtained from Picovoice Console.
model_path str : Absolute path to the file containing model parameters.
library_path str : Absolute path to Leopard's dynamic library.
enable_automatic_punctuation bool : Set to True to enable automatic punctuation insertion.
enable_diarization bool : Set to True to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speaker_tag to identify unique speakers.
Returns
Leopard: An instance of Leopard Speech-to-Text engine.
Throws
pvleopard.Leopard.delete()
Releases resources acquired by Leopard.
pvleopard.Leopard.Word
Metadata associated with a transcribed word.
word str : Transcribed word.
start_sec float : Start of word in seconds.
end_sec float : End of word in seconds.
confidence float : Transcription confidence.
speaker_tag int : Speaker tag is -1 if diarization is not enabled during initialization; otherwise, it's a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
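To show how the word metadata might be used, the sketch below merges consecutive words with the same speaker_tag into speaker turns. The Word namedtuple here is a stand-in with the fields listed above so the example runs without the SDK, speaker_turns is a hypothetical helper, and the sample words are made up for illustration.

```python
from collections import namedtuple

# Stand-in with the same field names as Leopard.Word (not the SDK class itself).
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence", "speaker_tag"])


def speaker_turns(words):
    """Merge consecutive words sharing a speaker_tag into (tag, start, end, text)."""
    turns = []
    for w in words:
        if turns and turns[-1][0] == w.speaker_tag:
            # Same speaker as the previous word: extend the current turn.
            tag, start, _, text = turns[-1]
            turns[-1] = (tag, start, w.end_sec, text + " " + w.word)
        else:
            turns.append((w.speaker_tag, w.start_sec, w.end_sec, w.word))
    return turns


# Made-up sample data, not real engine output.
sample_words = [
    Word("hello", 0.0, 0.4, 0.98, 1),
    Word("there", 0.5, 0.9, 0.95, 1),
    Word("hi", 1.2, 1.5, 0.91, 2),
]
for tag, start, end, text in speaker_turns(sample_words):
    print(f"speaker {tag} [{start:.1f}-{end:.1f}s]: {text}")
```

The same function should work on the word sequence returned by process() or process_file(), as long as each item exposes these attributes.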
pvleopard.Leopard.process()
Processes the given audio data and returns its transcription. The audio must be single-channel, 16-bit linearly-encoded, and have a sample rate equal to .sample_rate.
If you wish to process data in a different sample rate or format, consider using .process_file().
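A minimal sketch of preparing input for process(), assuming the audio lives in a WAV file: it uses the standard-library wave and struct modules to check the channel count, sample width and sample rate, then unpacks the frames into a list of ints. read_pcm is a hypothetical helper, not part of the SDK.

```python
import struct
import wave


def read_pcm(wav_path, expected_sample_rate):
    """Return the samples of a 16-bit mono WAV file as a list of ints."""
    with wave.open(wav_path, "rb") as f:
        if f.getnchannels() != 1:
            raise ValueError("audio must be single-channel")
        if f.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit")
        if f.getframerate() != expected_sample_rate:
            raise ValueError(
                f"expected {expected_sample_rate} Hz, got {f.getframerate()} Hz"
            )
        frames = f.readframes(f.getnframes())
    # WAV stores 16-bit PCM as little-endian signed integers.
    return list(struct.unpack(f"<{len(frames) // 2}h", frames))
```

With an engine instance in hand, the call would then be `transcript, words = leopard.process(read_pcm(path, leopard.sample_rate))`.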
Parameters
pcm Sequence[int] : Audio data.
Returns
Tuple[str, Sequence[Word]]: Inferred transcription and sequence of transcribed words and their associated metadata.
Throws
pvleopard.Leopard.process_file()
Processes a given audio file and returns its transcription. The supported formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV, and WebM.
Parameters
audio_path str : Absolute path to the audio file.
Returns
Tuple[str, Sequence[Word]]: Inferred transcription and sequence of transcribed words and their associated metadata.
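Since audio_path is expected to be absolute, a small wrapper can resolve relative paths before calling process_file(). The sketch below is illustrative only: transcribe_file is a hypothetical helper, and leopard is any object exposing process_file() as documented here.

```python
import os


def transcribe_file(leopard, audio_path):
    """Call process_file() with an absolute path and return (transcript, words)."""
    absolute_path = os.path.abspath(audio_path)
    if not os.path.isfile(absolute_path):
        # Fail early with a clear message instead of passing a bad path on.
        raise FileNotFoundError(absolute_path)
    transcript, words = leopard.process_file(absolute_path)
    return transcript, list(words)
```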
Throws
pvleopard.LeopardError
Error thrown if an error occurs within Leopard Speech-to-Text engine.
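A minimal sketch of catching this error around a transcription call. safe_transcribe is a hypothetical helper; it assumes the pvleopard package is installed and that LeopardError can be caught like any other Python exception.

```python
def safe_transcribe(leopard, audio_path):
    """Return the transcript, or None if Leopard reports an error."""
    # Imported lazily; requires the pvleopard package to be installed.
    import pvleopard

    try:
        transcript, _words = leopard.process_file(audio_path)
    except pvleopard.LeopardError as error:
        print(f"Leopard failed: {error}")
        return None
    return transcript
```

Returning None keeps the example short; a real application may prefer to log the error and re-raise it.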
Exceptions