Leopard Speech-to-Text
Python API
API Reference for the Python Leopard SDK (PyPI).
pvleopard.create()
Factory method for the Leopard Speech-to-Text engine.
Parameters
access_key
str : AccessKey obtained from Picovoice Console.
model_path
Optional[str] : Absolute path to the file containing model parameters.
library_path
Optional[str] : Absolute path to Leopard's dynamic library.
enable_automatic_punctuation
bool : Set to True to enable automatic punctuation insertion.
enable_diarization
bool : Set to True to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speaker_tag to identify unique speakers.
Returns
Leopard : An instance of the Leopard Speech-to-Text engine.
Throws
LeopardError
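The call to the factory can be sketched as follows. The helper name make_engine and its create_fn argument are hypothetical, added only so the call can be exercised without the SDK installed; by default the helper forwards to pvleopard.create(). It leaves out model_path and library_path on the assumption that omitting them selects the SDK defaults.

```python
def make_engine(access_key, *, punctuation=True, diarization=False, create_fn=None):
    """Create a Leopard engine using the documented keyword arguments."""
    if create_fn is None:
        # Imported lazily so the sketch can be read and tested without the SDK.
        import pvleopard
        create_fn = pvleopard.create
    return create_fn(
        access_key=access_key,
        enable_automatic_punctuation=punctuation,
        enable_diarization=diarization,
    )
```

With the SDK installed, make_engine("${ACCESS_KEY}", diarization=True) would return an engine whose word metadata carries speaker tags.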
pvleopard.Leopard
Class for the Leopard Speech-to-Text engine.
Leopard can be initialized either using the module-level create() function or directly using the class __init__() method.
When you are done, release resources using the delete() method.
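One way to make sure delete() always runs is to wrap the engine in a context manager. This is a sketch, not part of the SDK: leopard_engine is a hypothetical helper, and create_fn stands for any callable that returns an engine, such as pvleopard.create.

```python
from contextlib import contextmanager

@contextmanager
def leopard_engine(create_fn, **kwargs):
    """Yield an engine and call delete() on exit, even if an exception is raised."""
    engine = create_fn(**kwargs)
    try:
        yield engine
    finally:
        # Release the native resources held by the engine.
        engine.delete()
```

Typical use would be: with leopard_engine(pvleopard.create, access_key="${ACCESS_KEY}") as leopard, followed by calls to process() or process_file() inside the block.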
pvleopard.Leopard.version
The version string of the Leopard library.
pvleopard.Leopard.sample_rate
The audio sample rate that Leopard accepts.
pvleopard.Leopard.__init__()
__init__ method for the Leopard Speech-to-Text engine.
Parameters
access_key
str : AccessKey obtained from Picovoice Console.
model_path
str : Absolute path to the file containing model parameters.
library_path
str : Absolute path to Leopard's dynamic library.
enable_automatic_punctuation
bool : Set to True to enable automatic punctuation insertion.
enable_diarization
bool : Set to True to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speaker_tag to identify unique speakers.
Returns
Leopard : An instance of the Leopard Speech-to-Text engine.
Throws
LeopardError
pvleopard.Leopard.delete()
Releases resources acquired by Leopard.
pvleopard.Leopard.Word
Metadata associated with a transcribed word.
word
str : Transcribed word.
start_sec
float : Start of the word in seconds.
end_sec
float : End of the word in seconds.
confidence
float : Transcription confidence.
speaker_tag
int : Speaker tag is -1 if diarization is not enabled during initialization; otherwise, it is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
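The fields above are enough to turn a word sequence into speaker turns. The sketch below uses a local namedtuple as a stand-in for pvleopard.Leopard.Word, with the same field names, so it runs without the SDK. The helper speaker_turns and the sample words are made up for illustration.

```python
from collections import namedtuple
from itertools import groupby

# Stand-in with the same field names as pvleopard.Leopard.Word (illustration only).
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence", "speaker_tag"])

def speaker_turns(words):
    """Collapse consecutive words sharing a speaker_tag into (tag, start, end, text) turns."""
    turns = []
    for tag, group in groupby(words, key=lambda w: w.speaker_tag):
        group = list(group)
        text = " ".join(w.word for w in group)
        turns.append((tag, group[0].start_sec, group[-1].end_sec, text))
    return turns

# Hypothetical sample data, not real engine output.
words = [
    Word("hello", 0.0, 0.4, 0.98, 1),
    Word("there", 0.5, 0.9, 0.95, 1),
    Word("hi", 1.2, 1.4, 0.91, 2),
]
print(speaker_turns(words))
# prints [(1, 0.0, 0.9, 'hello there'), (2, 1.2, 1.4, 'hi')]
```

If diarization was not enabled, every word carries speaker_tag -1 and the result is a single turn.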
pvleopard.Leopard.process()
Processes the given audio data and returns its transcription. The audio must have a sample rate equal to .sample_rate and be 16-bit linearly encoded. This function operates on single-channel audio.
To process data with a different sample rate or format, consider using .process_file().
Parameters
pcm
Sequence[int] : Audio data.
Returns
Tuple[str, Sequence[Word]] : Inferred transcription and sequence of transcribed words and their associated metadata.
Throws
LeopardError
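The input requirements of process() (single channel, 16-bit linear PCM, sample rate equal to .sample_rate) can be checked before handing audio to the engine. The following is a minimal sketch using only the standard library. read_pcm16_mono is a hypothetical helper, and it assumes the audio is stored in a WAV file.

```python
import struct
import wave

def read_pcm16_mono(path, expected_sample_rate):
    """Read a WAV file and return its samples as a list of signed 16-bit integers."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected single-channel audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_sample_rate:
            raise ValueError(
                f"expected {expected_sample_rate} Hz, got {wav.getframerate()} Hz"
            )
        frames = wav.readframes(wav.getnframes())
    # WAV stores 16-bit PCM as little-endian signed integers.
    return list(struct.unpack(f"<{len(frames) // 2}h", frames))
```

With an engine instance named leopard, the result could be passed on as leopard.process(read_pcm16_mono(path, leopard.sample_rate)).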
pvleopard.Leopard.process_file()
Processes a given audio file and returns its transcription. The supported formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV, and WebM.
Parameters
audio_path
str : Absolute path to the audio file.
Returns
Tuple[str, Sequence[Word]] : Inferred transcription and sequence of transcribed words and their associated metadata.
Throws
LeopardError
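Because audio_path is documented as an absolute path, a caller holding a relative or home-relative path may want to resolve it first. A small sketch, where transcribe_file is a hypothetical wrapper and engine is any object exposing process_file(), such as a Leopard instance:

```python
import os

def transcribe_file(engine, audio_path):
    """Resolve audio_path to an absolute path and transcribe it with process_file()."""
    absolute_path = os.path.abspath(os.path.expanduser(audio_path))
    # process_file() returns the transcript and the per-word metadata.
    transcript, words = engine.process_file(absolute_path)
    return transcript, words
```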
pvleopard.LeopardError
Error thrown if an error occurs within the Leopard Speech-to-Text engine.
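Catching the error around a transcription call might look like the sketch below. transcribe_or_none and its error_type argument are hypothetical; error_type defaults to pvleopard.LeopardError and exists only so the function can be tried without the SDK installed.

```python
def transcribe_or_none(engine, audio_path, error_type=None):
    """Return (transcript, words) from process_file(), or None if the engine raises."""
    if error_type is None:
        # Imported lazily so the sketch can be read and tested without the SDK.
        import pvleopard
        error_type = pvleopard.LeopardError
    try:
        return engine.process_file(audio_path)
    except error_type as error:
        print(f"Leopard failed to process {audio_path}: {error}")
        return None
```

Returning None is one possible policy; re-raising after logging is equally reasonable when the caller needs to distinguish failure causes.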
Exceptions